c2v.tl.catboost

c2v.tl.catboost(adata, obsm_key, gs_key=None, validation_key=None, validation_value=None, features=None, use_raw=None, layer=None, min_size=3, model='regressor', num_trees=10000, early_stopping_rounds=100, verbose=True, loss=None, eval_metric=None, response_transform=None, use_gpu=None, random_state=42, prediction_key_added='predicted', return_model=False, save_model=None, progress_bar=True, catboost_dir=None, mask_key=None, clr_pseudocount=0.001, **kwargs)

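Trains a CatBoost regressor or classifier that predicts the response variable stored in adata.obsm[obsm_key] from the feature matrix (selected via use_raw or layer, plus any adata.obs columns given in features), with the aim of identifying associations between the features and the response.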
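The default loss rule described under loss can be summarised as a small function. This is a sketch, not the c2v implementation: default_loss is a hypothetical helper, and it assumes that "multivariable" means the response in adata.obsm[obsm_key] has more than one column.

```python
def default_loss(model: str, n_targets: int) -> str:
    """Hypothetical helper mirroring the documented default for loss=None.

    Assumes "multivariable" means more than one response column.
    """
    if model not in ("regressor", "classifier"):
        raise ValueError(f"model must be 'regressor' or 'classifier', got {model!r}")
    if n_targets < 1:
        raise ValueError("the response needs at least one column")
    multi = n_targets > 1  # more than one column in adata.obsm[obsm_key]
    if model == "regressor":
        return "MultiRMSE" if multi else "RMSE"
    return "MultiCrossEntropy" if multi else "CrossEntropy"


print(default_loss("regressor", 5))   # MultiRMSE
print(default_loss("classifier", 1))  # CrossEntropy
```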

Parameters:
adata sc.AnnData

Annotated data matrix.

obsm_key str

Key in adata.obsm to use for the response variable.

gs_key str | None, optional

Key in adata.uns with c2v.tl.gs() parameters. Default is None.

validation_key str | None, optional

Key in adata.obs to use for the train/validation split. If None, reconstructed from adata.uns[gs_key]. Default is None.

validation_value str | None, optional

Value in adata.obs[validation_key] that marks cells belonging to the validation set. If None, reconstructed from adata.uns[gs_key]. Default is None.

features list[str] | None, optional

List of additional adata.obs columns to use as model features (e.g. batch labels or layer annotations). Default is None.

use_raw bool | None, optional

Whether to use adata.raw for feature selection. Default is None.

layer str | None, optional

Layer in adata.layers to use for feature selection. Default is None.

model Literal["regressor", "classifier"], optional

Whether to perform regression or classification. Default is “regressor”.

num_trees int, optional

Maximum number of trees (boosting iterations) to build; training can stop earlier because of early stopping. Default is 10000.

early_stopping_rounds int, optional

Number of consecutive iterations without improvement on the validation set after which training stops. Default is 100.

verbose bool, optional

Whether to print verbose output. Default is True.

loss str | None, optional

Loss function to use. If None, set to “MultiRMSE” for multivariable regression, “RMSE” for univariable regression, “MultiCrossEntropy” for multivariable classification, and “CrossEntropy” for univariable classification. Default is None.

eval_metric str | None, optional

Evaluation metric to use. Default is None.

response_transform Literal["logit", "log1p", "sqrt", "clr"] | None, optional

Transform to apply to the response variable. Default is None.

use_gpu bool | None, optional

Whether to use GPU for training. Default is None.

random_state int | None, optional

Random seed for reproducibility. Default is 42.

prediction_key_added str, optional

Key in adata.obsm under which the predicted values are stored. Default is “predicted”.

return_model bool, optional

Whether to return the trained model. Default is False.

save_model str | os.PathLike | None, optional

Path to save the trained model. Default is None.

progress_bar bool, optional

Whether to show a progress bar during training. Default is True.

catboost_dir os.PathLike | str | None, optional

Directory to save CatBoost training information. Default is None.

mask_key str | None | Literal[False], optional

Key in adata.obs or adata.obsm containing a boolean mask used to filter cells. Default is None.

clr_pseudocount float, optional

Pseudocount added to the values before the CLR transformation (response_transform=“clr”). Default is 1e-3.

**kwargs

Additional keyword arguments to pass to CatBoostRegressor or CatBoostClassifier.

min_size int, optional

Default is 3.
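The data-preparation steps implied by mask_key, validation_key/validation_value, response_transform and clr_pseudocount can be sketched in plain Python. Everything here is an assumption for illustration, not the c2v code: the helper names (split_cells, clr, transform_response) are hypothetical, cells are assumed to go to the validation set when adata.obs[validation_key] equals validation_value, and the transforms are assumed to follow their usual definitions (logit(p) = log(p / (1 - p)); CLR as log(x + pseudocount) minus the row mean of those logs).

```python
import math
from typing import Optional, Sequence


def split_cells(
    split_labels: Sequence[str],
    validation_value: str,
    mask: Optional[Sequence[bool]] = None,
) -> tuple[list[int], list[int]]:
    """Return (train, validation) cell indices.

    Hypothetical: cells whose mask is False are dropped; the remaining cells
    whose label equals validation_value form the validation set.
    """
    if mask is None:
        mask = [True] * len(split_labels)
    train, val = [], []
    for i, (label, keep) in enumerate(zip(split_labels, mask)):
        if not keep:
            continue  # filtered out by the boolean mask
        (val if label == validation_value else train).append(i)
    return train, val


def clr(row: Sequence[float], pseudocount: float = 1e-3) -> list[float]:
    """Centred log-ratio: log(x + pseudocount) minus the row mean of those logs."""
    logs = [math.log(x + pseudocount) for x in row]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]


def transform_response(rows, transform=None, pseudocount=1e-3):
    """Apply one of the documented response transforms (assumed definitions)."""
    if transform is None:
        return [list(r) for r in rows]
    if transform == "clr":
        return [clr(r, pseudocount) for r in rows]  # row-wise, uses the pseudocount
    elementwise = {
        "logit": lambda p: math.log(p / (1.0 - p)),  # defined for 0 < p < 1 only
        "log1p": math.log1p,
        "sqrt": math.sqrt,
    }
    if transform not in elementwise:
        raise ValueError(f"unknown transform: {transform!r}")
    f = elementwise[transform]
    return [[f(x) for x in r] for r in rows]


# Toy example: three cells with a three-column response (e.g. proportions).
labels = ["train", "val", "train"]
response = [[0.2, 0.3, 0.5], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8]]
train_idx, val_idx = split_cells(labels, "val", mask=[True, True, False])
print(train_idx, val_idx)  # [0] [1]
y_train = transform_response([response[i] for i in train_idx], "clr")
print(abs(sum(y_train[0])) < 1e-9)  # True (CLR rows sum to zero)
```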

Return type:

sc.AnnData | tuple[sc.AnnData, CatBoostRegressor | CatBoostClassifier]

Returns:

adata sc.AnnData

Annotated data matrix with the predicted values added to adata.obsm[prediction_key_added].

model CatBoostRegressor | CatBoostClassifier

Trained CatBoost model. Only returned when return_model=True.
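Because training uses early stopping, the returned model may hold fewer than num_trees trees. The sketch below shows a patience-style rule of the kind early_stopping_rounds controls; it is an illustration, not CatBoost's exact overfitting detector, and early_stop is a hypothetical helper. It also assumes the validation cells are passed to CatBoost as an evaluation set, in which case CatBoost's use_best_model option (True by default when an evaluation set is given) keeps only the trees up to the best iteration.

```python
from typing import Sequence


def early_stop(val_losses: Sequence[float], rounds: int) -> tuple[int, int]:
    """Patience-style early stopping (illustrative, not CatBoost's exact detector).

    Returns (best_iteration, last_iteration_run). Training stops once `rounds`
    iterations have passed since the best (lowest) validation loss.
    """
    best_iter, best = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best:
            best_iter, best = i, loss  # new best: reset the patience counter
        elif i - best_iter >= rounds:
            return best_iter, i  # `rounds` iterations without improvement
    # The loss list ran out (the num_trees cap) before patience was exhausted.
    return best_iter, len(val_losses) - 1


losses = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.9]
best, stopped = early_stop(losses, rounds=3)
print(best, stopped)  # 2 5
# Assuming use_best_model applies, trees 0..best are kept:
print("trees kept:", best + 1)  # trees kept: 3
```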