c2v.tl.catboost

c2v.tl.catboost(adata, obsm_key, gs_key=None, validation_key=None, validation_value=None, features=None, use_raw=None, layer=None, min_size=3, model='regressor', num_trees=10000, early_stopping_rounds=100, verbose=True, loss=None, eval_metric=None, response_transform=None, use_gpu=None, random_state=42, prediction_key_added='predicted', return_model=False, save_model=None, progress_bar=True, catboost_dir=None, mask_key=None, clr_pseudocount=0.001, **kwargs)

Performs CatBoost regression or classification on the data to identify associations between the features and the response variable.

Parameters:

adata sc.AnnData: Annotated data matrix.
obsm_key str: Key in adata.obsm to use for the response variable.
gs_key str | None, optional: Key in adata.obs with c2v.tl.gs() parameters. Default is None.
validation_key str | None, optional: Key in adata.obs to use for the train/validation split. If None, it is reconstructed from adata.uns[gs_key]. Default is None.
validation_value str | None, optional: Value in adata.obs[validation_key] that marks the validation set. If None, it is reconstructed from adata.uns[gs_key]. Default is None.
features list[str] | None, optional: List of additional adata.obs columns to use as model features (e.g. batch labels or layer names). Default is None.
use_raw bool | None, optional: Whether to use adata.raw for feature selection. Default is None.
layer str | None, optional: Layer in adata.layers to use for feature selection. Default is None.
model Literal["regressor", "classifier"], optional: Whether to perform regression or classification. Default is “regressor”.
num_trees int, optional: Number of trees to build. Default is 10000.
early_stopping_rounds int, optional: Number of consecutive iterations without improvement on the validation set after which training is stopped. Default is 100.
verbose bool, optional: Whether to print verbose output. Default is True.
loss str | None, optional: Loss function to use. If None, set to “MultiRMSE” for multivariable regression, “RMSE” for univariable regression, “MultiCrossEntropy” for multivariable classification, and “CrossEntropy” for univariable classification. Default is None.
eval_metric str | None, optional: Evaluation metric to use. Default is None.
response_transform Literal["logit", "log1p", "sqrt", "clr"] | None, optional: Transform to apply to the response variable. Default is None.
use_gpu bool | None, optional: Whether to use GPU for training. Default is None.
random_state int | None, optional: Random seed for reproducibility. Default is 42.
prediction_key_added str, optional: Key in adata.obsm under which to store the predicted values. Default is “predicted”.
return_model bool, optional: Whether to return the trained model. Default is False.
save_model str | os.PathLike | None, optional: Path to save the trained model. Default is None.
progress_bar bool, optional: Whether to show a progress bar during training. Default is True.
catboost_dir os.PathLike | str | None, optional: Directory to save CatBoost training information. Default is None.
mask_key str | None | Literal[False], optional: Key in adata.obs or adata.obsm containing a boolean mask used to filter cells. Default is None.
clr_pseudocount float, optional: Pseudocount to add to expression values before the CLR transformation. Default is 1e-3.
**kwargs: Additional keyword arguments to pass to CatBoostRegressor or CatBoostClassifier.
min_size int, optional: Default is 3.
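The early_stopping_rounds rule can be illustrated with a generic patience-based loop. This is a plain-Python sketch of the technique, not CatBoost's or c2v's code; the helper name early_stopping_iteration and the toy loss sequence are hypothetical, and CatBoost's exact stopping iteration may differ by one from the counting convention used here.

```python
def early_stopping_iteration(val_losses, early_stopping_rounds=100):
    """Return (best_iteration, stop_iteration) for a patience-based stop.

    Hypothetical helper. val_losses holds the validation loss after each
    boosting iteration (lower is better). Training stops once
    `early_stopping_rounds` iterations have passed since the best loss.
    """
    best_iter, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: reset the patience counter.
            best_iter, best_loss = i, loss
        elif i - best_iter >= early_stopping_rounds:
            # Patience exhausted: stop at this iteration.
            return best_iter, i
    # Ran out of iterations (num_trees) before patience was exhausted.
    return best_iter, len(val_losses) - 1


print(early_stopping_iteration([5.0, 4.0, 3.0, 3.5, 3.6, 3.7, 3.8], early_stopping_rounds=3))
# (2, 5)
```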
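The documented defaults for loss can be written as a small lookup. The helper default_loss is hypothetical, and it assumes that "multivariable" means the response stored in adata.obsm[obsm_key] has more than one column; the exact rule c2v applies is not spelled out on this page.

```python
def default_loss(model: str, n_targets: int) -> str:
    """Return the loss that `loss=None` is documented to resolve to.

    Hypothetical helper, not part of c2v.
    model: "regressor" or "classifier" (the `model` parameter).
    n_targets: number of response columns; assumed to be the width of
        adata.obsm[obsm_key].
    """
    if model not in ("regressor", "classifier"):
        raise ValueError(f"unknown model: {model!r}")
    multivariable = n_targets > 1
    if model == "regressor":
        return "MultiRMSE" if multivariable else "RMSE"
    return "MultiCrossEntropy" if multivariable else "CrossEntropy"


print(default_loss("regressor", 5))   # MultiRMSE
print(default_loss("classifier", 1))  # CrossEntropy
```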
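The response_transform options correspond to standard NumPy operations. The sketch below assumes the response is a 2-D array of shape (n_cells, n_targets), applies the centred log-ratio (CLR) per row (cell) with clr_pseudocount, and clips values before the logit; the clipping epsilon and the helper name transform_response are assumptions, not taken from c2v.

```python
import numpy as np


def transform_response(y, kind=None, clr_pseudocount=1e-3):
    """Apply one of the documented response transforms (hypothetical sketch).

    y: array of shape (n_cells, n_targets).
    kind: None, "logit", "log1p", "sqrt" or "clr".
    """
    y = np.asarray(y, dtype=float)
    if kind is None:
        return y
    if kind == "logit":
        # Assumes y holds proportions; clipping to (eps, 1 - eps) is an
        # assumption of this sketch to avoid infinite values at 0 and 1.
        eps = 1e-6
        p = np.clip(y, eps, 1 - eps)
        return np.log(p / (1 - p))
    if kind == "log1p":
        return np.log1p(y)
    if kind == "sqrt":
        return np.sqrt(y)
    if kind == "clr":
        # CLR per cell: log(x + pc) minus the row mean of log(x + pc).
        logs = np.log(y + clr_pseudocount)
        return logs - logs.mean(axis=1, keepdims=True)
    raise ValueError(f"unknown transform: {kind!r}")


y = np.array([[0.2, 0.3, 0.5], [0.1, 0.1, 0.8]])
# Each CLR-transformed row sums to zero.
print(np.allclose(transform_response(y, "clr").sum(axis=1), 0))  # True
```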

Return type:

sc.AnnData | tuple[sc.AnnData, CatBoostRegressor | CatBoostClassifier]

Returns:

adata sc.AnnData: Annotated data matrix with the predicted values added to adata.obsm[prediction_key_added].
model CatBoostRegressor | CatBoostClassifier: Trained CatBoost model. Only returned if return_model is True.
