c2v.tl.catboost
- c2v.tl.catboost(adata, obsm_key, gs_key=None, validation_key=None, validation_value=None, features=None, use_raw=None, layer=None, min_size=3, model='regressor', num_trees=10000, early_stopping_rounds=100, verbose=True, loss=None, eval_metric=None, response_transform=None, use_gpu=None, random_state=42, prediction_key_added='predicted', return_model=False, save_model=None, progress_bar=True, catboost_dir=None, mask_key=None, clr_pseudocount=0.001, **kwargs)
Fits a CatBoost regression or classification model to identify associations between the features and the response variable.
- Parameters:
- adata sc.AnnData
Annotated data matrix.
- obsm_key str
Key in adata.obsm to use for the response variable.
- gs_key str | None, optional
Key in adata.uns holding the c2v.tl.gs() parameters. Default is None.
- validation_key str | None, optional
Key in adata.obs to use for the train/validation split. If None, it is reconstructed from adata.uns[gs_key]. Default is None.
- validation_value str | None, optional
Value in adata.obs[validation_key] that marks the validation set. If None, it is reconstructed from adata.uns[gs_key]. Default is None.
- features list[str] | None, optional
List of additional adata.obs columns to use as model features (e.g. batch labels or layer names). Default is None.
- use_raw bool | None, optional
Whether to use adata.raw for feature selection. Default is None.
- layer str | None, optional
Layer in adata.layers to use for feature selection. Default is None.
- model Literal["regressor", "classifier"], optional
Whether to perform regression or classification. Default is “regressor”.
- num_trees int, optional
Number of trees to build. Default is 10000.
- early_stopping_rounds int, optional
Number of iterations without improvement on the validation set after which training stops. Default is 100.
- verbose bool, optional
Whether to print verbose output. Default is True.
- loss str | None, optional
Loss function to use. If None, set to “MultiRMSE” for multivariable regression, “RMSE” for univariable regression, “MultiCrossEntropy” for multivariable classification, and “CrossEntropy” for univariable classification. Default is None.
- eval_metric str | None, optional
Evaluation metric to use. Default is None.
- response_transform Literal["logit", "log1p", "sqrt", "clr"] | None, optional
Transform to apply to the response variable. Default is None.
- use_gpu bool | None, optional
Whether to use GPU for training. Default is None.
- random_state int | None, optional
Random seed for reproducibility. Default is 42.
- prediction_key_added str, optional
Key in adata.obsm to add the predicted values. Default is “predicted”.
- return_model bool, optional
Whether to return the trained model. Default is False.
- save_model str | os.PathLike | None, optional
Path to save the trained model. Default is None.
- progress_bar bool, optional
Whether to show a progress bar during training. Default is True.
- catboost_dir os.PathLike | str | None, optional
Directory to save CatBoost training information. Default is None.
- mask_key str | None | Literal[False], optional
Key in adata.obs or adata.obsm containing a boolean mask used to filter cells. Default is None.
- clr_pseudocount float, optional
Pseudocount added to expression values before the CLR transformation. Default is 0.001.
- **kwargs
Additional keyword arguments to pass to CatBoostRegressor or CatBoostClassifier.
- min_size int, optional
Default is 3.
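The default loss rule described for the loss parameter can be written as a small lookup. This is only a sketch of the rule as documented, not the library's own code: the helper name `default_loss` and the `n_targets` argument are hypothetical, and it assumes that "multivariable" means the response in adata.obsm[obsm_key] has more than one column.

```python
def default_loss(model: str, n_targets: int) -> str:
    """Return the documented default loss for a model type and response width.

    Hypothetical helper: `n_targets` is assumed to be the number of
    response columns (1 = univariable, >1 = multivariable).
    """
    if model not in ("regressor", "classifier"):
        raise ValueError(f"unknown model type: {model!r}")
    if n_targets < 1:
        raise ValueError("n_targets must be at least 1")

    multivariable = n_targets > 1
    if model == "regressor":
        return "MultiRMSE" if multivariable else "RMSE"
    return "MultiCrossEntropy" if multivariable else "CrossEntropy"


if __name__ == "__main__":
    # Print the default for each combination described in the parameter list.
    for model in ("regressor", "classifier"):
        for n_targets in (1, 4):
            print(model, n_targets, default_loss(model, n_targets))
```

Passing an explicit loss string bypasses this rule entirely, so the lookup only matters when loss is left as None.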
- Return type:
sc.AnnData | tuple[sc.AnnData, CatBoostRegressor | CatBoostClassifier]
- Returns:
- adata sc.AnnData
Annotated data matrix with the predicted values added to adata.obsm[prediction_key_added].
- model CatBoostRegressor | CatBoostClassifier
Trained CatBoost model. Returned only if return_model is True.
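The response_transform options and clr_pseudocount can be illustrated with the textbook definitions of each transform. The sketch below is an assumption about what the option names mean, not the library's implementation: the helper names `clr` and `transform_response` are hypothetical, and it assumes the CLR is computed per cell across the response components after adding the pseudocount.

```python
import math


def clr(row, pseudocount=1e-3):
    """Centered log-ratio: log of each shifted value minus the mean log."""
    logs = [math.log(v + pseudocount) for v in row]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def transform_response(row, method=None, clr_pseudocount=1e-3):
    """Apply one of the documented transform names to a single response row."""
    if method is None:
        return list(row)
    if method == "logit":
        # Defined only for values strictly between 0 and 1.
        return [math.log(p / (1.0 - p)) for p in row]
    if method == "log1p":
        return [math.log1p(v) for v in row]
    if method == "sqrt":
        return [math.sqrt(v) for v in row]
    if method == "clr":
        return clr(row, clr_pseudocount)
    raise ValueError(f"unknown transform: {method!r}")


if __name__ == "__main__":
    proportions = [0.7, 0.3, 0.0]  # one cell, three components, one exact zero
    print(transform_response(proportions, "clr"))
```

The pseudocount is what keeps the zero component finite: without it, the logarithm of 0 is undefined. CLR output sums to zero within each row, so the transformed components are expressed relative to the row's geometric mean.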