celldisect.CellDISECT.setup_anndata

classmethod CellDISECT.setup_anndata(adata: anndata.AnnData, layer: str | None = None, batch_key: str | None = None, labels_key: str | None = None, size_factor_key: str | None = None, categorical_covariate_keys: List[str] | None = None, continuous_covariate_keys: List[str] | None = None, add_cluster_covariate: bool = False, clustering_normalize_counts: bool = True, perturbation_key: str | None = None, perturbation_embedding_key: str | None = None, perturbation_combination_delimiter: str = '+', **kwargs)

Set up the AnnData object for the CellDISECT model.

This method configures the AnnData object by registering the necessary fields and optionally adding a cluster covariate. When perturbation_key is provided, the corresponding column in adata.obs is treated as a perturbation covariate whose embeddings come from adata.uns[perturbation_embedding_key] rather than being learned during training.

Parameters:

adata (AnnData) – AnnData object to be set up.
layer (Optional[str], optional) – Layer in adata to use as the count data, by default None.
batch_key (Optional[str], optional) – Key in adata.obs for batch information, by default None.
labels_key (Optional[str], optional) – Key in adata.obs for labels, by default None.
size_factor_key (Optional[str], optional) – Key in adata.obs for size factors, by default None.
categorical_covariate_keys (Optional[List[str]], optional) – List of keys in adata.obs for categorical covariates, by default None.
continuous_covariate_keys (Optional[List[str]], optional) – List of keys in adata.obs for continuous covariates, by default None.
add_cluster_covariate (bool, optional) – Whether to add a cluster covariate to adata.obs, by default False.
clustering_normalize_counts (bool, optional) – Whether to normalize counts before clustering, by default True.
perturbation_key (Optional[str], optional) – Column in adata.obs that contains perturbation labels (e.g. "GeneA", "GeneA+GeneB"), by default None. When set, the perturbation covariate uses predefined embeddings instead of learned ones.
perturbation_embedding_key (Optional[str], optional) – Key in adata.uns whose value is a dict[str, np.ndarray] mapping atomic perturbation names to their vector representations (e.g. ESM or GenePT embeddings), by default None. Required when perturbation_key is set.
perturbation_combination_delimiter (str, optional) – Delimiter that separates atomic perturbation names within a combinatorial perturbation label (e.g. "GeneA+GeneB"), by default "+".
**kwargs – Additional keyword arguments.
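
The relationship between perturbation labels and the embedding dictionary can be checked before calling setup_anndata. The following is a minimal sketch, assuming that each combinatorial label is split on the delimiter into atomic names and that every atomic name needs a key in the embedding dictionary. missing_perturbation_names is a hypothetical helper, not part of CellDISECT, and it makes no assumption about how the library treats control labels or surrounding whitespace.

```python
from typing import Iterable, List


def missing_perturbation_names(
    labels: Iterable[str],
    embedding_names: Iterable[str],
    delimiter: str = "+",
) -> List[str]:
    """Return atomic perturbation names that have no embedding entry.

    Hypothetical pre-check (not a CellDISECT function). Assumes each label
    is split on `delimiter` into atomic perturbation names.
    """
    known = set(embedding_names)
    missing = set()
    for label in labels:
        for name in label.split(delimiter):
            if name not in known:
                missing.add(name)
    return sorted(missing)


# Labels as they might appear in adata.obs[perturbation_key], and names as
# they might appear as keys of adata.uns[perturbation_embedding_key].
labels = ["GeneA", "GeneA+GeneB", "GeneC"]
embedding_names = ["GeneA", "GeneB"]
print(missing_perturbation_names(labels, embedding_names))
```

An empty result suggests that every atomic name in the labels has a corresponding embedding. Any names returned would need an entry in the dictionary, or the affected cells would need to be handled separately, before setup.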

Return type:

None
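
As an illustration of how the perturbation arguments fit together, the sketch below assembles keyword arguments and enforces the documented rule that perturbation_embedding_key is required when perturbation_key is set. build_setup_kwargs is a hypothetical helper, and the column and key names ("batch", "perturbation", "pert_emb") are placeholders. The actual call is left as a comment because it needs celldisect and a prepared AnnData object.

```python
from typing import Any, Dict, Optional


def build_setup_kwargs(
    batch_key: Optional[str] = None,
    perturbation_key: Optional[str] = None,
    perturbation_embedding_key: Optional[str] = None,
    perturbation_combination_delimiter: str = "+",
) -> Dict[str, Any]:
    """Assemble keyword arguments for CellDISECT.setup_anndata.

    Hypothetical helper (not part of CellDISECT). It only mirrors the
    documented constraint between the two perturbation keys.
    """
    if perturbation_key is not None and perturbation_embedding_key is None:
        raise ValueError(
            "perturbation_embedding_key is required when perturbation_key is set"
        )
    return {
        "batch_key": batch_key,
        "perturbation_key": perturbation_key,
        "perturbation_embedding_key": perturbation_embedding_key,
        "perturbation_combination_delimiter": perturbation_combination_delimiter,
    }


# Placeholder names; substitute the columns and keys used in the dataset.
setup_kwargs = build_setup_kwargs(
    batch_key="batch",
    perturbation_key="perturbation",
    perturbation_embedding_key="pert_emb",
)

# With celldisect installed and an AnnData object `adata` prepared:
# CellDISECT.setup_anndata(adata, **setup_kwargs)
```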