1. CellDISECT Counterfactual Analysis

This tutorial demonstrates how to use CellDISECT for counterfactual analysis in single-cell RNA sequencing data. CellDISECT (Cell DISentangled Experts for Covariate counTerfactuals) is a causal generative model that disentangles variations in single-cell data and enables counterfactual predictions.

In this tutorial, we will:

  1. Install and import the necessary packages

  2. Load and preprocess a dataset

  3. Train a CellDISECT model

  4. Extract disentangled latent representations

  5. Generate and evaluate counterfactual predictions

1.1. Step 1: Installation and Setup

First, we install the CellDISECT package and its dependencies. We use the beta release (0.2.0b1), which is compatible with Google Colab and with newer versions of PyTorch and scvi-tools. The second command pins torchvision to 0.16.2, the release that pairs with PyTorch 2.1.2.
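
Because this release constrains the versions of several dependencies, it can help to see what is already present in the environment before running the install cell. The following is a minimal sketch that uses only the standard library's `importlib.metadata`; the package list and the helper name `installed_versions` are illustrative assumptions, not part of CellDISECT.

```python
from importlib.metadata import PackageNotFoundError, version

# Distributions mentioned in this step; the selection is illustrative only.
PACKAGES = ["celldisect", "torch", "torchvision", "scvi-tools"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name}: {ver or 'not installed'}")
```

Running the same loop again after the install cell shows which versions pip actually resolved.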

[2]:
!pip install celldisect==0.2.0b1
!pip install torchvision==0.16.2
Collecting celldisect==0.2.0b1
  Downloading celldisect-0.2.0b1-py3-none-any.whl.metadata (7.5 kB)
Collecting scvi-tools<=1.3.0,>=1.0.0 (from celldisect==0.2.0b1)
  Downloading scvi_tools-1.1.6.post2-py3-none-any.whl.metadata (18 kB)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from sympy->torch<2.3.0,>=2.1.0->celldisect==0.2.0b1) (1.3.0)
Requirement already satisfied: h11>=0.8 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from uvicorn[standard]; extra == "serve"->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (0.14.0)
Requirement already satisfied: httptools>=0.6.3 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from uvicorn[standard]; extra == "serve"->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (0.6.4)
Requirement already satisfied: python-dotenv>=0.13 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from uvicorn[standard]; extra == "serve"->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (1.0.1)
Requirement already satisfied: uvloop!=0.15.0,!=0.15.1,>=0.14.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from uvicorn[standard]; extra == "serve"->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (0.21.0)
Requirement already satisfied: websockets>=10.4 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from uvicorn[standard]; extra == "serve"->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (15.0)
Requirement already satisfied: sniffio>=1.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from anyio<4.0.0,>=3.7.1->fastapi<=0.108.0->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (1.3.1)
Requirement already satisfied: wcwidth>=0.1.4 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from blessed>=1.17.1->gpustat>=1.0.0->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (0.2.13)
Requirement already satisfied: smmap<6,>=3.0.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from gitdb<5,>=4.0.1->gitpython!=3.1.29,>=1.0.0->wandb->celldisect==0.2.0b1) (5.0.2)
Requirement already satisfied: googleapis-common-protos<2.0.dev0,>=1.56.2 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (1.67.0)
Requirement already satisfied: proto-plus<2.0.0dev,>=1.22.3 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (1.26.0)
Requirement already satisfied: google-auth<3.0.dev0,>=2.14.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from google-api-core<3.0.0,>=1.0.0->opencensus->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (2.38.0)
Requirement already satisfied: httpcore==1.* in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from httpx>=0.25.0->jupyterlab->jupyter->celldisect==0.2.0b1) (1.0.7)
Requirement already satisfied: parso<0.9.0,>=0.8.4 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jedi>=0.16->ipython>=7.23.1->ipykernel->celldisect==0.2.0b1) (0.8.4)
Requirement already satisfied: argon2-cffi>=21.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (23.1.0)
Requirement already satisfied: jupyter-events>=0.11.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (0.12.0)
Requirement already satisfied: jupyter-server-terminals>=0.4.4 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (0.5.3)
Requirement already satisfied: overrides>=5.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (7.7.0)
Requirement already satisfied: send2trash>=1.8.2 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (1.8.3)
Requirement already satisfied: terminado>=0.8.3 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (0.18.1)
Requirement already satisfied: websocket-client>=1.7 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (1.8.0)
Requirement already satisfied: babel>=2.10 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter->celldisect==0.2.0b1) (2.17.0)
Requirement already satisfied: json5>=0.9.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyterlab-server<3,>=2.27.1->jupyterlab->jupyter->celldisect==0.2.0b1) (0.10.0)
Requirement already satisfied: mdurl~=0.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from markdown-it-py>=2.2.0->rich->scib-metrics<0.6.0,>=0.5.1->celldisect==0.2.0b1) (0.1.2)
Requirement already satisfied: ptyprocess>=0.5 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from pexpect>4.3->ipython>=7.23.1->ipykernel->celldisect==0.2.0b1) (0.7.0)
Requirement already satisfied: etils[epath,epy] in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from orbax-checkpoint->flax->scvi-tools<=1.3.0,>=1.0.0->celldisect==0.2.0b1) (1.5.2)
Requirement already satisfied: stdlib_list in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from session-info->scanpy>=1.5->scib<1.2.0,>=1.1.5->celldisect==0.2.0b1) (0.11.1)
Requirement already satisfied: executing>=1.2.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from stack-data->ipython>=7.23.1->ipykernel->celldisect==0.2.0b1) (2.2.0)
Requirement already satisfied: asttokens>=2.1.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from stack-data->ipython>=7.23.1->ipykernel->celldisect==0.2.0b1) (3.0.0)
Requirement already satisfied: pure-eval in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from stack-data->ipython>=7.23.1->ipykernel->celldisect==0.2.0b1) (0.2.3)
Requirement already satisfied: argon2-cffi-bindings in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (21.2.0)
Requirement already satisfied: cachetools<6.0,>=2.0.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (5.5.1)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (0.4.1)
Requirement already satisfied: rsa<5,>=3.1.4 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (4.9)
Requirement already satisfied: python-json-logger>=2.0.4 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (3.2.1)
Requirement already satisfied: rfc3339-validator in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (0.1.4)
Requirement already satisfied: rfc3986-validator>=0.1.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (0.1.1)
Requirement already satisfied: fqdn in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (1.5.1)
Requirement already satisfied: isoduration in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (20.11.0)
Requirement already satisfied: jsonpointer>1.13 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (3.0.0)
Requirement already satisfied: uri-template in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (1.3.0)
Requirement already satisfied: webcolors>=24.6.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (24.11.1)
Requirement already satisfied: pyasn1<0.7.0,>=0.4.6 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from pyasn1-modules>=0.2.1->google-auth<3.0.dev0,>=2.14.1->google-api-core<3.0.0,>=1.0.0->opencensus->ray[data,serve,train,tune]<2.44.0,>=2.9.0->celldisect==0.2.0b1) (0.6.1)
Requirement already satisfied: cffi>=1.0.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from argon2-cffi-bindings->argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (1.17.1)
Requirement already satisfied: pycparser in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from cffi>=1.0.1->argon2-cffi-bindings->argon2-cffi>=21.1->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (2.22)
Requirement already satisfied: arrow>=0.15.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (1.3.0)
Requirement already satisfied: types-python-dateutil>=2.8.10 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from arrow>=0.15.0->isoduration->jsonschema[format-nongpl]>=4.18.0->jupyter-events>=0.11.0->jupyter-server<3,>=2.4.0->jupyterlab->jupyter->celldisect==0.2.0b1) (2.9.0.20241206)
Downloading celldisect-0.2.0b1-py3-none-any.whl (35 kB)
Downloading scvi_tools-1.1.6.post2-py3-none-any.whl (387 kB)
^C
ERROR: Operation cancelled by user
Collecting torchvision==0.16.2
  Downloading torchvision-0.16.2-cp39-cp39-macosx_11_0_arm64.whl.metadata (6.6 kB)
Requirement already satisfied: numpy in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torchvision==0.16.2) (1.26.4)
Requirement already satisfied: requests in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torchvision==0.16.2) (2.32.3)
Requirement already satisfied: torch==2.1.2 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torchvision==0.16.2) (2.1.2)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torchvision==0.16.2) (11.1.0)
Requirement already satisfied: filelock in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torch==2.1.2->torchvision==0.16.2) (3.17.0)
Requirement already satisfied: typing-extensions in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torch==2.1.2->torchvision==0.16.2) (4.5.0)
Requirement already satisfied: sympy in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torch==2.1.2->torchvision==0.16.2) (1.13.1)
Requirement already satisfied: networkx in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torch==2.1.2->torchvision==0.16.2) (3.2.1)
Requirement already satisfied: jinja2 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torch==2.1.2->torchvision==0.16.2) (3.1.5)
Requirement already satisfied: fsspec in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from torch==2.1.2->torchvision==0.16.2) (2024.12.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from requests->torchvision==0.16.2) (3.4.1)
Requirement already satisfied: idna<4,>=2.5 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from requests->torchvision==0.16.2) (3.10)
Requirement already satisfied: urllib3<3,>=1.21.1 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from requests->torchvision==0.16.2) (2.3.0)
Requirement already satisfied: certifi>=2017.4.17 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from requests->torchvision==0.16.2) (2025.1.31)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from jinja2->torch==2.1.2->torchvision==0.16.2) (3.0.2)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/anaconda3/envs/disect/lib/python3.9/site-packages (from sympy->torch==2.1.2->torchvision==0.16.2) (1.3.0)
Downloading torchvision-0.16.2-cp39-cp39-macosx_11_0_arm64.whl (1.5 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.5/1.5 MB 9.0 MB/s eta 0:00:00
Installing collected packages: torchvision
  Attempting uninstall: torchvision
    Found existing installation: torchvision 0.21.0
    Uninstalling torchvision-0.21.0:
      Successfully uninstalled torchvision-0.21.0
Successfully installed torchvision-0.16.2

If you’re running this notebook in Google Colab, you’ll need to mount your Google Drive to save models and results.

[2]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive
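
The mount cell above only works inside Colab. As a minimal sketch, assuming you want the same notebook to run locally as well, the import can be guarded. `resolve_base_dir` is a hypothetical helper name used for illustration; it is not part of CellDISECT or Colab.

```python
import os


def resolve_base_dir():
    """Return a directory for saving models (hypothetical helper).

    Inside Colab, mount Google Drive and use it; elsewhere, fall back to
    the current working directory.
    """
    try:
        from google.colab import drive  # Only importable inside Colab
    except ImportError:
        return os.getcwd()
    drive.mount('/content/drive')
    return '/content/drive/MyDrive'


base_dir = resolve_base_dir()
print(base_dir)
```

With this pattern, later paths can be built relative to `base_dir` instead of hard-coding the Drive location.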

Now we import the necessary libraries. We’ll use:

  • celldisect: Our main package for counterfactual analysis

  • scvi: The underlying framework for single-cell probabilistic modeling

  • scanpy: For single-cell data analysis and visualization

  • anndata: For handling annotated data matrices

  • Other utility libraries for file operations, garbage collection, and statistical analysis

[2]:
%load_ext autoreload
%autoreload 2

from celldisect import CellDISECT
import scvi
scvi.settings.seed = 42 # Setting a random seed for reproducibility
import scanpy as sc
from anndata import AnnData
import os
import shutil
import gc
import anndata as ad
import pandas as pd
from scipy.stats import pearsonr
import numpy as np

INFO: Seed set to 42
INFO:lightning.fabric.utilities.seed:Seed set to 42

1.2. Step 2: Data Loading and Exploration

We’ll use the Kang dataset, which contains PBMCs (Peripheral Blood Mononuclear Cells) from lupus patients, with cells either in control condition or stimulated with interferon-beta. This dataset is ideal for counterfactual analysis as we can predict how cells would respond to stimulation.

[3]:
import gdown

url = 'https://drive.google.com/uc?export=download&id=1z8gGKQ6oDoi2blCU2IVihKA38h5fORRp'
data_path = 'kang_normalized_hvg.h5ad'
gdown.download(url, output=data_path)  # Save under an explicit file name so the read below finds it
adata = sc.read(data_path)
Downloading...
From (original): https://drive.google.com/uc?export=download&id=1z8gGKQ6oDoi2blCU2IVihKA38h5fORRp
From (redirected): https://drive.google.com/uc?export=download&id=1z8gGKQ6oDoi2blCU2IVihKA38h5fORRp&confirm=t&uuid=79a4319e-0cde-4eff-bf50-96c696868fc2
To: /content/kang_normalized_hvg.h5ad
100%|██████████| 545M/545M [00:05<00:00, 93.8MB/s]
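
The file is about 545 MB, so re-running the cell repeats a sizeable download. A small sketch, assuming you are happy to reuse a file that is already on disk: `download_if_missing` is a hypothetical helper, and the downloader is passed in as a callable so the logic can be shown here without network access.

```python
import os
import tempfile


def download_if_missing(url, path, downloader):
    """Call downloader(url, path) only when path does not exist yet."""
    if os.path.exists(path):
        return False  # Reuse the cached file
    downloader(url, path)
    return True  # A fresh download happened


# Demonstration with a stand-in downloader that just creates an empty file
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, 'demo.h5ad')
    fake_downloader = lambda url, path: open(path, 'wb').close()
    first = download_if_missing('https://example.invalid/demo', target, fake_downloader)
    second = download_if_missing('https://example.invalid/demo', target, fake_downloader)

print(first, second)
```

In the notebook, the downloader could be `lambda u, p: gdown.download(u, output=p)`.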

Let’s examine the dataset structure to understand what we’re working with.

[4]:
adata
[4]:
AnnData object with n_obs × n_vars = 13576 × 5000
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'stim', 'seurat_annotations', 'integrated_snn_res.0.5', 'seurat_clusters', 'condition', 'cell_type', 'cov_cond', 'split_CD14 Mono', 'split_CD4 T', 'split_T', 'split_CD8 T', 'split_B', 'split_DC', 'split_CD16 Mono', 'split_NK'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'symbol'
    uns: 'hvg', 'log1p', 'rank_genes_groups_cov'
    layers: 'counts'

Check the range of values in the data matrix to understand the data distribution.

[5]:
adata.X.min(), adata.X.max()
[5]:
(0.0, 9.887986)

1.3. Step 3: Data Preprocessing

CellDISECT is trained on raw count data. The range above (maximum of about 9.89) shows that adata.X currently holds normalized, log-transformed values, so we first copy the ‘counts’ layer into the main data matrix.

[6]:
adata.X = adata.layers['counts'].copy()
adata.X.min(), adata.X.max()
[6]:
(0.0, 3828.0)
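
The minimum and maximum alone do not prove that a matrix holds raw counts. A slightly stronger heuristic, sketched below with NumPy, is to check that all values are non-negative and integer-valued. `looks_like_raw_counts` is a hypothetical helper; for a SciPy sparse matrix, pass its `.data` array rather than the matrix itself.

```python
import numpy as np


def looks_like_raw_counts(values):
    """Heuristic: raw counts are non-negative and integer-valued."""
    values = np.asarray(values, dtype=float)
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


raw = np.array([[0.0, 3.0], [12.0, 3828.0]])          # Toy count-like values
logged = np.array([[0.0, 1.386], [2.565, 9.888]])     # Toy log-transformed values
is_raw = looks_like_raw_counts(raw)
is_logged_raw = looks_like_raw_counts(logged)
print(is_raw, is_logged_raw)
```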

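Now we’ll normalize the data for visualization purposes. scanpy warns below that adata.X seems to be already log-transformed; this is because the file’s uns already contains a ‘log1p’ entry from earlier preprocessing. Since adata.X was just reset to raw counts, the warning can be ignored here. The normalization involves: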

  1. Normalizing each cell by its total count

  2. Log-transforming the data

[7]:
# Normalizing
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
WARNING: adata.X seems to be already log-transformed.
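
To make the two steps concrete, here is a NumPy sketch of the same arithmetic on a toy matrix. It assumes the default behavior of `sc.pp.normalize_total` when no `target_sum` is given, which is to scale every cell to the median total count of the non-empty cells. `normalize_and_log` is an illustrative name, not a scanpy function.

```python
import numpy as np


def normalize_and_log(counts):
    """Scale each cell to the median library size, then apply log(1 + x)."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)                       # Library size per cell
    target = np.median(totals[totals > 0])            # Median over non-empty cells
    safe_totals = np.where(totals > 0, totals, 1.0)   # Avoid division by zero
    normalized = counts / safe_totals[:, None] * target
    return normalized, np.log1p(normalized)


toy_counts = np.array([[1, 3], [10, 30], [0, 8]])     # Totals: 4, 40, 8
normalized, logged = normalize_and_log(toy_counts)
print(normalized.sum(axis=1))
print(logged.round(3))
```

After normalization every toy cell sums to the same total, so differences between cells reflect composition rather than sequencing depth.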

Let’s compute a neighborhood graph and UMAP embedding for visualization.

  • The neighborhood graph captures the similarity between cells

  • UMAP (Uniform Manifold Approximation and Projection) provides a 2D visualization of the high-dimensional data

[8]:
sc.pp.neighbors(adata)
sc.tl.umap(adata)
adata
[8]:
AnnData object with n_obs × n_vars = 13576 × 5000
    obs: 'orig.ident', 'nCount_RNA', 'nFeature_RNA', 'stim', 'seurat_annotations', 'integrated_snn_res.0.5', 'seurat_clusters', 'condition', 'cell_type', 'cov_cond', 'split_CD14 Mono', 'split_CD4 T', 'split_T', 'split_CD8 T', 'split_B', 'split_DC', 'split_CD16 Mono', 'split_NK'
    var: 'highly_variable', 'means', 'dispersions', 'dispersions_norm', 'symbol'
    uns: 'hvg', 'log1p', 'rank_genes_groups_cov', 'pca', 'neighbors', 'umap'
    obsm: 'X_pca', 'X_umap'
    varm: 'PCs'
    layers: 'counts'
    obsp: 'distances', 'connectivities'
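
As an illustration of what a neighborhood graph encodes, the sketch below finds each point's nearest neighbors by brute force. This is not how scanpy computes it: as the output above shows, `sc.pp.neighbors` first builds a PCA representation (‘X_pca’), and it uses a much faster neighbor search. The brute-force version needs memory quadratic in the number of cells, so it is only suitable for toy inputs. `knn_indices` is a hypothetical helper.

```python
import numpy as np


def knn_indices(points, k):
    """Return, for each row, the indices of its k nearest other rows."""
    points = np.asarray(points, dtype=float)
    diffs = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diffs ** 2).sum(axis=-1))    # Pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)               # A cell is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]


toy_points = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
nearest = knn_indices(toy_points, k=1).ravel()
print(nearest)
```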

Now we can visualize the data using UMAP, coloring cells by their condition and cell type. This helps us understand the dataset structure before modeling.

[9]:
sc.pl.umap(
    adata,
    color=['condition', 'cell_type'],  # Color by condition and cell type
    frameon=False,                     # Remove frame around the plot
    # legend_loc=None,                 # Uncomment to hide the legend
    wspace=0.2,                        # Width space between panels
    )
../_images/tutorials_CellDISECT_Counterfactual_19_0.png

1.4. Step 4: Covariate Selection and Count Reset

Define the categorical covariates that we want to disentangle in our model. In this case, we’re interested in disentangling cell type and condition effects.

[10]:
cats = ['cell_type', 'condition']

Reset the data matrix to raw counts for model training, and remove cells with zero counts.

[11]:
adata.X = adata.layers['counts'].copy()
# np.ravel flattens the row sums, which are a 2D matrix when adata.X is sparse
adata = adata[np.ravel(adata.X.sum(axis=1)) != 0].copy()
adata.X.min(), adata.X.max()
[11]:
(0.0, 3828.0)

1.5. Step 5: Model Configuration

Now we’ll set up the CellDISECT model configuration. This involves three main components:

  1. Architecture parameters (arch_dict): Define the neural network architecture

  2. Training parameters (train_dict): Control the training process

  3. Optimizer parameters (plan_kwargs): Configure the optimization process

We’ll also set up paths for saving the model.

[12]:
cell_type_included = True # Set to True if you have provided a cell type annotation in the cats list

# Set up the directory for saving models
module_name = 'Kang'
pre_path = f'drive/MyDrive/cellDISECT/models/{module_name}'
os.makedirs(pre_path, exist_ok=True)

# Architecture parameters
arch_dict = {
 'n_layers': 2,                           # Number of hidden layers in encoder/decoder networks
 'n_hidden': 128,                         # Number of nodes per hidden layer
 'n_latent_shared': 32,                   # Dimensionality of the shared latent space (Z_0)
 'n_latent_attribute': 32,                # Dimensionality of each attribute-specific latent space (Z_i)
 'dropout_rate': 0.1,                     # Dropout rate for regularization
 'weighted_classifier': True,             # Whether to use weighted classifiers for imbalanced categories
}

# Training parameters
train_dict = {
 'max_epochs': 1000,                      # Maximum number of training epochs
 'batch_size': 256,                       # Number of samples per batch
 'recon_weight': 20,                      # Weight for reconstruction loss
 'cf_weight': 0.8,                        # Weight for counterfactual loss
 'beta': 0.003,                           # Weight for KL divergence (controls latent space regularization)
 'clf_weight': 0.05,                      # Weight for classifier loss
 'adv_clf_weight': 0.014,                 # Weight for adversarial classifier loss
 'adv_period': 5,                         # Period for adversarial training
 'n_cf': 1,                               # Number of counterfactual steps
 'early_stopping_patience': 6,            # Number of epochs to wait before early stopping
 'early_stopping': True,                  # Whether to use early stopping
 'save_best': True,                       # Whether to save the best model
 'kappa_optimizer2': False,               # Whether to use the kappa weight in optimizer 2
 'n_epochs_pretrain_ae': 0,               # Number of epochs for pretraining the autoencoder
}

# Optimizer parameters
plan_kwargs = {
 'lr': 0.003,                             # Learning rate
 'weight_decay': 0.00005,                 # L2 regularization strength
 'ensemble_method_cf': True,              # Whether to use ensemble method for counterfactuals
 'lr_patience': 5,                        # Patience for learning rate scheduler
 'lr_factor': 0.5,                        # Factor by which to reduce learning rate
 'lr_scheduler_metric': 'loss_validation',# Metric to monitor for learning rate scheduling
 'n_epochs_kl_warmup': 10,                # Number of epochs for KL divergence warmup
}

# Create a descriptive model name based on parameters
model_name = (
    f'pretrainAE_{train_dict["n_epochs_pretrain_ae"]}_'
    f'maxEpochs_{train_dict["max_epochs"]}_'
    f'reconW_{train_dict["recon_weight"]}_'
    f'cfWeight_{train_dict["cf_weight"]}_'
    f'beta_{train_dict["beta"]}_'
    f'clf_{train_dict["clf_weight"]}_'
    f'adv_{train_dict["adv_clf_weight"]}_'
    f'advp_{train_dict["adv_period"]}_'
    f'n_cf_{train_dict["n_cf"]}_'
    f'lr_{plan_kwargs["lr"]}_'
    f'wd_{plan_kwargs["weight_decay"]}_'
    f'ensemble_cf_{plan_kwargs["ensemble_method_cf"]}_'
    f'dropout_{arch_dict["dropout_rate"]}_'
    f'n_hidden_{arch_dict["n_hidden"]}_'
    f'n_latent_{arch_dict["n_latent_shared"]}_'
    f'n_layers_{arch_dict["n_layers"]}_'
    f'batch_size_{train_dict["batch_size"]}_'
    f'weighted_classifier_{arch_dict["weighted_classifier"]}_'
)
if cell_type_included:
    model_name = model_name + 'cellTypeIncluded'
else:
    model_name = model_name + 'cellTypeNotIncluded'

# Clean up existing model directory if it exists (caution: this will delete existing models)
try:
    shutil.rmtree(f"{pre_path}/{model_name}")
    print("Directory deleted successfully")
except OSError as e:
    print(f"Error deleting directory: {e}")
Error deleting directory: [Errno 2] No such file or directory: 'drive/MyDrive/cellDISECT/models/Kang/pretrainAE_0_maxEpochs_1000_reconW_20_cfWeight_0.8_beta_0.003_clf_0.05_adv_0.014_advp_5_n_cf_1_lr_0.003_wd_5e-05_ensemble_cf_True_dropout_0.1_n_hidden_128_n_latent_32_n_layers_2_batch_size_256_weighted_classifier_True_cellTypeIncluded'
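
The descriptive model name above is over 200 characters long, which is close to the 255-byte limit that many filesystems place on a single path component. An alternative, sketched here and not what this tutorial uses, is to derive a short deterministic name from a hash of the configuration and keep the full settings in a dictionary (which could be saved next to the model). `short_run_name` is a hypothetical helper.

```python
import hashlib
import json


def short_run_name(prefix, *config_dicts, length=10):
    """Build a short, deterministic directory name from config dictionaries."""
    merged = {}
    for config in config_dicts:
        merged.update(config)
    # Sorting keys makes the hash independent of dictionary order
    payload = json.dumps(merged, sort_keys=True).encode('utf-8')
    digest = hashlib.sha1(payload).hexdigest()[:length]
    return f"{prefix}_{digest}", merged


name_a, config = short_run_name('Kang', {'n_layers': 2}, {'lr': 0.003})
name_b, _ = short_run_name('Kang', {'lr': 0.003}, {'n_layers': 2})
print(name_a == name_b, len(name_a))
```

The trade-off is readability: the hashed name says nothing about the hyperparameters on its own, so the merged configuration should be stored alongside the model.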

1.6. Step 6: Setting up AnnData for CellDISECT

Before training, we need to set up the AnnData object for CellDISECT. This registers:

  • The layer containing count data

  • Categorical covariates (cell type and condition)

  • Whether to add cluster information as a covariate (useful when cell type annotations are unavailable)

[13]:
CellDISECT.setup_anndata(
    adata,
    layer='counts',                                  # Layer containing raw count data
    categorical_covariate_keys=cats,                 # Categorical covariates to disentangle
    continuous_covariate_keys=[],                    # Continuous covariates (none in this example)
    add_cluster_covariate=not cell_type_included,    # Add cluster info if cell type is not included
)

1.7. Step 7: Data Splitting

For proper evaluation, we need to split our data into training, validation, and test sets. In this example, we’re using a pre-defined split stored in adata.obs[‘split_CD14 Mono’], in which the stimulated CD14 monocytes are held out as out-of-distribution (OOD) samples.

Let’s check the distribution of cells in our split:

[14]:
adata.obs['split_CD14 Mono'].value_counts()
[14]:
split_CD14 Mono  count
train            10285
ood               2147
valid             1144
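
The counts alone do not show which cells ended up in the OOD set. One way to check, sketched on a toy DataFrame with placeholder labels, is to cross-tabulate the split column against cell type and condition and look for groups that appear only in the OOD split.

```python
import pandas as pd

# Toy stand-in for adata.obs; labels are illustrative placeholders
toy_obs = pd.DataFrame({
    'cell_type': ['CD14 Mono', 'CD14 Mono', 'B', 'B', 'CD14 Mono'],
    'condition': ['stim', 'ctrl', 'stim', 'ctrl', 'stim'],
    'split': ['ood', 'train', 'train', 'valid', 'ood'],
})

# Rows: (cell type, condition) groups; columns: split labels
table = pd.crosstab([toy_obs['cell_type'], toy_obs['condition']], toy_obs['split'])
print(table)

# Groups whose cells all fall in the OOD split
only_ood = (table['ood'] > 0) & (table.drop(columns='ood').sum(axis=1) == 0)
held_out = list(table.index[only_ood])
print(held_out)
```

On the real data, the same call would use `adata.obs['cell_type']`, `adata.obs['condition']`, and `adata.obs['split_CD14 Mono']`.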

1.8. Step 8: Model Training

Now we’ll initialize and train the CellDISECT model. We can either:

  1. Use random splits (commented out)

  2. Use pre-defined splits (what we’re doing here)

We specify:

  • The split key in the AnnData object

  • Which values correspond to training, validation, and test sets

  • Architecture parameters from our earlier configuration

[15]:
# Option 1: random splits (uncomment to use)
# model = CellDISECT(adata,
#                    **arch_dict)

# Option 2: pre-defined splits (used in this tutorial)
split_key = 'split_CD14 Mono'
model = CellDISECT(adata,
                   split_key=split_key,              # Key in adata.obs containing split information
                   train_split=['train'],            # Values for training set
                   valid_split=['valid'],            # Values for validation set
                   test_split=['ood'],               # Values for test/OOD set
                   **arch_dict)                      # Architecture parameters

# Train the model with our training parameters
model.train(**train_dict, plan_kwargs=plan_kwargs)

# Save the trained model
model.save(f"{pre_path}/{model_name}", overwrite=True)
print(model_name)
INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO:lightning.pytorch.accelerators.cuda:LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Epoch 00082: reducing learning rate of group 0 to 1.5000e-03.
Monitored metric loss_validation did not improve in the last 6 records. Best score: 3.399. Signaling Trainer to stop.
pretrainAE_0_maxEpochs_1000_reconW_20_cfWeight_0.8_beta_0.003_clf_0.05_adv_0.014_advp_5_n_cf_1_lr_0.003_wd_5e-05_ensemble_cf_True_dropout_0.1_n_hidden_128_n_latent_32_n_layers_2_batch_size_256_weighted_classifier_True_cellTypeIncluded
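
The log above shows two plateau mechanisms at work: the learning rate was halved from 0.003 to 0.0015, and training stopped once the validation loss had not improved for 6 records. The pure-Python sketch below replays a made-up loss curve through a simplified version of both rules, using the `lr_patience`, `lr_factor`, and `early_stopping_patience` values from the configuration. The exact counting conventions inside PyTorch and Lightning may differ in detail, so treat this as an approximation of the idea rather than the library logic.

```python
def simulate_plateau_logic(val_losses, lr=0.003, lr_patience=5,
                           lr_factor=0.5, stop_patience=6):
    """Replay validation losses; return (stop_epoch, final_lr, best_loss)."""
    best = float('inf')
    stale_for_lr = 0     # Epochs since the last improvement or LR change
    stale_for_stop = 0   # Epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            stale_for_lr = 0
            stale_for_stop = 0
        else:
            stale_for_lr += 1
            stale_for_stop += 1
        if stale_for_lr > lr_patience:        # Scheduler: shrink the learning rate
            lr *= lr_factor
            stale_for_lr = 0
        if stale_for_stop >= stop_patience:   # Early stopping: end training
            return epoch, lr, best
    return len(val_losses) - 1, lr, best


toy_losses = [5.0, 4.0, 3.5, 3.4] + [3.45] * 10   # Made-up values, not from the run above
stop_epoch, final_lr, best_loss = simulate_plateau_logic(toy_losses)
print(stop_epoch, final_lr, best_loss)
```

Because the two patience values are close (5 and 6), a learning-rate reduction shortly before the stop signal, as seen in the training log, is the expected pattern.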

1.9. Step 9: Loading a Trained Model

If you’ve already trained a model, you can load it directly instead of training from scratch. This is useful for resuming analysis or sharing pre-trained models.

[16]:
pre_path = 'drive/MyDrive/cellDISECT/models/Kang/'
model_name = 'pretrainAE_0_maxEpochs_1000_reconW_20_cfWeight_0.8_beta_0.003_clf_0.05_adv_0.014_advp_5_n_cf_1_lr_0.003_wd_5e-05_ensemble_cf_True_dropout_0.1_n_hidden_128_n_latent_32_n_layers_2_batch_size_256_weighted_classifier_True_cellTypeIncluded'
model = CellDISECT.load(f"{pre_path}/{model_name}", adata=adata)
INFO     File
         drive/MyDrive/cellDISECT/models/Kang//pretrainAE_0_maxEpochs_1000_reconW_20_cfWeight_0.8_beta_0.003_clf_0.
         05_adv_0.014_advp_5_n_cf_1_lr_0.003_wd_5e-05_ensemble_cf_True_dropout_0.1_n_hidden_128_n_latent_32_n_layer
         s_2_batch_size_256_weighted_classifier_True_cellTypeIncluded/model.pt already downloaded

Check which device (CPU or GPU) the model is using:

[17]:
model.module.device
[17]:
device(type='cuda', index=0)

1.10. Step 10: Extracting Disentangled Latent Representations

One of the key features of CellDISECT is its ability to disentangle different sources of variation in the data. We’ll extract several types of latent representations:

  1. Z_0: The shared latent space that captures information not explained by any of the categorical covariates

  2. Z_i: The latent space specific to covariate i (e.g., cell_type or condition)

  3. Z_not_i: The latent space that contains all information except that related to covariate i

These different representations allow us to analyze the data from multiple perspectives.

[18]:
# Get the latent representations
print("Getting the latent 0...")
# Z_0: Shared latent space (nullify all categorical covariates)
adata.obsm["CellDISECT_Z_0"] = model.get_latent_representation(
    nullify_cat_covs_indices=list(range(len(cats))), nullify_shared=False
)

for i in range(len(cats)):
    print(f"Getting the latent {i+1} / {len(cats)}...")
    null_idx = [s for s in range(len(cats)) if s != i]
    label = cats[i]
    # Z_i: Latent space specific to covariate i (nullify all other covariates and shared space)
    adata.obsm[f"CellDISECT_Z_{label}"] = model.get_latent_representation(
        nullify_cat_covs_indices=null_idx, nullify_shared=True
    )
    # Z_not_i: Latent space containing all information except covariate i
    adata.obsm[f"CellDISECT_Z_not_{label}"] = model.get_latent_representation(
        nullify_cat_covs_indices=[i], nullify_shared=False
    )

Getting the latent 0...
Getting the latent 1 / 2...
Getting the latent 2 / 2...
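
The index bookkeeping in the cell above is easy to misread, so the sketch below spells out which covariate indices are nullified, and whether the shared space is nullified, for every latent representation. It only reproduces the argument logic and does not call the model; `latent_plan` is a hypothetical helper.

```python
def latent_plan(covariates):
    """Map each latent name to (indices to nullify, nullify_shared)."""
    n = len(covariates)
    plan = {'Z_0': (list(range(n)), False)}           # Keep only the shared space
    for i, label in enumerate(covariates):
        others = [j for j in range(n) if j != i]
        plan[f'Z_{label}'] = (others, True)           # Keep only covariate i
        plan[f'Z_not_{label}'] = ([i], False)         # Drop only covariate i
    return plan


plan = latent_plan(['cell_type', 'condition'])
for name, (null_idx, null_shared) in plan.items():
    print(name, null_idx, null_shared)
```

For the two covariates used here, Z_cell_type nullifies index 1 (condition) plus the shared space, while Z_not_cell_type nullifies only index 0 (cell_type).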

1.11. Step 11: Visualizing Latent Spaces

To understand the structure of our disentangled latent spaces, we’ll compute neighborhood graphs and UMAP embeddings for each latent representation. This allows us to visualize how cells are organized in each latent space.

[19]:
# Compute neighbors and UMAPs for the latent representations.
# This might take a while; for large data, consider running it using RAPIDS scanpy with a GPU.
for i in range(len(cats) + 1):  # Loop over Z_0 and every Z_i
    if i == 0:
        latent_name = f"CellDISECT_Z_{i}"
    else:
        label = cats[i - 1]
        latent_name = f"CellDISECT_Z_{label}"

    # Create a temporary AnnData object with the latent representation as the main matrix
    latent = ad.AnnData(X=adata.obsm[f"{latent_name}"], obs=adata.obs)
    sc.pp.neighbors(adata=latent, use_rep="X")
    sc.tl.umap(adata=latent)

    # Store the neighborhood graph and UMAP coordinates in the original AnnData object
    adata.uns[f"{latent_name}_neighbors"] = latent.uns["neighbors"]
    adata.obsm[f"{latent_name}_umap"] = latent.obsm["X_umap"]
    gc.collect()  # Free up memory

Now we’ll visualize each latent space using UMAP, coloring cells by their condition and cell type. This helps us understand what information is captured in each latent space.

[20]:
# Plotting Z_i
colors = cats
# colors = cats + ['any_other_obs_key']  # You can add other observation keys to visualize


for i in range(len(cats) + 1):  # loop over all Z_i
    if i == 0:
        latent_name = f'CellDISECT_Z_{i}'
    else:
        label = cats[i-1]
        latent_name = f'CellDISECT_Z_{label}'


    print(f"---UMAP for {latent_name}---")
    sc.set_figure_params(figsize=(12, 8))
    sc.pl.embedding(
        adata,
        f'{latent_name}_umap',
        color=colors,
        ncols=len(colors),
        frameon=False,
        # legend_loc=None,  # Uncomment to hide the legend
        # wspace=0.2,       # Width space between panels
    )

---UMAP for CellDISECT_Z_0---
../_images/tutorials_CellDISECT_Counterfactual_41_1.png
---UMAP for CellDISECT_Z_cell_type---
../_images/tutorials_CellDISECT_Counterfactual_41_3.png
---UMAP for CellDISECT_Z_condition---
../_images/tutorials_CellDISECT_Counterfactual_41_5.png

1.12. Step 12: Discovering Underlying Biological Structure

One of the advantages of CellDISECT is its ability to discover underlying biological structure that may not be explicitly annotated. Let’s create a new annotation for cell lineage (myeloid vs. lymphoid) and see if our model captures this structure in the shared latent space (Z_0).

[21]:
# Define cell types belonging to myeloid and lymphoid lineages
myeloid_lineage = ['CD14 Mono', 'CD16 Mono', 'DC']
lymphoid_lineage = ['CD4 T', 'CD8 T', 'T', 'B', 'NK']

# Create a new annotation for lineage
adata.obs.loc[adata.obs['cell_type'].isin(myeloid_lineage), 'lineage'] = 'myeloid'
adata.obs.loc[adata.obs['cell_type'].isin(lymphoid_lineage), 'lineage'] = 'lymphoid'
adata.obs['lineage'].head()
[21]:
lineage
index
AAACATACATTTCC.1 myeloid
AAACATACCAGAAA.1 myeloid
AAACATACCTCGCT.1 myeloid
AAACATACGATGAA.1 lymphoid
AAACATACGGCATT.1 myeloid

Let’s visualize the shared latent space (Z_0) again, this time including the lineage annotation. This will show us if the model has discovered this underlying biological structure.

[22]:
# Plotting Z_0 with lineage annotation
colors = cats
# colors = cats + ['any_other_obs_key']

i = 0
latent_name = f'CellDISECT_Z_{i}'


print(f"---UMAP for {latent_name}---")
# sc.set_figure_params(figsize=(12, 8))
sc.set_figure_params()
sc.pl.embedding(
    adata,
    f'{latent_name}_umap',
    color=colors + ['lineage'],  # Include lineage in the visualization
    ncols=len(colors)+1,
    frameon=False,
    # legend_loc=None,  # Uncomment to hide the legend
    wspace=0.3,
)

---UMAP for CellDISECT_Z_0---
../_images/tutorials_CellDISECT_Counterfactual_45_1.png
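
A UMAP is a qualitative check. As a rough numerical complement, one could measure how often a cell's nearest neighbours in a latent space share its label. The sketch below uses NumPy on synthetic data; `knn_label_purity` is a hypothetical helper, and in the tutorial's setting the inputs would be `adata.obsm['CellDISECT_Z_0']` and `adata.obs['lineage']`. The brute-force distance matrix is only practical for a few thousand cells.

```python
import numpy as np


def knn_label_purity(latent, labels, k=15):
    """Mean fraction of each cell's k nearest neighbours that share its label."""
    latent = np.asarray(latent, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances (brute force, O(n^2) memory)
    sq_norms = (latent ** 2).sum(axis=1)
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * latent @ latent.T
    np.fill_diagonal(dists, np.inf)  # a cell is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    same = labels[neighbours] == labels[:, None]
    return float(same.mean())


rng = np.random.default_rng(0)
# Two synthetic, well-separated groups standing in for two lineages
latent = np.vstack([rng.normal(0, 1, (100, 8)), rng.normal(6, 1, (100, 8))])
labels = np.array(["myeloid"] * 100 + ["lymphoid"] * 100)
shuffled = rng.permutation(labels)  # reference: labels unrelated to the latent space

purity_true = knn_label_purity(latent, labels)
purity_shuffled = knn_label_purity(latent, shuffled)
print(f"purity (true labels):     {purity_true:.2f}")
print(f"purity (shuffled labels): {purity_shuffled:.2f}")
```

A purity close to 1 suggests the label is well separated in that latent space, while a value near the shuffled reference suggests it is not captured there.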

1.13. Step 13: Identifying Differentially Expressed Genes

Before making counterfactual predictions, let’s identify differentially expressed genes (DEGs) between control and stimulated conditions for each cell type. These DEGs will help us evaluate the quality of our counterfactual predictions.

[23]:
# Getting top Differentially Expressed Genes in each cell type with respect to control vs stimulated condition
adata.X = adata.layers['counts'].copy()
sc.pp.log1p(adata)
adata.obs['cov_condition'] = adata.obs['cell_type'].astype(str) + '_' + adata.obs.condition.astype(str)

# Parameters for differential expression analysis
groupby='cov_condition'           # Group cells by combined cell type and condition
control_group='ctrl'              # The control condition
key_added="rank_genes_groups"     # Key to store results
n_genes=200                       # Number of top genes to return
return_dict=False                 # Whether to return results as a dictionary
rankby_abs=True                   # Whether to rank by absolute values
gene_dict = {}                    # Dictionary to store DEGs

# Perform differential expression analysis for each cell type
covariate = 'cell_type'
cov_categories = adata.obs[covariate].unique()
for cov_cat in cov_categories:
    print(cov_cat)
    # Name of the control group in the groupby obs column
    control_group_cov = "_".join([cov_cat, control_group])
    # Copy the subset so that results are written to a real AnnData object, not a view
    adata_cov = adata[adata.obs[covariate] == cov_cat].copy()

    # Perform Wilcoxon rank-sum test for differential expression
    sc.tl.rank_genes_groups(
        adata_cov,
        groupby=groupby,
        reference=control_group_cov,
        rankby_abs=rankby_abs,
        n_genes=n_genes,
        use_raw=False,
        method='wilcoxon',
    )

    # Extract results
    de_genes_groups = pd.DataFrame(adata_cov.uns["rank_genes_groups"]["names"]).columns

    de_genes = {}
    lfc = {}
    for group in de_genes_groups:
        # Get differentially expressed genes with adjusted p-value < 0.05
        de_df = sc.get.rank_genes_groups_df(adata_cov, group, key='rank_genes_groups', pval_cutoff=0.05)
        de_genes[group] = de_df['names']
        lfc[group] = de_df['logfoldchanges']

        # Sort genes by absolute log fold change
        lfc_indices = lfc[group].abs().sort_values(ascending=False).index
        de_genes[group] = de_genes[group][lfc_indices].reset_index(drop=True)

        gene_dict[group] = de_genes[group].tolist()

# Store the results in the AnnData object
adata.uns[key_added] = gene_dict
adata.X = adata.layers['counts'].copy()
WARNING: adata.X seems to be already log-transformed.
CD14 Mono
CD4 T
T
CD8 T
B
DC
CD16 Mono
NK
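
The per-group post-processing in the loop can be read as a filter followed by a sort. The sketch below reproduces that logic on a hand-made table that uses the column names returned by `sc.get.rank_genes_groups_df`. The gene names and values are invented for illustration, and `top_degs` is a hypothetical helper.

```python
import pandas as pd


def top_degs(df, pval_cutoff=0.05):
    """Keep genes below the adjusted p-value cutoff, ordered by |log fold change|."""
    kept = df[df["pvals_adj"] < pval_cutoff]
    order = kept["logfoldchanges"].abs().sort_values(ascending=False).index
    return kept.loc[order, "names"].tolist()


# Invented values, only to show the filter-then-sort behaviour
toy = pd.DataFrame({
    "names": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"],
    "logfoldchanges": [1.2, -3.5, 4.0, 0.4],
    "pvals_adj": [0.01, 0.001, 0.20, 0.03],
})
ranked = top_degs(toy)
print(ranked)  # ['GENE_B', 'GENE_A', 'GENE_D']
```

`GENE_C` has the largest fold change but is dropped by the significance filter, and the strongly down-regulated `GENE_B` ranks first because the sort uses absolute values.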

1.14. Step 14: Counterfactual Predictions

Now we’ll use CellDISECT to make counterfactual predictions. In this example, we’ll predict how CD14 Monocytes would respond to interferon-beta stimulation.

Note that CD14 Monocytes in the stimulated condition were held out during training (they’re in the OOD test set), so this is a true counterfactual prediction.

[24]:
# Check that we have CD14 Monocytes in both conditions
adata[adata.obs['cell_type'] == 'CD14 Mono'].obs['condition'].value_counts()
[24]:
count
condition
ctrl 2215
stimulated 2147

Now we’ll generate counterfactual predictions for CD14 Monocytes:

  • x_ctrl: The original control cells

  • x_true: The actual stimulated cells (ground truth)

  • x_pred: Our counterfactual prediction of how control cells would look if stimulated

[25]:
x_ctrl, x_true, x_pred = model.predict_counterfactuals(
    adata[adata.obs['cell_type'] == 'CD14 Mono'].copy(), # We want to change CD14 Monocytes
    cov_names = ['condition'],                           # We want to change their condition
    cov_values = ['ctrl'],                               # We want to change the control cells
    cov_values_cf = ['stimulated'],                      # We want to change them to stimulated
    cats = cats,                                         # List of categorical covariates
    n_samples_from_source = None,                        # Number of samples to use (None = all)
    seed = 42,
)
x_ctrl, x_true, x_pred = np.log1p(x_ctrl), np.log1p(x_true), np.log1p(x_pred)
INFO     AnnData object appears to be a copy. Attempting to transfer setup.
INFO     AnnData object appears to be a copy. Attempting to transfer setup.

Let’s check the shapes of our data to make sure everything is as expected:

[26]:
x_ctrl.shape, x_true.shape, x_pred.shape
[26]:
(torch.Size([2215, 5000]), torch.Size([2147, 5000]), torch.Size([2215, 5000]))

1.15. Step 15: Evaluating Counterfactual Predictions

Now we’ll evaluate the quality of our counterfactual predictions by comparing them to the ground truth. We’ll calculate Pearson correlations between:

  1. The predicted and true expression profiles

  2. The predicted and true differential expression (delta) compared to control

We’ll do this for both the top differentially expressed genes and all genes.
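
Because `x_true` and `x_pred` contain different numbers of cells, the comparison is made on per-gene means rather than cell by cell. A minimal NumPy sketch of the two metrics on synthetic data follows. `pearson_and_delta` is a hypothetical helper, not a CellDISECT function, and it uses `np.corrcoef` in place of the `pearsonr` call in the cell below.

```python
import numpy as np


def pearson_and_delta(x_ctrl, x_true, x_pred):
    """Return (Pearson, delta Pearson) computed on per-gene mean expression."""
    mu_ctrl, mu_true, mu_pred = (
        np.asarray(x).mean(axis=0) for x in (x_ctrl, x_true, x_pred)
    )
    pearson = np.corrcoef(mu_true, mu_pred)[0, 1]
    # Delta: correlate the changes relative to control instead of raw expression
    delta = np.corrcoef(mu_true - mu_ctrl, mu_pred - mu_ctrl)[0, 1]
    return float(pearson), float(delta)


rng = np.random.default_rng(0)
n_genes = 50
baseline = rng.uniform(0, 3, n_genes)  # synthetic per-gene baseline
effect = rng.normal(0, 1, n_genes)     # synthetic perturbation effect
x_ctrl = baseline + rng.normal(0, 0.1, (40, n_genes))
x_true = baseline + effect + rng.normal(0, 0.1, (30, n_genes))
x_pred = baseline + 0.8 * effect + rng.normal(0, 0.1, (40, n_genes))

pearson, delta = pearson_and_delta(x_ctrl, x_true, x_pred)
print(f"Pearson: {pearson:.3f}, delta Pearson: {delta:.3f}")
```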

[27]:
# Get the list of differentially expressed genes for CD14 Monocytes
deg_list = adata.uns["rank_genes_groups"]['CD14 Mono_stimulated']

# Evaluate predictions for top DEGs and all genes
for n_top_deg in [20, None]:
    if n_top_deg is not None:
        # Select top DEGs
        degs = np.where(np.isin(adata.var_names, deg_list[:n_top_deg]))[0]
    else:
        # Use all genes
        degs = np.arange(adata.n_vars)

    # Extract expression values for the selected genes
    x_true_deg = x_true[:, degs]
    x_pred_deg = x_pred[:, degs]
    x_ctrl_deg = x_ctrl[:, degs]

    # Calculate Pearson correlation between predicted and true expression
    pearson_mean_deg = pearsonr(x_true_deg.mean(0), x_pred_deg.mean(0))

    # Calculate Pearson correlation between predicted and true differential expression
    deltaPearson_mean_deg = pearsonr(x_true_deg.mean(0) - x_ctrl_deg.mean(0), x_pred_deg.mean(0) - x_ctrl_deg.mean(0))

    # Print results
    if n_top_deg is not None:
        print(f'Top {n_top_deg} DEGs:')
    else:
        print(f'All highly variable genes ({adata.shape[1]}):')

    print(f"Pearson correlation: {pearson_mean_deg[0]:.3f}")
    print(f"Delta Pearson correlation: {deltaPearson_mean_deg[0]:.3f}")
    print()
Top 20 DEGs:
Pearson correlation: 0.905
Delta Pearson correlation: 0.894

All highly variable genes (5000):
Pearson correlation: 0.915
Delta Pearson correlation: 0.778
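
When reading these numbers, it can help to know how strongly the unmodified control already correlates with the stimulated cells, since plain Pearson on mean expression is dominated by each gene's baseline level. The sketch below illustrates the effect on synthetic data only and makes no claim about the values for this dataset. With the tutorial's tensors, the analogous check would compare `x_ctrl.mean(0)` with `x_true.mean(0)`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 200
baseline = rng.uniform(0, 5, n_genes)  # synthetic per-gene baseline expression
shift = np.zeros(n_genes)
shift[:10] = 2.0                       # only 10 genes respond to the synthetic stimulation

x_ctrl = baseline + rng.normal(0, 0.2, (50, n_genes))
x_true = baseline + shift + rng.normal(0, 0.2, (60, n_genes))

# "Do nothing" prediction: use the control cells themselves as the prediction
identity_pearson = np.corrcoef(x_true.mean(axis=0), x_ctrl.mean(axis=0))[0, 1]
print(f"Pearson of unmodified control vs. true: {identity_pearson:.3f}")
```

A do-nothing prediction has zero predicted change for every gene, so its delta Pearson is undefined. This is why the delta metric is the more demanding of the two.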

1.16. Step 16: Visualizing Counterfactual Predictions

Finally, let’s visualize our counterfactual predictions using a dot plot. This will show the expression of top differentially expressed genes in:

  • Control cells (ctrl)

  • True stimulated cells (true)

  • Our counterfactual predictions (x_CellDISECT)

[28]:
from anndata import AnnData
import torch

# Create an AnnData object with our three conditions
obs = ['ctrl']*x_ctrl.shape[0] + ['true']*x_true.shape[0] + ['x_CellDISECT']*x_pred.shape[0]
results_adata = AnnData(X=torch.concat([x_ctrl, x_true, x_pred]).numpy(),
                        obs={'source': obs},
                        var=adata.var)

Next, we create a dot plot showing gene expression across the three conditions:

  • Each dot’s size represents the percentage of cells expressing the gene

  • Each dot’s color represents the average expression level

[29]:
sc.pl.dotplot(results_adata, var_names=deg_list[:20], groupby="source", show=True, swap_axes=False)
../_images/tutorials_CellDISECT_Counterfactual_59_0.png
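
The two quantities encoded in the dot plot can also be computed directly. The sketch below does this with NumPy on synthetic values. It assumes scanpy's default behaviour of counting a cell as expressing a gene when its value is above 0 and of averaging over all cells in the group; `dot_stats` is a hypothetical helper.

```python
import numpy as np


def dot_stats(X, groups, expression_cutoff=0.0):
    """Per group: fraction of expressing cells (dot size) and mean expression (dot color)."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    stats = {}
    for g in np.unique(groups):
        block = X[groups == g]  # cells belonging to this group
        stats[str(g)] = {
            "fraction_expressing": (block > expression_cutoff).mean(axis=0),
            "mean_expression": block.mean(axis=0),
        }
    return stats


# Synthetic matrix: 4 cells x 2 genes
X = np.array([[0.0, 1.0],
              [2.0, 0.0],
              [0.0, 3.0],
              [4.0, 5.0]])
groups = ["ctrl", "ctrl", "true", "true"]
stats = dot_stats(X, groups)
for name, s in stats.items():
    print(name, s["fraction_expressing"], s["mean_expression"])
```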

1.17. Conclusion

In this tutorial, we’ve demonstrated how to use CellDISECT for counterfactual analysis in single-cell RNA sequencing data. We’ve:

  1. Trained a CellDISECT model on the Kang dataset

  2. Extracted disentangled latent representations

  3. Generated counterfactual predictions for CD14 Monocytes

  4. Evaluated the quality of our predictions

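CellDISECT predicted how control CD14 Monocytes would respond to interferon-beta stimulation, even though stimulated CD14 Monocytes were held out during training. On mean expression, the predictions reached a Pearson correlation of 0.905 with the ground truth for the top 20 DEGs and 0.915 across all 5,000 highly variable genes; the corresponding delta Pearson correlations were 0.894 and 0.778. This illustrates how CellDISECT can be used to study cellular responses and to generate hypotheses about cell behavior under different conditions.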