Tutorials
Welcome to the CellDISECT tutorials section. Here you'll find step-by-step guides and worked examples for training CellDISECT, analyzing single-cell data, and making counterfactual predictions.
Getting Started
Before diving into the tutorials, make sure you have:
Familiarized yourself with the basic concepts
Prepared your single-cell data in the appropriate format
Tutorial Categories
Beginner Tutorials
Learn how to train CellDISECT and make counterfactual predictions using the Kang dataset. Perfect for first-time users!
Perturbation Prediction
Predict gene expression under seen, unseen, and combinatorial perturbations using predefined gene embeddings (GenePT, ESM, scGPT).
Advanced Applications
Explore how to combine CellDISECT latent spaces for erythroid subset inference, demonstrating advanced usage with Z_0 + Z_Organ integration.
Advanced tutorial recreating Scenario 2 counterfactual predictions on the Eraslan dataset, as featured in our paper.
Note
Each tutorial includes downloadable Jupyter notebooks that you can run locally. The notebooks are extensively documented with step-by-step explanations and best practices.
Detailed Tutorial Contents
Available Tutorials
- 1. CellDISECT Counterfactual Analysis
- 1.1. Step 1: Installation and Setup
- 1.2. Step 2: Data Loading and Exploration
- 1.3. Step 3: Data Preprocessing
- 1.4. Step 4: Model Configuration
- 1.5. Step 5: Model Configuration
- 1.6. Step 6: Setting up AnnData for CellDISECT
- 1.7. Step 7: Data Splitting
- 1.8. Step 8: Model Training
- 1.9. Step 9: Loading a Trained Model
- 1.10. Step 10: Extracting Disentangled Latent Representations
- 1.11. Step 11: Visualizing Latent Spaces
- 1.12. Step 12: Discovering Underlying Biological Structure
- 1.13. Step 13: Identifying Differentially Expressed Genes
- 1.14. Step 14: Counterfactual Predictions
- 1.15. Step 15: Evaluating Counterfactual Predictions
- 1.16. Step 16: Visualizing Counterfactual Predictions
- 1.17. Conclusion
- 2. Perturbation Prediction with CellDISECT
- 2.1. 0. Setup & Data Download
- 2.2. 1. Data Preparation
- 2.3. 2. Model Setup and Training
- 2.4. 3. Predicting a Single Seen Perturbation
- 2.5. 4. Predicting a Single Unseen Perturbation
- 2.6. 5. Predicting an Unseen Combinatorial Perturbation
- 2.7. 6. Batch Evaluation with predict_perturbations
- 2.8. 7. Visualization
- 2.9. 8. Full metrics table
- 3. Flexible fairness for batch correction
- 3.1. Latent combination tutorial for erythroid organ inference
- 3.2. You can get the latents related to each of the covariates separately
- 3.3. You can also combine covariates/latents as you like; here we concatenate Z0 with the latent of each covariate to make new latents Z0+ZCov
- 3.4. Now we want to take a closer look at Liver and Bone Marrow cells
- 4. Double counterfactual prediction
- 4.1. We will be performing a double counterfactual prediction in this tutorial, recreating the CellDISECT results from benchmarking scenario 2 of the paper on the Eraslan et al. data.
- 4.2. We have already trained the model without cell type information using the default parameters as in the example:
- 4.3. We aim to predict what female breast Epithelial cells would look like if they were male prostate gland cells.
- 4.4. Earth Mover’s Distance metric on 20 DEGs and all genes
- 4.5. Pearson Correlation metric on 20 DEGs and all genes
- 4.6. Delta Pearson Correlation metric on 20 DEGs and all genes
Tutorial Details
Basic Training Tutorial
In this introductory tutorial, you’ll learn:
How to prepare your data for CellDISECT
Basic model training and configuration
Making simple counterfactual predictions
Visualizing and interpreting results
Perturbation Prediction Tutorial
This tutorial covers:
Preparing predefined gene embeddings (GenePT, ESM) in adata.uns
Setting up setup_anndata with perturbation_key and perturbation_embedding_key
Training CellDISECT with perturbation-aware embeddings
Predicting seen, unseen, and combinatorial perturbations
Evaluating predictions with perturbation_metrics
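One way to see why predefined gene embeddings help with unseen perturbations: every gene has a vector whether or not it was perturbed during training. The sketch below illustrates that lookup under loudly stated assumptions. The embeddings are a plain dict from gene symbol to vector (the kind of object one might keep in adata.uns under the perturbation_embedding_key), combinations are written with a "+" delimiter, and a combination is represented by summing its per-gene vectors. The helper names build_embedding_table and embed_perturbation are hypothetical and not part of the CellDISECT API, and the summation rule is an illustrative choice, not necessarily what CellDISECT does internally.

```python
import numpy as np


def build_embedding_table(genes, dim, seed=0):
    """Toy stand-in for predefined gene embeddings (GenePT, ESM, ...).

    Returns a dict mapping gene symbol -> 1D vector of length `dim`.
    Real embeddings would be loaded from file, not sampled at random.
    """
    rng = np.random.default_rng(seed)
    return {gene: rng.normal(size=dim) for gene in genes}


def embed_perturbation(label, table, delimiter="+"):
    """Embed a single or combinatorial perturbation label.

    Hypothetical conventions for illustration only: combinations are
    written as 'GENE_A+GENE_B' and represented by the sum of the
    per-gene embeddings.
    """
    genes = label.split(delimiter)
    missing = [g for g in genes if g not in table]
    if missing:
        raise KeyError(f"no predefined embedding for {missing}")
    return np.sum([table[g] for g in genes], axis=0)


# The table covers every gene, including genes never perturbed in training.
table = build_embedding_table(["KLF1", "CEBPA", "FOXA1"], dim=8)
seen = embed_perturbation("KLF1", table)         # perturbed during training
unseen = embed_perturbation("FOXA1", table)      # never perturbed in training
combo = embed_perturbation("KLF1+FOXA1", table)  # unseen combination
print(combo.shape)
```

The only hard requirement in this sketch is that each gene in a label has an embedding; whether that gene appeared as a perturbation in the training data does not matter to the lookup.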
Latent Space Analysis
This advanced tutorial covers:
Understanding CellDISECT’s latent space structure
Combining multiple latent spaces (Z_0 + Z_Organ)
Advanced visualization techniques
Interpreting latent space representations
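Combining latent spaces such as Z_0 + Z_Organ amounts to concatenating the per-cell latent vectors along the feature axis. A minimal numpy sketch, assuming each latent is a cells x dimensions matrix with rows in the same cell order; the random matrices, the 100 cells, and the 10 dimensions per latent are arbitrary stand-ins for what the trained model would return, and combine_latents is a hypothetical helper:

```python
import numpy as np


def combine_latents(*latents):
    """Concatenate per-cell latent matrices (n_cells x d_i) feature-wise.

    All inputs must describe the same cells in the same row order.
    """
    n_cells = {z.shape[0] for z in latents}
    if len(n_cells) != 1:
        raise ValueError("all latents must have the same number of cells")
    return np.concatenate(latents, axis=1)


rng = np.random.default_rng(0)
n_cells = 100
z0 = rng.normal(size=(n_cells, 10))       # stand-in for Z_0
z_organ = rng.normal(size=(n_cells, 10))  # stand-in for Z_Organ
z0_organ = combine_latents(z0, z_organ)   # Z_0 + Z_Organ, shape (100, 20)
print(z0_organ.shape)
```

The combined matrix could then be stored under an obsm key of your choosing (for example the hypothetical adata.obsm["X_Z0_Organ"]) and passed to scanpy via sc.pp.neighbors(adata, use_rep="X_Z0_Organ") before running UMAP or clustering.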
Double Counterfactual Tutorial
This expert-level tutorial demonstrates:
Complex counterfactual predictions
Recreating paper results on the Eraslan dataset
Advanced model configurations
Result analysis and validation
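The Double Counterfactual tutorial scores predictions with Earth Mover's Distance, Pearson correlation, and delta Pearson correlation, each on the top 20 DEGs and on all genes. A minimal sketch of those three metrics on cells x genes arrays follows. It assumes that ctrl holds the source cells (female breast Epithelial), true holds the real target cells (male prostate gland Epithelial), and pred holds the counterfactual prediction. DEGs are ranked here by absolute mean shift as a simple stand-in for a proper differential-expression test, EMD is averaged over genes, and both Pearson metrics are computed on mean expression profiles. These are illustrative choices and not necessarily the tutorial's exact implementation; the helper names are hypothetical.

```python
import numpy as np
from scipy.stats import wasserstein_distance


def top_degs(ctrl, true, k=20):
    """Indices of the k genes with the largest absolute mean shift.

    A simple proxy for DEG ranking; a real analysis would use a statistical test.
    """
    shift = np.abs(true.mean(axis=0) - ctrl.mean(axis=0))
    return np.argsort(shift)[::-1][:k]


def pearson(x, y):
    """Pearson correlation between two 1D vectors."""
    return float(np.corrcoef(x, y)[0, 1])


def evaluate(ctrl, true, pred, genes):
    """Score a prediction on a subset of gene indices (cells x genes inputs)."""
    c, t, p = ctrl[:, genes], true[:, genes], pred[:, genes]
    # EMD: per-gene distance between true and predicted expression distributions.
    emd = float(np.mean([wasserstein_distance(t[:, j], p[:, j])
                         for j in range(t.shape[1])]))
    c_mean, t_mean, p_mean = c.mean(axis=0), t.mean(axis=0), p.mean(axis=0)
    return {
        "emd": emd,
        "pearson": pearson(t_mean, p_mean),
        # Delta Pearson: correlate changes relative to the source cells.
        "delta_pearson": pearson(t_mean - c_mean, p_mean - c_mean),
    }


# Toy data: 30 of 500 genes shift upward in the target condition.
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 500
ctrl = rng.poisson(2.0, size=(n_cells, n_genes)).astype(float)
shift = np.zeros(n_genes)
shift[:30] = 3.0
true = rng.poisson(2.0 + shift, size=(n_cells, n_genes)).astype(float)
pred = true + rng.normal(0.0, 0.5, size=true.shape)  # a deliberately good "prediction"

degs = top_degs(ctrl, true, k=20)
scores_degs = evaluate(ctrl, true, pred, degs)
scores_all = evaluate(ctrl, true, pred, np.arange(n_genes))
print(scores_degs)
print(scores_all)
```

Delta Pearson compares predicted and true changes relative to the source cells, so it is not inflated by the baseline expression level that both profiles share. Plain Pearson on mean expression can look high even when the predicted change is wrong.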