This content will become publicly available on December 1, 2025

Title: Cellograph: a semi-supervised approach to analyzing multi-condition single-cell RNA-sequencing data using graph neural networks
Abstract: With the growing number of single-cell datasets collected under more complex experimental conditions, there is an opportunity to leverage single-cell variability to reveal deeper insights into how cells respond to perturbations. Many existing approaches rely on discretizing the data into clusters for differential gene expression (DGE), effectively ironing out any information unveiled by the single-cell variability across cell types. In addition, DGE often assumes a statistical distribution that, if erroneous, can lead to false positive differentially expressed genes. Here, we present Cellograph: a semi-supervised framework that uses graph neural networks to quantify the effects of perturbations at single-cell granularity. Cellograph not only measures how prototypical cells are of each condition but also learns a latent space that is amenable to interpretable data visualization and clustering. The learned gene weight matrix from training reveals pertinent genes driving the differences between conditions. We demonstrate the utility of our approach on publicly available datasets including cancer drug therapy, stem cell reprogramming, and organoid differentiation. Cellograph outperforms existing methods for quantifying the effects of experimental perturbations and offers a novel framework to analyze single-cell data using deep learning.
Award ID(s):
2242980
PAR ID:
10491782
Author(s) / Creator(s):
; ;
Publisher / Repository:
BMC Bioinformatics
Date Published:
Journal Name:
BMC Bioinformatics
Volume:
25
Issue:
1
ISSN:
1471-2105
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Clustered regularly interspaced short palindromic repeats (CRISPR) screening coupled with single-cell RNA sequencing has emerged as a powerful tool to characterize the effects of genetic perturbations on the whole transcriptome at a single-cell level. However, due to its sparsity and complex structure, analysis of single-cell CRISPR screening data is challenging. In particular, standard differential expression analysis methods are often underpowered to detect genes affected by CRISPR perturbations. We developed a statistical method for such data, called guided sparse factor analysis (GSFA). GSFA infers latent factors that represent coregulated genes or gene modules; by borrowing information from these factors, it infers the effects of genetic perturbations on individual genes. We demonstrated through extensive simulation studies that GSFA detects perturbation effects with much higher power than state-of-the-art methods. Using single-cell CRISPR data from human CD8+ T cells and neural progenitor cells, we showed that GSFA identified biologically relevant gene modules and specific genes affected by CRISPR perturbations, many of which were missed by existing methods, providing new insights into the functions of genes involved in T cell activation and neurodevelopment.
  2. Pooled single-cell perturbation screens represent powerful experimental platforms for functional genomics, yet interpreting these rich datasets for meaningful biological conclusions remains challenging. Most current methods fall at one of two extremes: either opaque deep learning models that obscure biological meaning, or simplified frameworks that treat genes as isolated units. As such, these approaches overlook a crucial insight: gene co-fluctuations in unperturbed cellular states can be harnessed to model perturbation responses. Here we present CIPHER (Covariance Inference for Perturbation and High-dimensional Expression Response), a framework leveraging linear response theory from statistical physics to predict transcriptome-wide perturbation outcomes using gene co-fluctuations in unperturbed cells. We validated CIPHER on synthetic regulatory networks before applying it to 11 large-scale single-cell perturbation datasets covering 4,234 perturbations and over 1.36M cells. CIPHER robustly recapitulated genome-wide responses to single and double perturbations by exploiting baseline gene covariance structure. Importantly, eliminating gene-gene covariances, while retaining gene-intrinsic variances, reduced model performance by 11-fold, demonstrating the rich information stored within baseline fluctuation structures. Moreover, gene-gene correlations transferred successfully across independent experiments of the same cell type, revealing stereotypic fluctuation structures. Furthermore, CIPHER outperformed conventional differential expression metrics in identifying true perturbations while providing uncertainty-aware effect size estimates through Bayesian inference. Finally, most genome-wide responses propagated through the covariance matrix along approximately three independent and global gene modules. CIPHER underscores the importance of theoretically grounded models in capturing complex biological responses, highlighting fundamental design principles encoded in cellular fluctuation patterns.
  3. Abstract Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. However, the combinatorial explosion in the number of possible multigene perturbations severely limits experimental interrogation. Here, we present graph-enhanced gene activation and repression simulator (GEARS), a method that integrates deep learning with a knowledge graph of gene–gene relationships to predict transcriptional responses to both single and multigene perturbations using single-cell RNA-sequencing data from perturbational screens. GEARS is able to predict outcomes of perturbing combinations consisting of genes that were never experimentally perturbed. GEARS exhibited 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen and identified the strongest interactions twice as well as prior approaches. Overall, GEARS can predict phenotypically distinct effects of multigene perturbations and thus guide the design of perturbational experiments. 
  4. Nie, Qing (Ed.)
    Single-cell RNA sequencing technology provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses in single-cell transcriptomic studies. We propose a new method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and ten existing imputation methods to eight single-cell transcriptomic datasets and compared their performance. Our results demonstrated that G2S3 has superior overall performance in recovering gene expression, identifying cell subtypes, reconstructing cell trajectories, identifying differentially expressed genes, and recovering gene regulatory and correlation relationships. Moreover, G2S3 is computationally efficient for imputation in large-scale single-cell transcriptomic datasets. 
  5. Bacterial populations typically exhibit exponential growth under resource-rich conditions, yet individual cells often deviate from this pattern. Recent work has shown that the elongation rates of and increase throughout the cell cycle (super-exponential growth), while displays a midcycle minimum (convex growth), and grows linearly. Here, we develop a single-cell model linking gene expression, proteome allocation, and mass growth to explain these diverse growth trajectories. By calibrating model parameters with experimental data, we show that DNA-proportional mRNA transcription produces near-exponential growth, whereas deviations from this proportionality yield the observed non-exponential growth patterns. Analysis of gene expression perturbations reveals that ribosome expression primarily controls dry mass growth rate, whereas cell envelope protein expression more strongly affects cell elongation rate. We show that cell-cycle-dependent transcription dynamics give rise to convex, super-exponential, and linear modes of cell elongation observed experimentally, demonstrating how the timing of cell envelope and ribosomal protein expression drives cell-cycle-specific behaviors. These findings provide a mechanistic basis for non-exponential single-cell growth and offer insights into how bacterial cells dynamically regulate elongation rates within each generation.