Title: Predicting transcriptional outcomes of novel multigene perturbations with GEARS
Abstract: Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. However, the combinatorial explosion in the number of possible multigene perturbations severely limits experimental interrogation. Here, we present graph-enhanced gene activation and repression simulator (GEARS), a method that integrates deep learning with a knowledge graph of gene–gene relationships to predict transcriptional responses to both single and multigene perturbations using single-cell RNA-sequencing data from perturbational screens. GEARS is able to predict outcomes of perturbing combinations consisting of genes that were never experimentally perturbed. GEARS exhibited 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen and identified the strongest interactions twice as well as prior approaches. Overall, GEARS can predict phenotypically distinct effects of multigene perturbations and thus guide the design of perturbational experiments.
Award ID(s):
1835598 1918940
PAR ID:
10471870
Author(s) / Creator(s):
; ;
Publisher / Repository:
Nature
Date Published:
Journal Name:
Nature Biotechnology
ISSN:
1087-0156
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Pooled single-cell perturbation screens represent powerful experimental platforms for functional genomics, yet interpreting these rich datasets for meaningful biological conclusions remains challenging. Most current methods fall at one of two extremes: either opaque deep learning models that obscure biological meaning, or simplified frameworks that treat genes as isolated units. As such, these approaches overlook a crucial insight: gene co-fluctuations in unperturbed cellular states can be harnessed to model perturbation responses. Here we present CIPHER (Covariance Inference for Perturbation and High-dimensional Expression Response), a framework leveraging linear response theory from statistical physics to predict transcriptome-wide perturbation outcomes using gene co-fluctuations in unperturbed cells. We validated CIPHER on synthetic regulatory networks before applying it to 11 large-scale single-cell perturbation datasets covering 4,234 perturbations and over 1.36M cells. CIPHER robustly recapitulated genome-wide responses to single and double perturbations by exploiting baseline gene covariance structure. Importantly, eliminating gene-gene covariances, while retaining gene-intrinsic variances, reduced model performance by 11-fold, demonstrating the rich information stored within baseline fluctuation structures. Moreover, gene-gene correlations transferred successfully across independent experiments of the same cell type, revealing stereotypic fluctuation structures. Furthermore, CIPHER outperformed conventional differential expression metrics in identifying true perturbations while providing uncertainty-aware effect size estimates through Bayesian inference. Finally, most genome-wide responses propagated through the covariance matrix along approximately three independent and global gene modules. 
     CIPHER underscores the importance of theoretically-grounded models in capturing complex biological responses, highlighting fundamental design principles encoded in cellular fluctuation patterns.
  2. Irreversibility, in which a transient perturbation leaves a system in a new state, is an emergent property in systems of interacting entities. This property has well-established implications in statistical physics but remains underexplored in biological networks, especially for bacteria and other prokaryotes whose regulation of gene expression occurs predominantly at the transcriptional level. Focusing on the reconstructed regulatory network of Escherichia coli, we examine network responses to transient single-gene perturbations. We predict irreversibility in numerous cases and find that the incidence of irreversibility increases with the proximity of the perturbed gene to positive circuits in the network. Comparison with experimental data suggests a connection between the predicted irreversibility to transient perturbations and the evolutionary response to permanent perturbations.
  3. Clustered regularly interspaced short palindromic repeats (CRISPR) screening coupled with single-cell RNA sequencing has emerged as a powerful tool to characterize the effects of genetic perturbations on the whole transcriptome at a single-cell level. However, due to its sparsity and complex structure, analysis of single-cell CRISPR screening data is challenging. In particular, standard differential expression analysis methods are often underpowered to detect genes affected by CRISPR perturbations. We developed a statistical method for such data, called guided sparse factor analysis (GSFA). GSFA infers latent factors that represent coregulated genes or gene modules; by borrowing information from these factors, it infers the effects of genetic perturbations on individual genes. We demonstrated through extensive simulation studies that GSFA detects perturbation effects with much higher power than state-of-the-art methods. Using single-cell CRISPR data from human CD8+ T cells and neural progenitor cells, we showed that GSFA identified biologically relevant gene modules and specific genes affected by CRISPR perturbations, many of which were missed by existing methods, providing new insights into the functions of genes involved in T cell activation and neurodevelopment.
  4. The ongoing diversification of plant defence compounds exerts dynamic selection pressures on the microorganisms that colonize plant tissues. Evolutionary processes that generate resistance towards these compounds increase microbial fitness by giving access to plant resources and increasing pathogen virulence. These processes entail sequence‐based mechanisms that result in adaptive gene functions, and combinatorial mechanisms that result in novel syntheses of existing gene functions. However, the priority and interactions among these processes in adaptive resistance remain poorly understood. Using a combination of molecular genetic and computational approaches, we investigated the contributions of sequence‐based and combinatorial processes to the evolution of fungal metabolic gene clusters encoding stilbene cleavage oxygenases (SCOs), which catalyse the degradation of biphenolic plant defence compounds known as stilbenes into monophenolic molecules. We present phylogenetic evidence of convergent assembly among three distinct types of SCO gene clusters containing alternate combinations of phenolic catabolism. Multiple evolutionary transitions between different cluster types suggest recurrent selection for distinct gene assemblages. By comparison, we found that the substrate specificities of heterologously expressed SCO enzymes encoded in different cluster types were all limited to stilbenes and related molecules with a 4′‐OH group, and differed modestly in substrate range and activity under the experimental conditions. Together, this work suggests a primary role for genome structural rearrangement, and the importance of enzyme modularity, in promoting fungal metabolic adaptation to plant defence chemistry.
  5. Single-cell CRISPR screens (perturb-seq) link genetic perturbations to phenotypic changes in individual cells. The most fundamental task in perturb-seq analysis is to test for association between a perturbation and a count outcome, such as gene expression. We conduct the first-ever comprehensive benchmarking study of association testing methods for low multiplicity-of-infection (MOI) perturb-seq data, finding that existing methods produce excess false positives. We conduct an extensive empirical investigation of the data, identifying three core analysis challenges: sparsity, confounding, and model misspecification. Finally, we develop an association testing method, SCEPTRE low-MOI, that resolves these analysis challenges and demonstrates improved calibration and power.