

This content will become publicly available on August 17, 2024

Title: Predicting transcriptional outcomes of novel multigene perturbations with GEARS
Abstract

Understanding cellular responses to genetic perturbation is central to numerous biomedical applications, from identifying genetic interactions involved in cancer to developing methods for regenerative medicine. However, the combinatorial explosion in the number of possible multigene perturbations severely limits experimental interrogation. Here, we present graph-enhanced gene activation and repression simulator (GEARS), a method that integrates deep learning with a knowledge graph of gene–gene relationships to predict transcriptional responses to both single and multigene perturbations using single-cell RNA-sequencing data from perturbational screens. GEARS is able to predict outcomes of perturbing combinations consisting of genes that were never experimentally perturbed. GEARS exhibited 40% higher precision than existing approaches in predicting four distinct genetic interaction subtypes in a combinatorial perturbation screen and identified the strongest interactions twice as well as prior approaches. Overall, GEARS can predict phenotypically distinct effects of multigene perturbations and thus guide the design of perturbational experiments.

 
Award ID(s):
1835598 1918940
NSF-PAR ID:
10471870
Publisher / Repository:
Nature
Journal Name:
Nature Biotechnology
ISSN:
1087-0156
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    The integrated responses of biological systems to genetic and environmental variation result in substantial covariance in multiple phenotypes. The resultant pleiotropy, environmental effects, and genotype‐by‐environmental interactions (GxE) are foundational to our understanding of biology and genetics. Yet, the treatment of correlated characters, and the identification of the genes encoding functions that generate this covariance, has lagged. As a test case for analyzing the genetic basis underlying multiple correlated traits, we analyzed maize kernel ionomes from Intermated B73 x Mo17 (IBM) recombinant inbred populations grown in 10 environments. Plants obtain elements from the soil through genetic and biochemical pathways responsive to physiological state and environment. Most perturbations affect multiple elements, which leads the ionome, the full complement of mineral nutrients in an organism, to vary as an integrated network rather than a set of distinct single elements. We compared quantitative trait loci (QTL) determining single‐element variation to QTL that predict variation in principal components (PCs) of multiple‐element covariance. Single‐element and multivariate approaches detected partially overlapping sets of loci. QTL influencing trait covariation were detected at loci that were not found by mapping single‐element traits. Moreover, this approach permitted testing environmental components of trait covariance, and identified multi‐element traits that were determined by both genetic and environmental factors as well as genotype‐by‐environment interactions. Growth environment had a profound effect on the elemental profiles and multi‐element phenotypes were significantly correlated with specific environmental variables.

     
  2. INTRODUCTION Genome-wide association studies (GWASs) have identified thousands of human genetic variants associated with diverse diseases and traits, and most of these variants map to noncoding loci with unknown target genes and function. Current approaches to understand which GWAS loci harbor causal variants and to map these noncoding regulators to target genes suffer from low throughput. With newer multiancestry GWASs from individuals of diverse ancestries, there is a pressing and growing need to scale experimental assays to connect GWAS variants with molecular mechanisms. Here, we combined biobank-scale GWASs, massively parallel CRISPR screens, and single-cell sequencing to discover target genes of noncoding variants for blood trait loci with systematic targeting and inhibition of noncoding GWAS loci with single-cell sequencing (STING-seq). RATIONALE Blood traits are highly polygenic, and GWASs have identified thousands of noncoding loci that map to candidate cis-regulatory elements (CREs). By combining CRE-silencing CRISPR perturbations and single-cell readouts, we targeted hundreds of GWAS loci in a single assay, revealing target genes in cis and in trans. For select CREs that regulate target genes, we performed direct variant insertion. Although silencing the CRE can identify the target gene, direct variant insertion can identify magnitude and direction of effect on gene expression for the GWAS variant. In select cases in which the target gene was a transcription factor or microRNA, we also investigated the gene-regulatory networks altered upon CRE perturbation and how these networks differ across blood cell types. RESULTS We inhibited candidate CREs from fine-mapped blood trait GWAS variants (from ~750,000 individuals of diverse ancestries) in human erythroid progenitors. 
In total, we targeted 543 variants (254 loci) mapping to candidate CREs, generating multimodal single-cell data including transcriptome, direct CRISPR gRNA capture, and cell surface proteins. We identified target genes in cis (within 500 kb) for 134 CREs. In most cases, we found that the target gene was the closest gene and that specific enhancer-associated biochemical hallmarks (H3K27ac and accessible chromatin) are essential for CRE function. Using multiple perturbations at the same locus, we were able to distinguish causal variants from noncausal variants in linkage disequilibrium. For a subset of validated CREs, we also inserted specific GWAS variants using base-editing STING-seq (beeSTING-seq) and quantified the effect size and direction of GWAS variants on gene expression. Given our transcriptome-wide data, we examined dosage effects in cis and trans in cases in which the cis target is a transcription factor or microRNA. We found that trans target genes are also enriched for GWAS loci, and identified gene clusters within trans gene networks with distinct biological functions and expression patterns in primary human blood cells. CONCLUSION In this work, we investigated noncoding GWAS variants at scale, identifying target genes in single cells. These methods can help to address the variant-to-function challenges that are a barrier for translation of GWAS findings (e.g., drug targets for diseases with a genetic basis) and greatly expand our ability to understand mechanisms underlying GWAS loci. Identifying causal variants and their target genes with STING-seq. Uncovering causal variants and their target genes or function is a major challenge for GWASs. STING-seq combines perturbation of noncoding loci with multimodal single-cell sequencing to profile hundreds of GWAS loci in parallel. This approach can identify target genes in cis and trans, measure dosage effects, and decipher gene-regulatory networks. 
  3. Abstract

    Large-scale multiple perturbation experiments have the potential to reveal a more detailed understanding of the molecular pathways that respond to genetic and environmental changes. A key question in these studies is which gene expression changes are important for the response to the perturbation. This problem is challenging because (i) the functional form of the nonlinear relationship between gene expression and the perturbation is unknown and (ii) identification of the most important genes is a high-dimensional variable selection problem. To deal with these challenges, we present here a method based on the model-X knockoffs framework and Deep Neural Networks to identify significant gene expression changes in multiple perturbation experiments. This approach makes no assumptions about the functional form of the dependence between the responses and the perturbations, and it enjoys finite sample false discovery rate control for the selected set of important gene expression responses. We apply this approach to data sets from the Library of Integrated Network-Based Cellular Signatures, a National Institutes of Health Common Fund program that catalogs how human cells globally respond to chemical, genetic and disease perturbations. We identified important genes whose expression is directly modulated in response to perturbation with anthracycline, vorinostat, trichostatin-a, geldanamycin and sirolimus. We compare the sets of important genes that respond to these small molecules to identify co-responsive pathways. Identification of which genes respond to specific perturbation stressors can provide better understanding of the underlying mechanisms of disease and advance the identification of new drug targets.

     
  4. Abstract

    The ongoing diversification of plant defence compounds exerts dynamic selection pressures on the microorganisms that colonize plant tissues. Evolutionary processes that generate resistance towards these compounds increase microbial fitness by giving access to plant resources and increasing pathogen virulence. These processes entail sequence‐based mechanisms that result in adaptive gene functions, and combinatorial mechanisms that result in novel syntheses of existing gene functions. However, the priority and interactions among these processes in adaptive resistance remain poorly understood. Using a combination of molecular genetic and computational approaches, we investigated the contributions of sequence‐based and combinatorial processes to the evolution of fungal metabolic gene clusters encoding stilbene cleavage oxygenases (SCOs), which catalyse the degradation of biphenolic plant defence compounds known as stilbenes into monophenolic molecules. We present phylogenetic evidence of convergent assembly among three distinct types of SCO gene clusters containing alternate combinations of phenolic catabolism. Multiple evolutionary transitions between different cluster types suggest recurrent selection for distinct gene assemblages. By comparison, we found that the substrate specificities of heterologously expressed SCO enzymes encoded in different cluster types were all limited to stilbenes and related molecules with a 4′‐OH group, and differed modestly in substrate range and activity under the experimental conditions. Together, this work suggests a primary role for genome structural rearrangement, and the importance of enzyme modularity, in promoting fungal metabolic adaptation to plant defence chemistry.

     
  5. Abstract

    Clustered regularly interspaced short palindromic repeats (CRISPR) screening coupled with single-cell RNA sequencing has emerged as a powerful tool to characterize the effects of genetic perturbations on the whole transcriptome at a single-cell level. However, due to its sparsity and complex structure, analysis of single-cell CRISPR screening data is challenging. In particular, standard differential expression analysis methods are often underpowered to detect genes affected by CRISPR perturbations. We developed a statistical method for such data, called guided sparse factor analysis (GSFA). GSFA infers latent factors that represent coregulated genes or gene modules; by borrowing information from these factors, it infers the effects of genetic perturbations on individual genes. We demonstrated through extensive simulation studies that GSFA detects perturbation effects with much higher power than state-of-the-art methods. Using single-cell CRISPR data from human CD8+ T cells and neural progenitor cells, we showed that GSFA identified biologically relevant gene modules and specific genes affected by CRISPR perturbations, many of which were missed by existing methods, providing new insights into the functions of genes involved in T cell activation and neurodevelopment.

     