Title: Decoding biology with massively parallel reporter assays and machine learning
Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
Award ID(s):
2021552
PAR ID:
10564624
Author(s) / Creator(s):
; ;
Publisher / Repository:
Cold Spring Harbor Laboratory Press
Date Published:
Journal Name:
Genes & Development
Volume:
38
Issue:
17-20
ISSN:
0890-9369
Page Range / eLocation ID:
843 to 865
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In addition, models often fail to predict the correct direction of effect of cis-regulatory genetic variation on expression.
  2. Hake, Sarah (Ed.)
    A striking paradox is that genes with conserved protein sequence, function and expression pattern over deep time often exhibit extremely divergent cis-regulatory sequences. It remains unclear how such drastic cis-regulatory evolution across species allows preservation of gene function, and to what extent these differences influence how cis-regulatory variation arising within species impacts phenotypic change. Here, we investigated these questions using a plant stem cell regulator conserved in expression pattern and function over ~125 million years. Using in-vivo genome editing in two distantly related models, Arabidopsis thaliana (Arabidopsis) and Solanum lycopersicum (tomato), we generated over 70 deletion alleles in the upstream and downstream regions of the stem cell repressor gene CLAVATA3 (CLV3) and compared their individual and combined effects on a shared phenotype, the number of carpels that make fruits. We found that sequences upstream of tomato CLV3 are highly sensitive to even small perturbations compared to its downstream region. In contrast, Arabidopsis CLV3 function is tolerant to severe disruptions both upstream and downstream of the coding sequence. Combining upstream and downstream deletions also revealed a different regulatory outcome. Whereas phenotypic enhancement from adding downstream mutations was predominantly weak and additive in tomato, mutating both regions of Arabidopsis CLV3 caused substantial and synergistic effects, demonstrating distinct distribution and redundancy of functional cis-regulatory sequences. Our results demonstrate remarkable malleability in cis-regulatory structural organization of a deeply conserved plant stem cell regulator and suggest that major reconfiguration of cis-regulatory sequence space is a common yet cryptic evolutionary force altering genotype-to-phenotype relationships from regulatory variation in conserved genes. Finally, our findings underscore the need for lineage-specific dissection of the spatial architecture of cis-regulation to effectively engineer trait variation from conserved productivity genes in crops.
  3. Lasky, Jesse R. (Ed.)
    Gene expression can be influenced by genetic variants that are closely linked to the expressed gene (cis eQTLs) and variants in other parts of the genome (trans eQTLs). We created a multiparental mapping population by sampling genotypes from a single natural population of Mimulus guttatus and scored gene expression in the leaves of 1,588 plants. We find that nearly every measured gene exhibits cis regulatory variation (91% have FDR < 0.05). cis eQTLs are usually allelic series with three or more functionally distinct alleles. The cis locus explains about two thirds of the standing genetic variance (on average) but varies among genes and tends to be greatest when there is high indel variation in the upstream regulatory region and high nucleotide diversity in the coding sequence. Despite mapping over 10,000 trans eQTL / affected gene pairs, most of the genetic variance generated by trans acting loci remains unexplained. This implies a large reservoir of trans acting genes with subtle or diffuse effects. Mapped trans eQTLs show lower allelic diversity but much higher genetic dominance than cis eQTLs. Several analyses also indicate that trans eQTLs make a substantial contribution to the genetic correlations in expression among different genes. They may thus be essential determinants of “gene expression modules,” which has important implications for the evolution of gene expression and how it is studied by geneticists.
  4. Heritable variation in gene expression is common within and among species and contributes to phenotypic diversity. Mutations affecting either cis- or trans-regulatory sequences controlling gene expression give rise to variation in gene expression, and natural selection acting on this variation causes some regulatory variants to persist in a population for longer than others. To understand how mutation and selection interact to produce the patterns of regulatory variation we see within and among species, my colleagues and I have been systematically determining the effects of new mutations on expression of the TDH3 gene in Saccharomyces cerevisiae and comparing them to the effects of polymorphisms segregating within this species. We have also investigated the molecular mechanisms by which regulatory variants act. Over the past decade, this work has revealed properties of cis- and trans-regulatory mutations including their relative frequency, effects, dominance, pleiotropy and fitness consequences. Comparing these mutational effects to the effects of polymorphisms in natural populations, we have inferred selection acting on expression level, expression noise and phenotypic plasticity. Here, I summarize this body of work and synthesize its findings to make inferences not readily discernible from the individual studies alone. This article is part of the theme issue ‘Interdisciplinary approaches to predicting evolutionary biology’.
  5. Gossmann, Toni (Ed.)
    Understanding and predicting the relationships between genotype and phenotype is often challenging, largely due to the complex nature of eukaryotic gene regulation. A step towards this goal is to map how phenotypic diversity evolves through genomic changes that modify gene regulatory interactions. Using the Prairie Rattlesnake (Crotalus viridis) and related species, we integrate mRNA-seq, proteomic, ATAC-seq and whole genome resequencing data to understand how specific evolutionary modifications to gene regulatory network components produce differences in venom gene expression. Through comparisons within and between species, we find a remarkably high degree of gene expression and regulatory network variation across even a shallow level of evolutionary divergence. We use these data to test hypotheses about the roles of specific trans-factors and cis-regulatory elements, how these roles may vary across venom genes and gene families, and how variation in regulatory systems drives diversity in venom phenotypes. Our results illustrate that differences in chromatin and genotype at regulatory elements play major roles in modulating expression. However, we also find that enhancer deletions, differences in transcription-factor expression, and variation in activity of the insulator protein CTCF likely impact venom phenotypes. Our findings provide insight into the diversity and gene-specificity of gene regulatory features and highlight the value of comparative studies to link gene regulatory network variation to phenotypic variation.