Abstract Phenotypic variation in organism-level traits has been studied in Caenorhabditis elegans wild strains, but the impacts of differences in gene expression and the underlying regulatory mechanisms are largely unknown. Here, we use natural variation in gene expression to connect genetic variants to differences in organismal-level traits, including drug and toxicant responses. We perform transcriptomic analyses on 207 genetically distinct C. elegans wild strains to study natural regulatory variation of gene expression. Using this massive dataset, we perform genome-wide association mappings to investigate the genetic basis underlying gene expression variation and reveal complex genetic architectures. We find a large collection of hotspots enriched for expression quantitative trait loci across the genome. We further use mediation analysis to understand how gene expression variation could underlie organism-level phenotypic variation for a variety of complex traits. These results reveal the natural diversity in gene expression and possible regulatory mechanisms in this keystone model organism, highlighting the promise of using gene expression variation to understand how phenotypic diversity is generated.
                    
                            
                            Personal transcriptome variation is poorly explained by current genomic deep learning models
                        
                    
    
            Abstract Genomic deep learning models can predict genome-wide epigenetic features and gene expression levels directly from DNA sequence. While current models perform well at predicting gene expression levels across genes in different cell types from the reference genome, their ability to explain expression variation between individuals due to cis-regulatory genetic variants remains largely unexplored. Here, we evaluate four state-of-the-art models on paired personal genome and transcriptome data and find limited performance when explaining variation in expression across individuals. In addition, models often fail to predict the correct direction of effect of cis-regulatory genetic variation on expression.
        
    
                            - Award ID(s):
- 2238125
- PAR ID:
- 10532413
- Publisher / Repository:
- Nature Genetics
- Date Published:
- Journal Name:
- Nature Genetics
- Volume:
- 55
- Issue:
- 12
- ISSN:
- 1061-4036
- Page Range / eLocation ID:
- 2056 to 2059
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            Lasky, Jesse R. (Ed.) Gene expression can be influenced by genetic variants that are closely linked to the expressed gene (cis eQTLs) and variants in other parts of the genome (trans eQTLs). We created a multiparental mapping population by sampling genotypes from a single natural population of Mimulus guttatus and scored gene expression in the leaves of 1,588 plants. We find that nearly every measured gene exhibits cis regulatory variation (91% have FDR < 0.05). cis eQTLs are usually allelic series with three or more functionally distinct alleles. The cis locus explains about two thirds of the standing genetic variance (on average) but varies among genes and tends to be greatest when there is high indel variation in the upstream regulatory region and high nucleotide diversity in the coding sequence. Despite mapping over 10,000 trans eQTL / affected gene pairs, most of the genetic variance generated by trans acting loci remains unexplained. This implies a large reservoir of trans acting genes with subtle or diffuse effects. Mapped trans eQTLs show lower allelic diversity but much higher genetic dominance than cis eQTLs. Several analyses also indicate that trans eQTLs make a substantial contribution to the genetic correlations in expression among different genes. They may thus be essential determinants of “gene expression modules,” which has important implications for the evolution of gene expression and how it is studied by geneticists.
- 
            Massively parallel reporter assays (MPRAs) are powerful tools for quantifying the impacts of sequence variation on gene expression. Reading out molecular phenotypes with sequencing enables interrogating the impact of sequence variation beyond genome scale. Machine learning models integrate and codify information learned from MPRAs and enable generalization by predicting sequences outside the training data set. Models can provide a quantitative understanding of cis-regulatory codes controlling gene expression, enable variant stratification, and guide the design of synthetic regulatory elements for applications from synthetic biology to mRNA and gene therapy. This review focuses on cis-regulatory MPRAs, particularly those that interrogate cotranscriptional and post-transcriptional processes: alternative splicing, cleavage and polyadenylation, translation, and mRNA decay.
- 
            Hake, Sarah (Ed.) A striking paradox is that genes with conserved protein sequence, function and expression pattern over deep time often exhibit extremely divergent cis-regulatory sequences. It remains unclear how such drastic cis-regulatory evolution across species allows preservation of gene function, and to what extent these differences influence how cis-regulatory variation arising within species impacts phenotypic change. Here, we investigated these questions using a plant stem cell regulator conserved in expression pattern and function over ~125 million years. Using in-vivo genome editing in two distantly related models, Arabidopsis thaliana (Arabidopsis) and Solanum lycopersicum (tomato), we generated over 70 deletion alleles in the upstream and downstream regions of the stem cell repressor gene CLAVATA3 (CLV3) and compared their individual and combined effects on a shared phenotype, the number of carpels that make fruits. We found that sequences upstream of tomato CLV3 are highly sensitive to even small perturbations compared to its downstream region. In contrast, Arabidopsis CLV3 function is tolerant to severe disruptions both upstream and downstream of the coding sequence. Combining upstream and downstream deletions also revealed a different regulatory outcome. Whereas phenotypic enhancement from adding downstream mutations was predominantly weak and additive in tomato, mutating both regions of Arabidopsis CLV3 caused substantial and synergistic effects, demonstrating distinct distribution and redundancy of functional cis-regulatory sequences. Our results demonstrate remarkable malleability in cis-regulatory structural organization of a deeply conserved plant stem cell regulator and suggest that major reconfiguration of cis-regulatory sequence space is a common yet cryptic evolutionary force altering genotype-to-phenotype relationships from regulatory variation in conserved genes.
Finally, our findings underscore the need for lineage-specific dissection of the spatial architecture of cis-regulation to effectively engineer trait variation from conserved productivity genes in crops.
- 
            Abstract Allele-specific expression quantification from RNA-seq reads provides opportunities to study the control of gene regulatory networks by cis-acting and trans-acting genetic variants. Many existing methods performed a single-gene and single-SNP association analysis to identify expression quantitative trait loci (eQTLs), and placed the eQTLs against known gene networks for functional interpretation. Instead, we view eQTL data as a capture of the effects of perturbation of gene regulatory system by a large number of genetic variants and reconstruct a gene network perturbed by eQTLs. We introduce a statistical framework called CiTruss for simultaneously learning a gene network and cis-acting and trans-acting eQTLs that perturb this network, given population allele-specific expression and SNP data. CiTruss uses a multi-level conditional Gaussian graphical model to model trans-acting eQTLs perturbing the expression of both alleles in gene network at the top level and cis-acting eQTLs perturbing the expression of each allele at the bottom level. We derive a transformation of this model that allows efficient learning for large-scale human data. Our analysis of the GTEx and LG×SM advanced intercross line mouse data for multiple tissue types with CiTruss provides new insights into genetics of gene regulation. CiTruss revealed that gene networks consist of local subnetworks over proximally located genes and global subnetworks over genes scattered across genome, and that several aspects of gene regulation by eQTLs such as the impact of genetic diversity, pleiotropy, tissue-specific gene regulation, and local and long-range linkage disequilibrium among eQTLs can be explained through these local and global subnetworks.
 An official website of the United States government