Title: Enhancer Pleiotropy, Gene Expression, and the Architecture of Human Enhancer–Gene Interactions
Abstract: Enhancers are often studied as noncoding regulatory elements that modulate the precise spatiotemporal expression of genes in a highly tissue-specific manner. This paradigm has been challenged by recent evidence of individual enhancers acting in multiple tissues or developmental contexts. However, the frequency of these enhancers with high degrees of “pleiotropy” out of all putative enhancers is not well understood. Consequently, it is unclear how the variation of enhancer pleiotropy corresponds to the variation in expression breadth of target genes. Here, we use multi-tissue chromatin maps from diverse human tissues to investigate the enhancer–gene interaction architecture while accounting for 1) the distribution of enhancer pleiotropy, 2) the variations of regulatory links from enhancers to target genes, and 3) the expression breadth of target genes. We show that most enhancers are tissue-specific and that highly pleiotropic enhancers account for <1% of all putative regulatory sequences in the human genome. Notably, several genomic features are indicative of increasing enhancer pleiotropy, including longer sequence length, greater number of links to genes, increasing abundance and diversity of encoded transcription factor motifs, and stronger evolutionary conservation. Intriguingly, the number of enhancers per gene remains remarkably consistent for all genes (∼14). However, enhancer pleiotropy does not directly translate to the expression breadth of target genes. We further present a series of Gaussian Mixture Models to represent this organization architecture. Consequently, we demonstrate that a modest trend of more pleiotropic enhancers targeting more broadly expressed genes can generate the observed diversity of expression breadths in the human genome.
Award ID(s):
2021635
NSF-PAR ID:
10296164
Editor(s):
Saitou, Naruya
Journal Name:
Molecular Biology and Evolution
Volume:
38
Issue:
9
ISSN:
1537-1719
Page Range / eLocation ID:
3898 to 3909
Sponsoring Org:
National Science Foundation
More Like this
  1. INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons.

    RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types.

    RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated.

    CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution.

    Figure: Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
  2. Beier, David R. (Ed.)
    Enhancers are context-specific regulators of expression that drive biological complexity and variation through the redeployment of conserved genes. An example of this is the enhancer-mediated control of Engrailed 1 (EN1), a pleiotropic gene whose expression is required for the formation of mammalian eccrine sweat glands. We previously identified the En1 candidate enhancer (ECE) 18 cis-regulatory element that has been highly and repeatedly derived on the human lineage to potentiate ectodermal EN1 and induce our species’ uniquely high eccrine gland density. Intriguingly, ECE18 quantitative activity is negligible outside of primates and ECE18 is not required for En1 regulation and eccrine gland formation in mice, raising the possibility that distinct enhancers have evolved to modulate the same trait. Here we report the identification of the ECE20 enhancer and show it has conserved functionality in mouse and human developing skin ectoderm. Unlike ECE18, knock-out of ECE20 in mice reduces ectodermal En1 and eccrine gland number. Notably, we find ECE20, but not ECE18, is also required for En1 expression in the embryonic mouse brain, demonstrating that ECE20 is a pleiotropic En1 enhancer. Finally, that ECE18 deletion does not potentiate the eccrine phenotype of ECE20 knock-out mice supports the secondary incorporation of ECE18 into the regulation of this trait in primates. Our findings reveal that the mammalian En1 regulatory machinery diversified to incorporate both shared and lineage-restricted enhancers to regulate the same phenotype, and also have implications for understanding the forces that shape the robustness and evolvability of developmental traits.
  3. We used cap analysis of gene expression with sequencing (CAGE-seq) to profile eRNA expression and enhancer activity during embryogenesis of a model echinoderm: the sea urchin, Strongylocentrotus purpuratus. We identified more than 18,000 enhancers that were active in mature oocytes and developing embryos and documented a burst of enhancer activation during cleavage and early blastula stages. We found that a large fraction (73.8%) of all enhancers active during the first 48 h of embryogenesis were hyperaccessible no later than the 128-cell stage and possibly even earlier. Most enhancers were located near gene bodies, and temporal patterns of eRNA expression tended to parallel those of nearby genes. Furthermore, enhancers near lineage-specific genes contained signatures of inputs from developmental gene regulatory networks deployed in those lineages. A large fraction (60%) of sea urchin enhancers previously shown to be active in transgenic reporter assays was associated with eRNA expression. Moreover, a large fraction (50%) of a representative subset of enhancers identified by eRNA profiling drove tissue-specific gene expression in isolation when tested by reporter assays. Our findings provide an atlas of developmental enhancers in a model sea urchin and support the utility of eRNA profiling as a tool for enhancer discovery and regulatory biology. The data generated in this study are available at Echinobase, the public database of information related to echinoderm genomics.
  4. SUMMARY

    Gene expression is controlled and regulated by interactions between cis‐regulatory DNA elements (CREs) and regulatory proteins. Enhancers are one of the most important classes of CREs in eukaryotes. Eukaryotic genes, especially those related to development or responses to environmental cues, are often regulated by multiple enhancers in different tissues and/or at different developmental stages. Remarkably, little is known about the molecular mechanisms by which enhancers regulate gene expression in plants. We identified a distal enhancer, CREβ, which regulates the expression of AtDGK7, which encodes a diacylglycerol kinase in Arabidopsis. We developed a transgenic line containing the luciferase reporter gene (LUC) driven by CREβ fused with a minimal cauliflower mosaic virus (CaMV) 35S promoter. The CREβ enhancer was shown to play a role in the response to osmotic pressure of the LUC reporter gene. A forward genetic screen pipeline based on the transgenic line was established to generate mutations associated with altered expression of the LUC reporter gene. We identified a suite of mutants with variable LUC expression levels as well as different segregation patterns of the mutations in populations. We demonstrate that this pipeline will allow us to identify trans‐regulatory factors associated with CREβ function as well as those acting in the regulation of the endogenous AtDGK7 gene.

     
  5. Background

    The vast majority of findings from human genome-wide association studies (GWAS) map to non-coding sequences, complicating their mechanistic interpretations and clinical translations. Non-coding sequences that are evolutionarily conserved and biochemically active could offer clues to the mechanisms underpinning GWAS discoveries. However, genetic effects of such sequences have not been systematically examined across a wide range of human tissues and traits, hampering progress to fully understand regulatory causes of human complex traits.

    Results

    Here we develop a simple yet effective strategy to identify functional elements exhibiting high levels of human-mouse sequence conservation and enhancer-like biochemical activity, which scales well to 313 epigenomic datasets across 106 human tissues and cell types. Combined with 468 GWAS of European (EUR) and East Asian (EAS) ancestries, these elements show tissue-specific enrichments of heritability and causal variants for many traits, which are significantly stronger than enrichments based on enhancers without sequence conservation. These elements also help prioritize candidate genes that are functionally relevant to body mass index (BMI) and schizophrenia but were not reported in previous GWAS with large sample sizes.

    Conclusions

    Our findings provide a comprehensive assessment of how sequence-conserved enhancer-like elements affect complex traits in diverse tissues and demonstrate a generalizable strategy of integrating evolutionary and biochemical data to elucidate human disease genetics.

     