Almost all regulation of gene expression in eukaryotic genomes is mediated by the action of distant non-coding transcriptional enhancers upon proximal gene promoters. Enhancer locations cannot be accurately predicted bioinformatically because of the absence of a defined sequence code, and thus functional assays are required for their direct detection. Here we used a massively parallel reporter assay, Self-Transcribing Active Regulatory Region sequencing (STARR-seq), to generate the first comprehensive genome-wide map of enhancers in Anopheles coluzzii, a major African malaria vector in the Gambiae species complex. The screen was carried out by transfecting reporter libraries created from the genomic DNA of 60 wild A. coluzzii from Burkina Faso into A. coluzzii 4a3A cells, in order to functionally query enhancer activity of the natural population within the homologous cellular context. We report a catalog of 3,288 active genomic enhancers that were significant across three biological replicates, 74% of them located in intergenic and intronic regions. The STARR-seq enhancer screen is chromatin-free and thus detects inherent activity of a comprehensive catalog of enhancers that may be restricted in vivo to specific cell types or developmental stages. Testing of a validation panel of enhancer candidates using manual luciferase assays confirmed enhancer function in 26 of 28 (93%) candidates over a wide dynamic range of activity, from 2-fold to at least 16-fold above baseline. The enhancers occupy only 0.7% of the genome and display distinct composition features. The enhancer compartment is significantly enriched for 15 transcription factor binding site signatures and displays divergence for specific dinucleotide repeats, as compared to matched non-enhancer genomic controls. The genome-wide catalog of A. coluzzii enhancers is publicly available in a simple searchable graphic format.
This enhancer catalog will be valuable in linking genetic and phenotypic variation, in identifying regulatory elements that could be employed in vector manipulation, and in better targeting of chromosome editing to minimize extraneous regulatory influences on the introduced sequences. Importance: Understanding the role of the non-coding regulatory genome in complex disease phenotypes is essential, but even in well-characterized model organisms, identification of regulatory regions within the vast non-coding genome remains a challenge. We used a large-scale assay to generate a genome-wide map of transcriptional enhancers. Such a catalog for the important malaria vector, Anopheles coluzzii, will be an important research tool as the role of non-coding regulatory variation in differential susceptibility to malaria infection is explored, and a public resource for research on this important insect vector of disease.
The Drosophila MLR COMPASS complex is essential for programming cis-regulatory information and maintaining epigenetic memory during development
Abstract The MLR COMPASS complex monomethylates H3K4, which serves to epigenetically mark transcriptional enhancers to drive proper gene expression during animal development. Chromatin enrichment analyses of the Drosophila MLR complex reveal dynamic association with promoters and enhancers in embryos, with late-stage enrichments biased toward both active and poised enhancers. RNAi depletion of the Cmi (also known as Lpt) subunit, which contains the chromatin-binding PHD finger domains, attenuates enhancer functions but unexpectedly results in inappropriate enhancer activation during stages when hormone-responsive enhancers are poised, revealing critical epigenetic roles in both the activation and repression of enhancers depending on developmental context. Cmi is necessary for robust H3K4 monomethylation and H3K27 acetylation that mark active enhancers, but not for the chromatin binding of Trr, the MLR methyltransferase. Our data reveal two likely major regulatory modes of MLR function: contributions to enhancer commissioning in early embryogenesis, and bookmarking of enhancers to enable rapid transcriptional re-activation at subsequent developmental stages.
- Award ID(s): 1716431
- PAR ID: 10192526
- Date Published:
- Journal Name: Nucleic Acids Research
- Volume: 48
- Issue: 7
- ISSN: 0305-1048
- Page Range / eLocation ID: 3476 to 3495
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We used cap analysis of gene expression with sequencing (CAGE-seq) to profile eRNA expression and enhancer activity during embryogenesis of a model echinoderm: the sea urchin, Strongylocentrotus purpuratus. We identified more than 18,000 enhancers that were active in mature oocytes and developing embryos and documented a burst of enhancer activation during cleavage and early blastula stages. We found that a large fraction (73.8%) of all enhancers active during the first 48 h of embryogenesis were hyperaccessible no later than the 128-cell stage and possibly even earlier. Most enhancers were located near gene bodies, and temporal patterns of eRNA expression tended to parallel those of nearby genes. Furthermore, enhancers near lineage-specific genes contained signatures of inputs from developmental gene regulatory networks deployed in those lineages. A large fraction (60%) of sea urchin enhancers previously shown to be active in transgenic reporter assays was associated with eRNA expression. Moreover, a large fraction (50%) of a representative subset of enhancers identified by eRNA profiling drove tissue-specific gene expression in isolation when tested by reporter assays. Our findings provide an atlas of developmental enhancers in a model sea urchin and support the utility of eRNA profiling as a tool for enhancer discovery and regulatory biology. The data generated in this study are available at Echinobase, the public database of information related to echinoderm genomics.
-
A suboptimal OCT4-SOX2 binding site facilitates the naïve-state specific function of a Klf4 enhancer
Vall-llosera Camps, Miquel (Ed.)
Enhancers have critical functions in the precise, spatiotemporal control of transcription during development. It is thought that enhancer grammar, or the characteristics and arrangements of transcription factor binding sites, underlies the specific functions of developmental enhancers. In this study, we sought to identify grammatical constraints that direct enhancer activity in the naïve state of pluripotency, focusing on the enhancers for the naïve-state specific gene, Klf4. Using a combination of biochemical tests, reporter assays, and endogenous mutations in mouse embryonic stem cells, we have studied the binding sites for the transcription factors OCT4 and SOX2. We have found that the three Klf4 enhancers contain suboptimal OCT4-SOX2 composite binding sites. Substitution with a high-affinity OCT4-SOX2 binding site in Klf4 enhancer E2 rescued enhancer function and Klf4 expression upon loss of the ESRRB and STAT3 binding sites. We also observed that the low affinity of the OCT4-SOX2 binding site is crucial to drive the naïve-state specific activities of Klf4 enhancer E2. Altogether, our work suggests that the affinity of OCT4-SOX2 binding sites could facilitate enhancer functions in specific states of pluripotency.
-
INTRODUCTION Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons. RATIONALE Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types. RESULTS To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. 
More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated. CONCLUSION TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. 
TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution.
Figure caption: Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin, and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
-
Conserved NF-κB signaling pathways shape immune responses in animals. In mammals, NF-κB activation patterns and downstream transcription vary with stimulus, cell type, and stochastic differences among identically treated cells. Whether animals without adaptive immunity exhibit similar heterogeneity or rely on distinct immune strategies remains unknown. We engineered Drosophila melanogaster S2* reporter cells as an immune-responsive model to monitor the dynamics of an NF-κB transcription factor, Relish, and downstream transcription in single, living cells. Following immune stimulation, Relish exhibits diverse nuclear localization dynamics that fall into distinct categories, with both the fraction of responsive cells and their activation speed rising with stimulus dose. Pre-stimulus features, including Relish nuclear fraction, predict a cell's responsiveness to stimulation. Simultaneous measurement of Relish and downstream transcription revealed that the probability of transcriptional bursts from immune-responsive enhancers correlates with Relish nuclear fraction. The number of NF-κB binding sites tunes transcriptional activity among immune enhancers. Our study uncovers heterogeneity in NF-κB activation and target gene expression within Drosophila, illustrating how dynamic NF-κB behavior and enhancer architecture tune gene regulation.

