skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: PCRMS: a database of predicted cis-regulatory modules and constituent transcription factor binding sites in genomes
Abstract More accurate and more complete predictions of cis-regulatory modules (CRMs) and constituent transcription factor (TF) binding sites (TFBSs) in genomes can facilitate characterizing functions of regulatory sequences. Here, we developed a database predicted cis-regulatory modules (PCRMS) (https://cci-bioinfo.uncc.edu) that stores highly accurate and unprecedentedly complete maps of predicted CRMs and TFBSs in the human and mouse genomes. The web interface allows the user to browse CRMs and TFBSs in an organism, find the closest CRMs to a gene, search CRMs around a gene and find all TFBSs of a TF. PCRMS can be a useful resource for the research community to characterize regulatory genomes. Database URL: https://cci-bioinfo.uncc.edu/  more » « less
Award ID(s):
1661332
PAR ID:
10378092
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Database
Volume:
2022
ISSN:
1758-0463
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Abstract cis-regulatory modules(CRMs) formed by clusters of transcription factor (TF) binding sites (TFBSs) are as important as coding sequences in specifying phenotypes of humans. It is essential to categorize all CRMs and constituent TFBSs in the genome. In contrast to most existing methods that predict CRMs in specific cell types using epigenetic marks, we predict a largely cell type agonistic but more comprehensive map of CRMs and constituent TFBSs in the gnome by integrating all available TF ChIP-seq datasets. Our method is able to partition 77.47% of genome regions covered by available 6092 datasets into a CRM candidate (CRMC) set (56.84%) and a non-CRMC set (43.16%). Intriguingly, the predicted CRMCs are under strong evolutionary constraints, while the non-CRMCs are largely selectively neutral, strongly suggesting that the CRMCs are likely cis-regulatory, while the non-CRMCs are not. Our predicted CRMs are under stronger evolutionary constraints than three state-of-the-art predictions (GeneHancer, EnhancerAtlas and ENCODE phase 3) and substantially outperform them for recalling VISTA enhancers and non-coding ClinVar variants. We estimated that the human genome might encode about 1.47M CRMs and 68M TFBSs, comprising about 55% and 22% of the genome, respectively; for both of which, we predicted 80%. Therefore, the cis-regulatory genome appears to be more prevalent than originally thought. 
    more » « less
  2. Abstract BackgroundMouse is probably the most important model organism to study mammal biology and human diseases. A better understanding of the mouse genome will help understand the human genome, biology and diseases. However, despite the recent progress, the characterization of the regulatory sequences in the mouse genome is still far from complete, limiting its use to understand the regulatory sequences in the human genome. ResultsHere, by integrating binding peaks in ~ 9,000 transcription factor (TF) ChIP-seq datasets that cover 79.9% of the mouse mappable genome using an efficient pipeline, we were able to partition these binding peak-covered genome regions into acis-regulatory module (CRM) candidate (CRMC) set and a non-CRMC set. The CRMCs contain 912,197 putative CRMs and 38,554,729 TF binding sites (TFBSs) islands, covering 55.5% and 24.4% of the mappable genome, respectively. The CRMCs tend to be under strong evolutionary constraints, indicating that they are likelycis-regulatory; while the non-CRMCs are largely selectively neutral, indicating that they are unlikelycis-regulatory. Based on evolutionary profiles of the genome positions, we further estimated that 63.8% and 27.4% of the mouse genome might code for CRMs and TFBSs, respectively. ConclusionsValidation using experimental data suggests that at least most of the CRMCs are authentic. Thus, this unprecedentedly comprehensive map of CRMs and TFBSs can be a good resource to guide experimental studies of regulatory genomes in mice and humans. 
    more » « less
  3. Abstract The identification and characterization of cis-regulatory DNA sequences and how they function to coordinate responses to developmental and environmental cues is of paramount importance to plant biology. Key to these regulatory processes are cis-regulatory modules (CRMs), which include enhancers and silencers. Despite the extraordinary advances in high-quality sequence assemblies and genome annotations, the identification and understanding of CRMs, and how they regulate gene expression, lag significantly behind. This is especially true for their distinguishing characteristics and activity states. Here, we review the current knowledge on CRMs and breakthrough technologies enabling identification, characterization, and validation of CRMs; we compare the genomic distributions of CRMs with respect to their target genes between different plant species, and discuss the role of transposable elements harboring CRMs in the evolution of gene expression. This is an exciting time to study cis-regulomes in plants; however, significant existing challenges need to be overcome to fully understand and appreciate the role of CRMs in plant biology and in crop improvement. 
    more » « less
  4. Abstract It has long been known that exons can serve ascis‐regulatory sequences, such as enhancers. However, the prevalence of such dual‐use of exons and how they evolve remain elusive. Based on our recently predicted, highly accurate large sets ofcis‐regulatory module candidates (CRMCs) and non‐CRMCs in the human genome, we find that exonic transcription factor binding sites (TFBSs) occupy at least a third of the total exon lengths, and 96.7% of genes have exonic TFBSs. Both A/T and C/G in exonic TFBSs are more likely under evolutionary constraints than those in non‐CRMC exons. Exonic TFBSs in codons tend to encode loops rather than more critical helices and strands in protein structures, while exonic TFBSs in untranslated regions (UTRs) tend to avoid positions where known UTR‐related functions are located. Moreover, active exonic TFBSs tend to be in close physical proximity to distal promoters whose genes have elevated transcription levels. These results suggest that exonic TFBSs might be more prevalent than originally thought and likely in dual‐use. We proposed a parsimonious model that well explains the observed evolutionary behaviors of exonic TFBS as well as how a stretch of codons evolve into a TFBS. Key pointsThere are more exonic regulatory sequences in the human genome than originally thought.Exonic transcription factor binding sites are more likely under negative selection or positive selection than counterpart nonregulatory sequences.Exonic transcription factor binding sites tend to be located in genome sequences that encode less critical loops in protein structures, or in less critical parts in 5′ and 3′ untranslated regions. 
    more » « less
  5. null (Ed.)
    An ever-growing number of insect genomes is being sequenced across the evolutionary spectrum. Comprehensive annotation of not only genes but also regulatory regions is critical for reaping the full benefits of this sequencing. Driven by developments in sequencing technologies and in both empirical and computational discovery strategies, the past few decades have witnessed dramatic progress in our ability to identify cis-regulatory modules (CRMs), sequences such as enhancers that play a major role in regulating transcription. Nevertheless, providing a timely and comprehensive regulatory annotation of newly sequenced insect genomes is an ongoing challenge. We review here the methods being used to identify CRMs in both model and non-model insect species, and focus on two tools that we have developed, REDfly and SCRMshaw. These resources can be paired together in a powerful combination to facilitate insect regulatory annotation over a broad range of species, with an accuracy equal to or better than that of other state-of-the-art methods. 
    more » « less