

Title: A systematic study of motif pairs that may facilitate enhancer–promoter interactions
Pairs of interacting transcription factors (TFs) have previously been shown to bind to enhancers and promoters and to contribute to their physical interactions. However, our knowledge of such TF pairs remains limited. To fill this gap, we systematically studied the co-occurrence of TF-binding motifs in interacting enhancer–promoter (EP) pairs in seven human cell lines. We discovered 423 motif pairs that significantly co-occur in the enhancers and promoters of interacting EP pairs. We demonstrated that these motif pairs are biologically meaningful and significantly enriched for motif pairs of known interacting TF pairs. We also showed that the identified motif pairs facilitated the discovery of interacting EP pairs. The developed pipeline, EPmotifPair, together with the predicted motifs and motif pairs, is available at https://doi.org/10.6084/m9.figshare.14192000. Our study provides a comprehensive list of motif pairs that may contribute to EP physical interactions and can help generate meaningful hypotheses for experimental validation.
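The abstract does not say which statistical test EPmotifPair uses to call a motif pair "significantly co-occurring". As a minimal sketch, assuming a one-sided hypergeometric test, the question "does motif A in the enhancer plus motif B in the promoter occur more often in interacting EP pairs than in all EP pairs?" can be posed as below. The function names, the input layout (sets of motif IDs per element plus an interaction label), and the fixed enhancer/promoter orientation are hypothetical illustrations, not the pipeline's API; a real scan over many candidate pairs would also need multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k: int, total: int, successes: int, drawn: int) -> float:
    """One-sided tail P(X >= k) for X ~ Hypergeometric(total, successes, drawn)."""
    upper = min(successes, drawn)
    tail = sum(
        comb(successes, i) * comb(total - successes, drawn - i)
        for i in range(k, upper + 1)
    )
    return tail / comb(total, drawn)


def motif_pair_enrichment(ep_pairs, motif_a, motif_b):
    """Test whether motif_a (in the enhancer) with motif_b (in the promoter)
    is over-represented among interacting EP pairs.

    ep_pairs: list of (enhancer_motifs, promoter_motifs, is_interacting),
    where the motif collections are sets of motif IDs (hypothetical layout).
    Returns (co-occurrences observed in interacting pairs, p-value).
    """
    total = len(ep_pairs)
    interacting = sum(1 for _, _, inter in ep_pairs if inter)
    has_pair = [motif_a in enh and motif_b in prom for enh, prom, _ in ep_pairs]
    with_pair = sum(has_pair)
    observed = sum(1 for hp, (_, _, inter) in zip(has_pair, ep_pairs) if hp and inter)
    return observed, hypergeom_sf(observed, total, with_pair, interacting)


# Toy data: four interacting pairs all carry A/B; four background pairs do not.
pairs = [({"A"}, {"B"}, True)] * 4 + [({"C"}, {"D"}, False)] * 4
observed, p = motif_pair_enrichment(pairs, "A", "B")
print(observed, round(p, 4))  # 4 0.0143
```

With all four A/B pairs falling among the four interacting pairs out of eight, the tail probability is 1/70.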
Award ID(s):
2015838
NSF-PAR ID:
10342718
Author(s) / Creator(s):
Date Published:
Journal Name:
Journal of integrative bioinformatics
Volume:
19
Issue:
1
ISSN:
1613-4516
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    A large number of distal enhancers and proximal promoters form enhancer–promoter interactions to regulate target genes in the human genome. Although recent high-throughput genome-wide mapping approaches have allowed us to more comprehensively recognize potential enhancer–promoter interactions, it is still largely unknown whether sequence-based features alone are sufficient to predict such interactions.

    Results

    Here, we develop a new computational method (named PEP) to predict enhancer–promoter interactions based on sequence-based features only, when the locations of putative enhancers and promoters in a particular cell type are given. The two modules in PEP (PEP-Motif and PEP-Word) use different but complementary feature extraction strategies to exploit sequence-based information. The results across six different cell types demonstrate that our method is effective in predicting enhancer–promoter interactions as compared to the state-of-the-art methods that use functional genomic signals. Our work demonstrates that sequence-based features alone can reliably predict enhancer–promoter interactions genome-wide, which could potentially facilitate the discovery of important sequence determinants for long-range gene regulation.

    Availability and Implementation

    The source code of PEP is available at: https://github.com/ma-compbio/PEP.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Summary

    Self‐transcribing active regulatory region sequencing (STARR‐seq) is widely used to identify enhancers at the whole‐genome level. However, whether STARR‐seq works as efficiently in plants as in animal systems remains unclear. Here, we determined that the traditional STARR‐seq method can be directly applied to rice (Oryza sativa) protoplasts to identify enhancers, though with limited efficiency. Intriguingly, we identified not only enhancers but also constitutive promoters with this technique. To increase the performance of STARR‐seq in plants, we optimized two procedures. We coupled fluorescence‐activated cell sorting (FACS) with STARR‐seq to alleviate the effect of background noise, and we minimized PCR cycles and retained duplicates during prediction, which significantly increased the positive rate for activating regulatory elements (AREs). Using this method, we determined that AREs are associated with AT‐rich regions and are enriched for a motif that the AP2/ERF family can recognize. Based on GC content preferences, AREs are clustered into two groups corresponding to promoters and enhancers. Either AT‐ or GC‐rich regions within AREs could boost transcription. Additionally, disruption of AREs resulted in abnormal expression of both proximal and distal genes, which suggests that STARR‐seq‐revealed elements function as enhancers in vivo. In summary, our work provides a promising method to identify AREs in plants.

     
  3. Abstract

    Understanding the contributions of transcription factor (TF) DNA binding sites to transcriptional enhancers is a significant challenge. We developed Quantitative enhancer-FACS-Seq for highly parallel quantification of enhancer activities from a genomically integrated reporter in Drosophila melanogaster embryos. We investigate the contributions of the DNA binding motifs of four poorly characterized TFs to the activities of twelve embryonic mesodermal enhancers. We measure quantitative changes in enhancer activity and discover a range of epistatic interactions among the motifs, both synergistic and alleviating. We find that understanding the regulatory consequences of TF binding motifs requires that they be investigated in combination across enhancer contexts.

     
  4. Abstract Motivation

    Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, performing pairwise structural alignment against every structure in the PDB is computationally prohibitive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and only computing similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Despite being efficient, the accuracy of FragBag is unsatisfactory because its backbone fragment library may not be optimally constructed and long-range interacting patterns are omitted.

    Results

    Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model to extract structural motif features of a protein structure. We demonstrate that DeepFold substantially outperforms FragBag on protein structural search on a non-redundant protein structure database and a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs.

    Availability and implementation

    https://github.com/largelymfs/DeepFold

     
  5. Transcription factor (TF)–promoter pairs have been repurposed from native hosts to provide tools to measure intracellular biochemical production titer and dynamically control gene expression. Most often, native TF–promoter systems require rigorous screening to obtain desirable characteristics optimized for biotechnological applications. High-throughput techniques may provide a rational and less labor-intensive strategy to engineer user-defined TF–promoter pairs using fluorescence-activated cell sorting and deep sequencing methods (sort-seq). Based on the designed promoter library’s distribution characteristics, we elucidate sequence–function interactions between the TF and DNA. In this work, we use the sort-seq method to study the sequence–function relationship of a σ54-dependent, butanol-responsive TF–promoter pair, BmoR-PBMO derived from Thauera butanivorans, at the nucleotide level to improve biosensor characteristics, specifically the dynamic range. Activities of promoters from a mutagenized PBMO library were sorted based on gfp expression and subsequently deep sequenced to correlate site-specific sequences with changes in dynamic range. We identified site-specific mutations that increase the sensor output. A double mutant, CA(129,130)TC, and a single mutant, G(205)A, in the PBMO promoter increased the dynamic range 4-fold and 1.65-fold, respectively, compared with the native system. In addition, sort-seq identified essential sites required for the proper function of the σ54-dependent promoter biosensor in the context of the host. This work can enable high-throughput screening methods for strain development.
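The PEP abstract (item 1 in the list) says enhancer–promoter interactions can be predicted from sequence-based features alone, but does not spell out the features. A minimal sketch of one simple sequence-only representation, assuming normalized k-mer frequencies of the enhancer concatenated with those of the promoter, is shown below. This is not PEP's feature extraction (its PEP-Motif and PEP-Word modules use their own strategies); `kmer_profile` and `ep_pair_features` are hypothetical names, and the resulting vectors would still need a classifier trained on labeled interacting and non-interacting pairs.

```python
from itertools import product


def kmer_profile(seq: str, k: int = 3) -> list:
    """Normalized k-mer frequencies over ACGT; windows with other symbols (e.g. N) are skipped."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        slot = index.get(seq[start:start + k])
        if slot is not None:
            counts[slot] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def ep_pair_features(enhancer: str, promoter: str, k: int = 3) -> list:
    """Hypothetical EP-pair vector: enhancer k-mer profile followed by promoter profile."""
    return kmer_profile(enhancer, k) + kmer_profile(promoter, k)


features = ep_pair_features("ACGTNACGT", "TATAAA", k=2)
print(len(features))  # 32 (16 dinucleotides per element)
```

Keeping the enhancer and promoter profiles in separate halves of the vector lets a downstream model learn element-specific patterns rather than a pooled composition.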
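For the rice STARR-seq abstract (item 2 in the list), two quantities recur: how strongly a fragment drives its own transcription, and its GC content. A minimal sketch follows, assuming the common STARR-seq convention of a log2 ratio of library-size-normalized reporter RNA reads to input DNA reads, with a pseudocount. The study's own scoring and its GC-based clustering are not described in the abstract, so `starr_enrichment`, `gc_fraction`, and especially the fixed cutoff in `gc_group` (a stand-in for clustering) are hypothetical.

```python
import math


def starr_enrichment(rna_reads, dna_reads, rna_total, dna_total, pseudocount=1.0):
    """log2 of library-size-normalized reporter RNA reads over input DNA reads."""
    rna = (rna_reads + pseudocount) / rna_total
    dna = (dna_reads + pseudocount) / dna_total
    return math.log2(rna / dna)


def gc_fraction(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def gc_group(seq, cutoff=0.5):
    """Hypothetical two-way split standing in for the study's GC-based clustering."""
    return "GC-rich" if gc_fraction(seq) >= cutoff else "AT-rich"


score = starr_enrichment(399, 99, 1_000_000, 1_000_000)
print(round(score, 6), gc_group("AATTAT"))  # 2.0 AT-rich
```

Normalizing by each library's total read count keeps the score comparable when RNA and DNA libraries are sequenced to different depths.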
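The enhancer-FACS-Seq abstract (item 3 in the list) reports synergistic and alleviating epistasis among TF motifs. One standard way to quantify such an interaction, sketched here under the assumption of a multiplicative (log-additive) null model, compares the double-motif-disruption activity with the product of the single-disruption effects. The sign convention (negative means synergistic, positive means alleviating) follows the usual genetic-interaction usage for loss-of-function perturbations and is an assumption, as are the hypothetical names and the `tolerance` threshold; the study may use a different null model and replicate-based statistics.

```python
import math


def log2_epistasis(w_wt, w_a, w_b, w_ab):
    """Deviation of the double-disruption activity from a multiplicative null.

    Inputs are positive enhancer activities: wild type, motif A disrupted,
    motif B disrupted, and both motifs disrupted.
    """
    expected = math.log2(w_a) + math.log2(w_b) - math.log2(w_wt)
    return math.log2(w_ab) - expected


def classify(eps, tolerance=0.1):
    """Negative: combined loss worse than expected; positive: milder than expected."""
    if eps < -tolerance:
        return "synergistic"
    if eps > tolerance:
        return "alleviating"
    return "no detectable interaction"


# Each single disruption halves activity, so the null predicts 0.25 for the double.
print(classify(log2_epistasis(1.0, 0.5, 0.5, 0.0625)))  # synergistic
```

Working in log space turns the multiplicative expectation into a sum, so the epistasis score is zero exactly when the two motifs act independently.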
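The DeepFold abstract (item 4 in the list) describes FragBag's alignment-free search: each protein becomes a frequency vector over a fixed fragment library, and only vectors are compared. A minimal sketch, assuming each overlapping backbone window has already been assigned to its nearest library fragment (that structural assignment step is omitted) and using cosine similarity as one common choice, is shown below. FragBag's own similarity measure may differ, DeepFold replaces the hand-built library with learned CNN features that this sketch does not cover, and the function names are hypothetical.

```python
import math
from collections import Counter


def bag_of_fragments(assignments, library_size):
    """Frequency vector of library fragment IDs assigned along one backbone."""
    counts = Counter(assignments)
    return [counts.get(i, 0) for i in range(library_size)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_neighbors(query_vec, database):
    """Names in `database` (name -> vector) ordered by decreasing cosine similarity."""
    return sorted(database, key=lambda name: cosine(query_vec, database[name]), reverse=True)


query = bag_of_fragments([0, 0, 2], library_size=4)  # [2, 0, 1, 0]
db = {"x": [0, 1, 0, 0], "y": [2, 0, 1, 0]}
print(rank_neighbors(query, db))  # ['y', 'x']
```

Because each comparison is a single dot product, a database scan stays cheap, which is the speed advantage the abstract contrasts with pairwise structural alignment; the cost is that fragment order and long-range contacts are lost.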
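The sort-seq abstract (item 5 in the list) correlates site-specific promoter sequences with sensor output after sorting on gfp expression. As a minimal sketch, assuming just two sorted bins (high and low fluorescence) and reads that are already aligned to the wild-type promoter with no indels, per-position mutation enrichment can be scored as a log2 ratio of mutation frequencies between bins; `dynamic_range` uses the simple induced-over-uninduced fold change. The two-bin design, the pseudocount, and all function names are hypothetical; the study's binning and statistics are not described in the abstract.

```python
import math


def mutation_frequencies(wild_type, reads):
    """Per-position fraction of reads differing from the wild-type base."""
    return [
        sum(1 for read in reads if read[i] != base) / len(reads)
        for i, base in enumerate(wild_type)
    ]


def positional_enrichment(wild_type, high_bin, low_bin, pseudocount=0.01):
    """log2 ratio of mutation frequency in high- vs low-fluorescence bins per position."""
    high = mutation_frequencies(wild_type, high_bin)
    low = mutation_frequencies(wild_type, low_bin)
    return [math.log2((h + pseudocount) / (l + pseudocount)) for h, l in zip(high, low)]


def dynamic_range(induced, uninduced):
    """Fold change of reporter output with vs without inducer."""
    return induced / uninduced


scores = positional_enrichment("ACGT", ["ACGA", "ACGA"], ["ACGT", "ACGT"])
print([round(s, 2) for s in scores])  # [0.0, 0.0, 0.0, 6.66]
```

Positive scores flag positions where mutations are over-represented among high-output variants, which are candidate sites for the kind of dynamic-range improvements the abstract reports.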