

Title: DNAcycP: a deep learning tool for DNA cyclizability prediction
Abstract

DNA mechanical properties play a critical role in every aspect of DNA-dependent biological processes. Recently, a high-throughput assay named loop-seq was developed to quantify the intrinsic bendability of a massive number of DNA fragments simultaneously. Using the loop-seq data, we developed a software tool, DNAcycP, based on a deep-learning approach for predicting intrinsic DNA cyclizability. We demonstrate that DNAcycP predicts intrinsic DNA cyclizability with high fidelity compared to the experimental data. Using an independent dataset from in vitro selection for enrichment of loopable sequences, we further verified that the predicted cyclizability score, termed the C-score, clearly distinguishes DNA fragments with different loopability. We applied DNAcycP to multiple species and compared the C-scores with available high-resolution chemical nucleosome maps. Our analyses showed that both yeast and mouse genomes share a conserved feature of high DNA bendability spanning nucleosome dyads. Additionally, we extended our analysis to transcription factor binding sites and, surprisingly, found that cyclizability is substantially elevated at CTCF binding sites in the mouse genome. We further demonstrate that this distinct mechanical property is conserved across mammalian species and is inherent to the CTCF-binding DNA motif.
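Applying a sequence-level predictor genome-wide and comparing it with nucleosome maps amounts to two steps: score sliding windows along the genome, then average those scores around anchor positions such as nucleosome dyads. The following is a minimal sketch under stated assumptions, not the DNAcycP implementation. The 50-bp window is assumed here to match loop-seq fragment length, the A/C/G/T one-hot encoding is a common input format for deep-learning sequence models, and `toy_score` is a hypothetical placeholder (A/T fraction) standing in for the trained network. All function names are illustrative.

```python
import numpy as np

BASES = "ACGT"
WINDOW = 50  # assumed window length, chosen to match 50-bp loop-seq fragments


def one_hot(seq: str) -> np.ndarray:
    """Encode DNA as an (L, 4) one-hot matrix in A, C, G, T column order.
    Ambiguous bases such as N become all-zero rows."""
    index = {base: i for i, base in enumerate(BASES)}
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        if base in index:
            mat[pos, index[base]] = 1.0
    return mat


def toy_score(window: np.ndarray) -> float:
    """HYPOTHETICAL placeholder for the trained DNAcycP network.
    It returns the A/T fraction of the window and is not a cyclizability model."""
    return float(window[:, 0].sum() + window[:, 3].sum()) / window.shape[0]


def sliding_scores(seq: str, window: int = WINDOW, score_fn=toy_score) -> np.ndarray:
    """Score every full-length window; scores[i] covers seq[i:i + window],
    so it is attributed to sequence position i + window // 2."""
    x = one_hot(seq)
    n_windows = max(len(seq) - window + 1, 0)
    return np.array([score_fn(x[i:i + window]) for i in range(n_windows)])


def aggregate_profile(scores: np.ndarray, anchors, flank: int, window: int = WINDOW) -> np.ndarray:
    """Average scores within +/- flank bp of each anchor (for example a nucleosome dyad).
    Anchors too close to either end of the sequence are skipped."""
    rows = []
    for pos in anchors:
        i = pos - window // 2  # convert sequence position to score index
        if i - flank >= 0 and i + flank < len(scores):
            rows.append(scores[i - flank:i + flank + 1])
    return np.mean(rows, axis=0) if rows else np.empty(0)


# Usage on a synthetic 200-bp sequence with one hypothetical dyad at position 100.
seq = "ACGT" * 50
scores = sliding_scores(seq)  # 151 windows
profile = aggregate_profile(scores, anchors=[100], flank=10)  # 21-point meta-profile
```

Swapping `toy_score` for a real trained model only changes `score_fn`; the windowing and dyad-centred averaging stay the same.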

 
more » « less
Award ID(s): 1764421
PAR ID: 10365313
Publisher / Repository: Oxford University Press
Journal Name: Nucleic Acids Research
Volume: 50
Issue: 6
ISSN: 0305-1048
Page Range / eLocation ID: p. 3142-3154
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    Chromatin looping is important for gene regulation, and studies of 3D chromatin structure across species and cell types have improved our understanding of the principles governing chromatin looping. However, 3D genome evolution and its relationship with natural selection remain largely unexplored. In mammals, the CTCF protein defines the boundaries of most chromatin loops, and variations in CTCF occupancy are associated with looping divergence. While many CTCF binding sites fall within transposable elements (TEs), their contribution to 3D chromatin structural evolution is unknown. Here we report the relative contributions of TE-driven CTCF binding site expansions to conserved and divergent chromatin looping in human and mouse. We demonstrate that TE-derived CTCF binding divergence may explain a large fraction of variable loops. These variable loops contribute significantly to corresponding gene expression variability across cells and species, possibly by refining sub-TAD-scale loop contacts responsible for cell-type-specific enhancer-promoter interactions.

     
    more » « less
  2. Inhomogeneous patterns of enhanced chromatin-chromatin contacts within 10-100-kb-sized regions of the genome are a generic feature of chromatin spatial organization. These features, termed topologically associating domains (TADs), have led to the loop extrusion factor (LEF) model, where TADs arise from loop extrusion by cohesin complexes. Currently, our ability to model TADs relies on the observation that in vertebrates TAD boundaries are correlated with DNA sequences that bind CTCF, which therefore is inferred to block loop extrusion. However, although TADs feature prominently in their Hi-C maps, non-vertebrate eukaryotes either do not express CTCF or show few TAD boundaries that correlate with CTCF sites. In all of these organisms, the counterparts of CTCF remain unknown, frustrating comparisons between Hi-C data and simulations. To extend the LEF model across the tree of life, here, we propose the conserved-current loop extrusion (CCLE) model that interprets loop-extruding cohesin as a nearly conserved probability current. From cohesin ChIP-seq data alone, we thus derive a position-dependent loop extrusion rate, allowing for a modified paradigm for loop extrusion that goes beyond solely discrete, localized barriers to also include loop extrusion rates that vary more continuously across the genome. To demonstrate its utility in organisms lacking CTCF, we applied the CCLE model to the Hi-C maps of interphase Schizosaccharomyces pombe, as well as to those of meiotic and mitotic Saccharomyces cerevisiae. In all cases, even though their Hi-C maps appear quite different, the model accurately predicts the TAD-scale Hi-C maps. It follows that loop extrusion by cohesin is indeed the primary mechanism underlying TADs in these systems. CCLE allows us to obtain loop extrusion parameters such as the LEF density and processivity, which compare well to independent estimates. The model also provides new insights into in vivo LEF composition and function.
  3. Abstract Background

    Inhomogeneous patterns of chromatin-chromatin contacts within 10–100-kb-sized regions of the genome are a generic feature of chromatin spatial organization. These features, termed topologically associating domains (TADs), have led to the loop extrusion factor (LEF) model. Currently, our ability to model TADs relies on the observation that in vertebrates TAD boundaries are correlated with DNA sequences that bind CTCF, which therefore is inferred to block loop extrusion. However, although TADs feature prominently in their Hi-C maps, non-vertebrate eukaryotes either do not express CTCF or show few TAD boundaries that correlate with CTCF sites. In all of these organisms, the counterparts of CTCF remain unknown, frustrating comparisons between Hi-C data and simulations.

    Results

    To extend the LEF model across the tree of life, here, we propose the conserved-current loop extrusion (CCLE) model that interprets loop-extruding cohesin as a nearly conserved probability current. From cohesin ChIP-seq data alone, we derive a position-dependent loop extrusion rate, allowing for a modified paradigm for loop extrusion that goes beyond solely localized barriers to also include loop extrusion rates that vary continuously. We show that CCLE accurately predicts the TAD-scale Hi-C maps of interphase Schizosaccharomyces pombe, as well as those of meiotic and mitotic Saccharomyces cerevisiae, demonstrating its utility in organisms lacking CTCF.

    Conclusions

    The success of CCLE in yeasts suggests that loop extrusion by cohesin is indeed the primary mechanism underlying TADs in these systems. CCLE allows us to obtain loop extrusion parameters such as the LEF density and processivity, which compare well to independent estimates.

     
    more » « less
  4. Abstract Motivation

    The three-dimensional organization of chromosomes within the cell nucleus is highly regulated. It is known that CCCTC-binding factor (CTCF) is an important architectural protein that mediates long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop.

    Results

    In this article, we directly ask whether and what sequence-based features (other than the motif itself) may be important to establish CTCF-mediated chromatin loops. We found that motif conservation measured by ‘branch-of-origin’, which accounts for motif turnover in evolution, is an important feature. We developed a new machine learning algorithm called CTCF-MP based on word2vec to demonstrate that sequence-based features alone have the capability to predict whether a pair of convergent CTCF motifs would form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP is able to make highly accurate predictions on whether a convergent CTCF motif pair would form a loop in a single cell type and also across different cell types. Our work represents an important step toward understanding the sequence determinants that may guide the formation of complex chromatin architectures.

    Availability and implementation

    The source code of CTCF-MP can be accessed at: https://github.com/ma-compbio/CTCF-MP

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Abstract

    During mammalian embryogenesis, both the 5-cytosine DNA methylation (5meC) landscape and three-dimensional (3D) chromatin architecture are profoundly remodeled during a process known as ‘epigenetic reprogramming.’ An understudied aspect of epigenetic reprogramming is how the 5meC flux, per se, affects the 3D genome. This is pertinent given the 5meC sensitivity of DNA binding for a key regulator of chromosome folding: CTCF. We profiled the CTCF binding landscape using a mouse embryonic stem cell (ESC) differentiation protocol that models embryonic 5meC dynamics. Mouse ESCs lacking DNA methylation machinery are able to exit naive pluripotency, thus allowing for dissection of subtle effects of CTCF on gene expression. We performed CTCF HiChIP in both wild-type and mutant conditions to assess gained CTCF–CTCF contacts in the absence of 5meC. We performed H3K27ac HiChIP to determine the impact that ectopic CTCF binding has on cis-regulatory contacts. Using 5meC epigenome editing, we demonstrated that the methyl mark is able to impair CTCF binding at select loci. Finally, a detailed dissection of the imprinted Zdbf2 locus showed how 5meC antagonism of CTCF allows for proper gene regulation during differentiation. This work provides a comprehensive overview of how 5meC impacts the 3D genome in a relevant model for early embryonic events.

     
    more » « less
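The CCLE abstracts above describe loop-extruding cohesin as a nearly conserved probability current from which a position-dependent extrusion rate is derived using cohesin ChIP-seq data alone. The arithmetic behind that idea can be sketched in its idealized limit: if the current J = rho(x) * v(x) is exactly the same at every position, then v(x) = J / rho(x), so extrusion is slower wherever cohesin accumulates. This is a minimal illustration, not the published CCLE fitting procedure. It assumes the ChIP-seq signal is proportional to cohesin density, treats the current as strictly conserved, and uses a hypothetical function name and pseudocount.

```python
import numpy as np


def relative_extrusion_rate(chip_signal, pseudocount: float = 1e-3) -> np.ndarray:
    """Idealised conserved-current sketch (not the published CCLE fitting procedure).

    If the cohesin probability current J = rho(x) * v(x) is the same at every
    position x, then v(x) = J / rho(x). Here the ChIP-seq signal is assumed to be
    proportional to the cohesin density rho(x). J and the ChIP-seq scale are
    unknown, so the rate is reported relative to its mean.
    """
    rho = np.asarray(chip_signal, dtype=float) + pseudocount  # avoid division by zero in empty bins
    v = 1.0 / rho
    return v / v.mean()


# Cohesin signal four-fold higher in the last bin implies four-fold slower extrusion there.
signal = np.array([1.0, 2.0, 4.0])
rates = relative_extrusion_rate(signal, pseudocount=0.0)
```

Normalizing to the mean sidesteps the unknown absolute current; recovering absolute rates would need an independent estimate such as LEF density or processivity, which the abstracts mention CCLE also provides.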
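The CTCF-MP abstract above rests on two sequence-level notions: whether a CTCF motif pair is in convergent orientation, and word2vec features learned from sequence. A small sketch of both follows, under assumptions. Convergent here means the upstream motif lies on the "+" strand and the downstream motif on the "-" strand, so the two point toward each other. Splitting sequence into overlapping k-mer "words" is one common way to feed DNA to word2vec, but the abstract does not state CTCF-MP's exact tokenization. The names `Motif`, `pair_orientation`, and `kmer_tokens`, and the default k = 6, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Motif:
    """A CTCF motif hit: chromosome, start coordinate and strand ('+' or '-')."""
    chrom: str
    start: int
    strand: str


def pair_orientation(a: Motif, b: Motif) -> str:
    """Classify two motifs on one chromosome by relative orientation.
    Convergent means the upstream motif is on '+' and the downstream one on '-',
    i.e. the two motifs point toward each other."""
    if a.chrom != b.chrom:
        raise ValueError("motifs must lie on the same chromosome")
    left, right = sorted((a, b), key=lambda m: m.start)
    labels = {"+-": "convergent", "-+": "divergent", "++": "tandem", "--": "tandem"}
    return labels[left.strand + right.strand]


def kmer_tokens(seq: str, k: int = 6, stride: int = 1) -> list[str]:
    """Split a sequence into overlapping k-mer 'words', the kind of token a
    word2vec model consumes. k = 6 is an assumed default, not a CTCF-MP setting."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


# Argument order does not matter: motifs are sorted by coordinate first.
label = pair_orientation(Motif("chr1", 500, "-"), Motif("chr1", 100, "+"))
words = kmer_tokens("acgtacg", k=3)
```

Sorting by coordinate before reading strands keeps the convergent test independent of how a candidate pair happens to be listed.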