

Title: Transposable elements contribute to cell and species-specific chromatin looping and gene regulation in mammalian genomes
Abstract

Chromatin looping is important for gene regulation, and studies of 3D chromatin structure across species and cell types have improved our understanding of the principles governing chromatin looping. However, 3D genome evolution and its relationship with natural selection remain largely unexplored. In mammals, the CTCF protein defines the boundaries of most chromatin loops, and variations in CTCF occupancy are associated with looping divergence. While many CTCF binding sites fall within transposable elements (TEs), their contribution to 3D chromatin structural evolution is unknown. Here we report the relative contributions of TE-driven CTCF binding site expansions to conserved and divergent chromatin looping in human and mouse. We demonstrate that TE-derived CTCF binding divergence may explain a large fraction of variable loops. These variable loops contribute significantly to corresponding gene expression variability across cells and species, possibly by refining sub-TAD-scale loop contacts responsible for cell-type-specific enhancer-promoter interactions.

 
Award ID(s):
1651614
NSF-PAR ID:
10153815
Author(s) / Creator(s):
; ;
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
11
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    The three-dimensional organization of chromosomes within the cell nucleus is highly regulated. It is known that CCCTC-binding factor (CTCF) is an important architectural protein that mediates long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop.

    Results

    In this article, we directly ask whether and what sequence-based features (other than the motif itself) may be important to establish CTCF-mediated chromatin loops. We found that motif conservation measured by ‘branch-of-origin’, which accounts for motif turnover in evolution, is an important feature. We developed a new machine learning algorithm called CTCF-MP based on word2vec to demonstrate that sequence-based features alone have the capability to predict if a pair of convergent CTCF motifs would form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP is able to make highly accurate predictions on whether a convergent CTCF motif pair would form a loop in a single cell type and also across different cell types. Our work represents an important step toward understanding the sequence determinants that may guide the formation of complex chromatin architectures.

    Availability and implementation

    The source code of CTCF-MP can be accessed at: https://github.com/ma-compbio/CTCF-MP

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract

    Mammalian genomes are folded into a hierarchy of compartments, topologically associating domains (TADs), subTADs, and long-range looping interactions. The higher-order folding patterns of chromatin contacts within TADs and how they localize to disease-associated single nucleotide variants (daSNVs) remain an open area of investigation. Here, we analyze high-resolution Hi-C data with graph theory to understand possible mesoscale network architecture within chromatin domains. We identify a subset of TADs exhibiting strong core-periphery mesoscale structure in embryonic stem cells, neural progenitor cells, and cortical neurons. Hyper-connected core nodes co-localize with genomic segments engaged in multiple looping interactions and enriched for occupancy of the architectural protein CCCTC binding protein (CTCF). CTCF knockdown and in silico deletion of CTCF-bound core nodes disrupt core-periphery structure, whereas in silico mutation of cell type-specific enhancer or gene nodes has a negligible effect. Importantly, neuropsychiatric daSNVs are significantly more likely to localize with TADs folded into core-periphery networks compared to domains devoid of such structure. Together, our results reveal that a subset of TADs encompasses looping interactions connected into a core-periphery mesoscale network. We hypothesize that daSNVs in the periphery of genome folding networks might preserve global nuclear architecture but cause local topological and functional disruptions contributing to human disease. By contrast, daSNVs co-localized with hyper-connected core nodes might cause severe topological and functional disruptions. Overall, these findings shed new light on the mesoscale network structure of fine scale genome folding within chromatin domains and its link to common genetic variants in human disease.

     
  3. Abstract Background

    Current evidence suggests that cis-regulatory elements controlling gene expression may be the predominant target of natural selection in humans and other species. Detecting selection acting on these elements is critical to understanding evolution but remains challenging because we do not know which mutations will affect gene regulation.

    Results

    To address this, we devise an approach to search for lineage-specific selection on three critical steps in transcriptional regulation: chromatin activity, transcription factor binding, and chromosomal looping. Applying this approach to lymphoblastoid cells from 831 individuals of either European or African descent, we find strong signals of differential chromatin activity linked to gene expression differences between ancestries in numerous contexts, but no evidence of functional differences in chromosomal looping. Moreover, we show that enhancers rather than promoters display the strongest signs of selection associated with sites of differential transcription factor binding.

    Conclusions

    Overall, our study indicates that some cis-regulatory adaptation may be more easily detected at the level of chromatin than DNA sequence. This work provides a vast resource of genomic interaction data from diverse human populations and establishes a novel selection test that will benefit future study of regulatory evolution in humans and other species.

     
  4. Nuclear noncoding RNAs (ncRNAs) are key regulators of gene expression and chromatin organization. The progress in studying nuclear ncRNAs depends on the ability to identify the genome-wide spectrum of contacts of ncRNAs with chromatin. To address this question, a panel of RNA–DNA proximity ligation techniques has been developed. However, none of these techniques examines proteins involved in RNA–chromatin interactions. Here, we introduce RedChIP, a technique combining RNA–DNA proximity ligation and chromatin immunoprecipitation for identifying RNA–chromatin interactions mediated by a particular protein. Using antibodies against architectural protein CTCF and the EZH2 subunit of the Polycomb repressive complex 2, we identify a spectrum of cis- and trans-acting ncRNAs enriched at Polycomb- and CTCF-binding sites in human cells, which may be involved in Polycomb-mediated gene repression and CTCF-dependent chromatin looping. By providing a protein-centric view of RNA–DNA interactions, RedChIP represents an important tool for studies of nuclear ncRNAs.
  5. Understanding how regulatory mechanisms evolve is critical for understanding the processes that give rise to novel phenotypes. Snake venom systems represent a valuable and tractable model for testing hypotheses related to the evolution of novel regulatory networks, yet the regulatory mechanisms underlying venom production remain poorly understood. Here, we use functional genomics approaches to investigate venom regulatory architecture in the prairie rattlesnake and identify cis-regulatory sequences (enhancers and promoters), trans-regulatory transcription factors, and integrated signaling cascades involved in the regulation of snake venom genes. We find evidence that two conserved vertebrate pathways, the extracellular signal-regulated kinase and unfolded protein response pathways, were co-opted to regulate snake venom. In one large venom gene family (snake venom serine proteases), this co-option was likely facilitated by the activity of transposable elements. Patterns of snake venom gene enhancer conservation, in some cases spanning 50 million yr of lineage divergence, highlight early origins and subsequent lineage-specific adaptations that have accompanied the evolution of venom regulatory architecture. We also identify features of chromatin structure involved in venom regulation, including topologically associated domains and CTCF loops that underscore the potential importance of novel chromatin structure to coevolve when duplicated genes evolve new regulatory control. Our findings provide a model for understanding how novel regulatory systems may evolve through a combination of genomic processes, including tandem duplication of genes and regulatory sequences, cis-regulatory sequence seeding by transposable elements, and diverse transcriptional regulatory proteins controlled by a co-opted regulatory cascade.