Title: Revealing Hi-C subcompartments by imputing inter-chromosomal chromatin interactions
Abstract: Higher-order genome organization and its variation in different cellular conditions remain poorly understood. Recent high-coverage genome-wide chromatin interaction mapping using Hi-C has revealed spatial segregation of chromosomes in the human genome into distinct subcompartments. However, subcompartment annotation, which requires Hi-C data with high sequencing coverage, is currently only available in the GM12878 cell line, making it impractical to compare subcompartment patterns across cell types. Here we develop a computational approach, SNIPER (Subcompartment iNference using Imputed Probabilistic ExpRessions), based on a denoising autoencoder and a multilayer perceptron classifier, to infer subcompartments using typical Hi-C datasets with moderate coverage. SNIPER accurately reveals subcompartments using moderate-coverage Hi-C datasets and outperforms an existing method that uses epigenomic features in GM12878. We apply SNIPER to eight additional cell lines and find that chromosomal regions with conserved and cell-type-specific subcompartment annotations have different patterns of functional genomic features. SNIPER enables the identification of subcompartments without high-coverage Hi-C data and provides insights into the function and mechanisms of spatial genome organization variation across cell types.
Award ID(s):
1717205
PAR ID:
10153856
Author(s) / Creator(s):
Publisher / Repository:
Nature Publishing Group
Date Published:
Journal Name:
Nature Communications
Volume:
10
Issue:
1
ISSN:
2041-1723
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation: The exploration of the 3D organization of DNA within the nucleus in relation to various stages of cellular development has led to experiments generating spatiotemporal Hi-C data. However, there is limited spatiotemporal Hi-C data for many organisms, impeding the study of 3D genome dynamics. To overcome this limitation and advance our understanding of genome organization, it is crucial to develop methods for forecasting Hi-C data at future time points from existing time-series Hi-C data. Results: In this work, we designed a novel framework named HiCForecast, adopting a dynamic voxel flow algorithm to forecast future spatiotemporal Hi-C data. We evaluated how well our method generalizes forecasting data across different species and systems, ensuring performance in homogeneous, heterogeneous, and general contexts. Using both computational and biological evaluation metrics, our results show that HiCForecast outperforms the current state-of-the-art algorithm, emerging as an efficient and powerful tool for forecasting future spatiotemporal Hi-C datasets. Availability and implementation: HiCForecast is publicly available at https://github.com/OluwadareLab/HiCForecast.
  2. The spatial organization of chromatin is fundamental to gene regulation and essential for proper cellular function. The Hi-C technique remains the leading method for unraveling 3D genome structures, but the limited availability of high-resolution Hi-C data poses significant challenges for comprehensive analysis. Deep learning models have been developed to predict high-resolution Hi-C data from low-resolution counterparts. Early CNN-based models improved resolution but struggled with issues like blurring and capturing fine details. In contrast, GAN-based methods encountered difficulties in maintaining diversity and generalization. Additionally, most existing algorithms perform poorly in cross-cell line generalization, where a model trained on one cell type is used to enhance high-resolution data in another cell type. In this work, we propose DiCARN (Dilated Cascading Residual Network) to overcome these challenges and improve Hi-C data resolution. DiCARN leverages dilated convolutions and cascading residuals to capture a broader context while preserving fine-grained genomic interactions. Additionally, we incorporate DNase-seq data into our model, providing a robust framework that demonstrates superior generalizability across cell lines in high-resolution Hi-C data reconstruction. DiCARN is publicly available at https://github.com/OluwadareLab/DiCARN 
  3. Abstract High-resolution reconstruction of spatial chromosome organizations from chromatin contact maps is highly demanded, but is hindered by extensive pairwise constraints, substantial missing data, and limited resolution and cell-type availability. Here, we present FLAMINGO, a computational method that addresses these challenges by compressing inter-dependent Hi-C interactions to delineate the underlying low-rank structures in 3D space, based on the low-rank matrix completion technique. FLAMINGO successfully generates 5 kb- and 1 kb-resolution spatial conformations for all chromosomes in the human genome across multiple cell types, the largest resources to date. Compared to other methods using various experimental metrics, FLAMINGO consistently demonstrates superior accuracy in recapitulating observed structures while improving scalability by orders of magnitude. The reconstructed 3D structures efficiently facilitate discoveries of higher-order multi-way interactions, imply biological interpretations of long-range QTLs, reveal geometrical properties of chromatin, and provide high-resolution references to understand structural variabilities. Importantly, FLAMINGO achieves robust predictions against high rates of missing data and significantly boosts 3D structure resolutions. Moreover, FLAMINGO shows vigorous cross-cell-type structure predictions that capture cell-type-specific spatial configurations via integration of 1D epigenomic signals. FLAMINGO can be widely applied to large-scale chromatin contact maps and expand high-resolution spatial genome conformations for diverse cell types.
  4. Abstract Enhancer-promoter interactions (EPIs) are fundamental to gene regulation, and understanding their recurrence across diverse biological samples is key to deciphering chromatin architecture. In this study, we systematically analyzed the recurrence of EPIs across 49 Hi-C and 95 HiChIP datasets. We found that the majority of EPIs identified in a given sample were also present in other samples, regardless of the assay type (Hi-C or HiChIP) or the enhancer annotations used. Interestingly, EPIs that appeared unique to individual samples were typically surrounded by fewer neighboring EPIs, suggesting they may not represent truly sample-specific interactions. Our findings indicate that most human EPIs have already been captured and that cells primarily reuse subsets of these shared EPIs across different cell types and conditions. This study provides new insights into the pervasive and reusable nature of EPIs in the human genome, with important implications for chromatin conformation studies. 
  5. Abstract Hi-C characterizes three-dimensional chromatin organization, facilitates haplotype phasing, and enables genome-assembly scaffolding, but encounters difficulties across complex regions. By coupling chromosome conformation capture (3C) with PacBio HiFi long-read sequencing, here we develop a method (CiFi) that enables analysis of genomic interactions across repetitive regions. Starting with as little as 60,000 cells (sub-microgram DNA), the method produces multi-kilobasepair HiFi reads that contain multiple interacting, concatenated segments (~350 bp to 2 kbp). This multiplicity and increase in segment length versus standard short-read-based Hi-C improves read-mapping efficiency and coverage in repetitive regions and enhances haplotype phasing. CiFi pairwise interactions are largely concordant with Hi-C from a human lymphoblastoid cell line, with gains in assigning topologically associating domains across centromeres, segmental duplications, and human disease-associated genomic hotspots. As CiFi requires less input versus established methods, we apply the approach to characterize single small insects: assaying chromatin interactions across the genome from an Anopheles coluzzii mosquito and producing a chromosome-scale scaffolded assembly from a Ceratitis capitata Mediterranean fruit fly. Together, CiFi enables assessment of chromosome-scale interactions of previously recalcitrant low-complexity loci, low-input samples, and small organisms.