

Title: Selfish: discovery of differential chromatin interactions via a self-similarity measure
Abstract Motivation

High-throughput conformation capture experiments, such as Hi-C, provide genome-wide maps of chromatin interactions, enabling life scientists to investigate the role of the three-dimensional structure of genomes in gene regulation and other essential cellular functions. A fundamental problem in the analysis of Hi-C data is how to compare two contact maps derived from Hi-C experiments. Detecting similarities and differences between contact maps is critical for evaluating the reproducibility of replicate experiments and for identifying differential genomic regions with biological significance. Due to the complexity of chromatin conformations and the presence of technology-driven and sequence-specific biases, the comparative analysis of Hi-C data is analytically and computationally challenging.
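Comparing two contact maps for reproducibility is complicated by the strong dependence of contact counts on genomic distance. A common baseline, shown here as a minimal sketch and not as Selfish's method, correlates the two maps separately along each diagonal (one fixed genomic distance per diagonal) and averages the per-diagonal Pearson correlations. The function names, the unweighted average and the `max_distance` parameter are illustrative assumptions; published scores such as HiCRep's stratum-adjusted correlation additionally smooth the maps and weight the strata.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN when either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def stratified_correlation(a: Matrix, b: Matrix, max_distance: int) -> float:
    """Average per-diagonal Pearson correlation between two symmetric maps.

    Diagonal d holds contacts between bins i and i + d, i.e. a fixed genomic
    distance, so each correlation compares contacts at the same distance.
    """
    n = len(a)
    scores = []
    for d in range(0, min(max_distance, n - 1) + 1):
        xs = [a[i][i + d] for i in range(n - d)]
        ys = [b[i][i + d] for i in range(n - d)]
        if len(xs) < 2:
            continue  # a single point carries no correlation information
        r = pearson(xs, ys)
        if not math.isnan(r):
            scores.append(r)  # constant diagonals are skipped
    return sum(scores) / len(scores) if scores else float("nan")
```

For example, a map and a copy with every count doubled score approximately 1.0, because each per-diagonal correlation is invariant to rescaling.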

Results

We present a novel method called Selfish for the comparative analysis of Hi-C data that takes advantage of the structural self-similarity in contact maps. We define a novel self-similarity measure to design algorithms for (i) measuring reproducibility for Hi-C replicate experiments and (ii) finding differential chromatin interactions between two contact maps. Extensive experimental results on simulated and real data show that Selfish is more accurate and robust than state-of-the-art methods.
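The abstract does not define Selfish's self-similarity measure. As a loosely related toy illustration only (a hypothetical construction, not the algorithm from the paper or its repository), each pixel can be described by how it compares with its own neighbourhood in the same map. Comparing these rank-like descriptors between two maps gives a differential score that is unchanged when either map is multiplied by a positive constant, which illustrates why within-map comparisons can be robust to differences in sequencing depth. The names `local_rank` and `self_similarity_diff` and the `radius` parameter are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def local_rank(m: Matrix, i: int, j: int, radius: int) -> float:
    """Fraction of neighbours within `radius` whose value is below m[i][j]."""
    n = len(m)
    below = total = 0
    for a in range(max(0, i - radius), min(n, i + radius + 1)):
        for b in range(max(0, j - radius), min(n, j + radius + 1)):
            if (a, b) == (i, j):
                continue  # compare the pixel only with its neighbours
            total += 1
            if m[a][b] < m[i][j]:
                below += 1
    return below / total if total else 0.0


def self_similarity_diff(a: Matrix, b: Matrix, radius: int = 1) -> Matrix:
    """Per-pixel absolute difference of the local-rank descriptors of two maps.

    Only within-map comparisons enter each descriptor, so a global positive
    scaling of either map leaves the result unchanged.
    """
    n = len(a)
    return [
        [abs(local_rank(a, i, j, radius) - local_rank(b, i, j, radius)) for j in range(n)]
        for i in range(n)
    ]
```

Pixels with large scores are candidate differential interactions; a real method would still need a statistical model to turn such scores into significance calls.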

Availability and implementation

https://github.com/ucrbioinfo/Selfish

 
Award ID(s):
1814359
NSF-PAR ID:
10425986
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
35
Issue:
14
ISSN:
1367-4803
Page Range / eLocation ID:
p. i145-i153
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Motivation

    High-throughput chromosome conformation capture (Hi-C) contact matrices are used to predict 3D chromatin structures in eukaryotic cells. High-resolution Hi-C data are less available than low-resolution Hi-C data because of sequencing costs, but they provide greater insight into the intricate details of 3D chromatin structures, such as enhancer–promoter interactions and sub-domains. To provide a cost-effective solution to high-resolution Hi-C data collection, deep learning models are used to predict high-resolution Hi-C matrices from existing low-resolution matrices across multiple cell types.

    Results

    Here, we present two Cascading Residual Networks, HiCARN-1 (a convolutional neural network) and HiCARN-2 (a generative adversarial network), that use a novel framework of cascading connections throughout the network to predict Hi-C contact matrices from low-resolution data. As shown by image evaluation and Hi-C reproducibility metrics, both HiCARN models overall outperform state-of-the-art Hi-C resolution enhancement algorithms in predictive accuracy on human and mouse high-resolution Hi-C data downsampled at ratios of 1/16, 1/32, 1/64 and 1/100. Validation by extracting topologically associating domains, chromosome 3D structures and chromatin loop predictions from the enhanced data also shows that HiCARN can proficiently reconstruct biologically significant regions.
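The abstract evaluates HiCARN on high-resolution data downsampled at ratios such as 1/16. A common way to simulate such data, assumed here rather than taken from the HiCARN paper, is to keep each read pair independently with probability equal to the ratio (binomial thinning). The sketch applies this to a symmetric integer contact matrix; `downsample_contacts` and its parameters are hypothetical.

```python
import random
from typing import List


def downsample_contacts(counts: List[List[int]], ratio: float, seed: int = 0) -> List[List[int]]:
    """Binomially thin a symmetric contact matrix, keeping each read with probability `ratio`."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    rng = random.Random(seed)  # seeded so the downsampling is reproducible
    n = len(counts)
    thinned = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # visit the upper triangle once, then mirror it
            kept = sum(1 for _ in range(counts[i][j]) if rng.random() < ratio)
            thinned[i][j] = thinned[j][i] = kept
    return thinned
```

A call such as `downsample_contacts(high, 1 / 16)` yields a low-coverage training input for a paired high-resolution target. The per-read loop is only for clarity; on real matrices, NumPy's `Generator.binomial` performs the same thinning in a vectorized way.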

    Availability and implementation

    HiCARN is available as open-source software at https://github.com/OluwadareLab/HiCARN and as a containerized application that can be run on any platform.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
  2. Abstract Motivation

    Three-dimensional chromosome structure has been increasingly shown to influence various levels of cellular and genomic functions. Through Hi-C data, which maps contact frequency on chromosomes, it has been found that structural elements termed topologically associating domains (TADs) are involved in many regulatory mechanisms. However, we have little understanding of the level of similarity or variability of chromosome structure across cell types and disease states. In this study, we present a method to quantify resemblance and identify structurally similar regions between any two sets of TADs.
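One standard way to quantify the resemblance between two TAD sets is to treat each set as a partition of a chromosome's bins into domains and compute the variation of information (VI) between the two partitions, VI(X, Y) = 2H(X, Y) − H(X) − H(Y), where 0 means identical partitions. This is a minimal sketch under that assumption; the helper names and the boundary-list input format are hypothetical, and the exact measure and region search used by localtadsim should be checked against its repository.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Sequence


def boundaries_to_labels(starts: Sequence[int], n_bins: int) -> List[int]:
    """Label each bin with its domain index; `starts` lists bins that open a new domain."""
    opening = set(starts)
    labels, current = [], 0
    for b in range(n_bins):
        if b in opening and b > 0:
            current += 1  # a new domain begins at this bin
        labels.append(current)
    return labels


def entropy(counts: Iterable[int], total: int) -> float:
    """Shannon entropy (natural log) of a distribution given by counts."""
    return -sum((c / total) * math.log(c / total) for c in counts)


def variation_of_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """VI between two partitions of the same bins, given as per-bin domain labels."""
    if len(x) != len(y) or not x:
        raise ValueError("partitions must label the same non-empty set of bins")
    n = len(x)
    h_x = entropy(Counter(x).values(), n)
    h_y = entropy(Counter(y).values(), n)
    h_xy = entropy(Counter(zip(x, y)).values(), n)
    return 2 * h_xy - h_x - h_y  # equals H(X|Y) + H(Y|X)
```

For six bins, splitting them into two TADs of three bins versus keeping a single TAD gives VI = ln 2, while identical partitions give 0.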

    Results

    We present an analysis of 23 human Hi-C samples representing various tissue types in normal and cancer cell lines. We quantify global and chromosome-level structural similarity, and compare the relative similarity between cancer and non-cancer cells. We find that cancer cells show higher structural variability around commonly mutated pan-cancer genes than normal cells at these same locations.

    Availability and implementation

    Software for the methods and analysis can be found at https://github.com/Kingsford-Group/localtadsim

     
    Current Hi-C analysis approaches focus on uniquely mapped reads, and little research has been carried out on including multi-mapping reads, which leads to missing biological signals from repetitive DNA regions. We propose a heuristic strategy that assigns multi-mapping reads to loci according to their distance to the closest restriction enzyme cutting sites. We demonstrate that the heuristic strategy can rescue multi-mapping reads and thus enhance the quality of Hi-C data. Compared with mHi-C, it not only improves replicate reproducibility within the same cell type but also maintains the differences between replicates of different cell types. Moreover, the strategy identifies many more common statistically significant chromatin interactions between Hi-C experiments using different restriction enzymes, and it has a substantial advantage in computing resources. Therefore, the heuristic strategy can be used to enhance Hi-C data by utilizing multi-mapping reads.
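The abstract states that multi-mapping reads are assigned according to their distance to the closest restriction enzyme cutting site. A minimal sketch, assuming that the candidate alignment nearest to any cut site is preferred (Hi-C ligation junctions are expected near cut sites) and that cut-site positions are available as a sorted list per chromosome. The function names and data layout are hypothetical, the sketch handles a single read end, and it does not reflect the authors' code.

```python
import bisect
import math
from typing import Dict, List, Sequence, Tuple

Candidate = Tuple[str, int]  # (chromosome, alignment position)


def distance_to_nearest_site(pos: int, sites: Sequence[int]) -> float:
    """Distance from `pos` to the closest cut site in the sorted list `sites`."""
    k = bisect.bisect_left(sites, pos)
    best = math.inf  # no cut sites on this chromosome
    if k < len(sites):
        best = sites[k] - pos  # nearest site at or after pos
    if k > 0:
        best = min(best, pos - sites[k - 1])  # nearest site before pos
    return best


def assign_multimapper(candidates: Sequence[Candidate], cut_sites: Dict[str, List[int]]) -> Candidate:
    """Pick the candidate alignment closest to a restriction cut site (ties: first candidate)."""
    if not candidates:
        raise ValueError("no candidate alignments")
    return min(
        candidates,
        key=lambda c: distance_to_nearest_site(c[1], cut_sites.get(c[0], [])),
    )
```

Binary search keeps each lookup logarithmic in the number of cut sites, which matters because a genome-wide digest produces hundreds of thousands to millions of sites.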
  4. Abstract Background

    Analysis of the relationship between chromosomal structural variation (synteny breaks) and 3D-chromatin architectural changes among closely related species has the potential to reveal causes and correlates between chromosomal change and chromatin remodeling. Of note, contrary to extensive studies in animal species, the pace and pattern of chromatin architectural changes following the speciation of plants remain unexplored; moreover, there is little exploration of the occurrence of synteny breaks in the context of multiple genome topological hierarchies within the same model species.

    Results

    Here we used Hi-C and epigenomic analyses to characterize and compare the profiles of hierarchical chromatin architectural features in representative species of the cotton tribe (Gossypieae), including Gossypium arboreum, Gossypium raimondii, and Gossypioides kirkii, which differ with respect to chromosome rearrangements. We found that (i) overall chromatin architectural territories were preserved in Gossypioides and Gossypium, which was reflected in their similar intra-chromosomal contact patterns and spatial chromosomal distributions; (ii) the non-random, preferential occurrence of synteny breaks in the A compartment is significantly associated with the B-to-A compartment switch in syntenic blocks flanking synteny breaks; (iii) synteny changes co-localize with open-chromatin boundaries of topologically associating domains (TADs), while TAD stabilization has a greater influence on regulating orthologous expression divergence than do rearrangements; and (iv) rearranged chromosome segments largely maintain ancestral in-cis interactions.

    Conclusions

    Our findings provide insights into the non-random occurrence of epigenomic remodeling relative to the genomic landscape and its evolutionary and functional connections to alterations of hierarchical chromatin architecture, on a known evolutionary timescale.

     
  5. Abstract Summary

    Here, we present the scHiCDiff software tool, which provides both nonparametric tests and parametric models to detect differential chromatin interactions (DCIs) from single-cell Hi-C data. We thoroughly evaluated the scHiCDiff methods on both simulated and real data. Our results demonstrated that scHiCDiff, especially its zero-inflated negative binomial model option, can effectively detect reliable and consistent single-cell DCIs between two conditions, thereby facilitating the study of cell type-specific variations of chromatin structures at the single-cell level.
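The abstract mentions nonparametric tests without naming them. As a generic, hypothetical illustration of a nonparametric DCI test for a single bin pair, the sketch runs a two-sided permutation test on the difference in mean contact counts across the cells of two conditions. The permutation test is a swapped-in technique chosen for simplicity, not necessarily one scHiCDiff implements, and a genome-wide scan would also need multiple-testing correction. `permutation_pvalue` and its parameters are assumptions.

```python
import random
from statistics import fmean
from typing import Sequence


def permutation_pvalue(cond_a: Sequence[float], cond_b: Sequence[float],
                       n_perm: int = 999, seed: int = 0) -> float:
    """Two-sided permutation p-value for a difference in mean contact count.

    Each element is one cell's contact count for the same bin pair.
    """
    if not cond_a or not cond_b:
        raise ValueError("each condition needs at least one cell")
    rng = random.Random(seed)  # seeded for reproducible p-values
    observed = abs(fmean(cond_a) - fmean(cond_b))
    pooled = list(cond_a) + list(cond_b)
    k = len(cond_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign cells to the two conditions
        if abs(fmean(pooled[:k]) - fmean(pooled[k:])) >= observed:
            extreme += 1
    # the add-one correction keeps the estimate strictly positive
    return (extreme + 1) / (n_perm + 1)
```

Because the test only reshuffles observed values, the many zero counts typical of single-cell Hi-C are handled as ordinary ties without a distributional assumption; the parametric options described in the abstract instead model those zeros explicitly.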

    Availability and implementation

    scHiCDiff is implemented in R and freely available at GitHub (https://github.com/wmalab/scHiCDiff).

     