

Title: Single‐Cell Hi‐C Technologies and Computational Data Analysis
Abstract. Single‐cell chromatin conformation capture (scHi‐C) techniques have evolved to provide significant insights into structural organization and regulatory mechanisms in individual cells. Although many scHi‐C protocols have been developed, they often involve intricate procedures, and the resulting data are sparse, which creates computational challenges for systematic analysis and limits their applicability. This review provides a comprehensive overview of the field, a quantitative evaluation of thirteen protocols, and practical guidance on computational topics. We first assess the efficiency of these protocols based on the total number of contacts recovered per cell and the cis/trans ratio. We then provide systematic considerations for scHi‐C quality control and data imputation. Additionally, we summarize the capabilities and implementations of analysis methods covering cell clustering, A/B compartment calling, topologically associating domain (TAD) calling, loop calling, 3D reconstruction, scHi‐C data simulation, and differential interaction analysis. Finally, we highlight key computational challenges arising from the specific complexities of scHi‐C data and propose potential solutions.
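The two efficiency metrics named in the abstract, contacts recovered per cell and the cis/trans ratio, are simple tallies over each cell's read pairs. The sketch below assumes a minimal, hypothetical per-contact record (`Contact`, loosely modeled on a .pairs table) and an optional, assumed `min_cis_distance` filter that drops short-range cis pairs as likely ligation artifacts. The review's exact filtering rules may differ, and some pipelines report the cis fraction (cis / total) rather than the ratio.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Tuple


class Contact(NamedTuple):
    """One read pair (hypothetical minimal schema, not a standard format)."""
    cell: str
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def contact_stats(
    contacts: Iterable[Contact], min_cis_distance: int = 0
) -> Dict[str, Tuple[int, int, int, float]]:
    """Return {cell: (total, cis, trans, cis/trans ratio)}.

    Cis pairs closer than `min_cis_distance` bp are skipped entirely
    (an assumed artifact filter); they count as neither cis nor trans.
    """
    cis: Dict[str, int] = defaultdict(int)
    trans: Dict[str, int] = defaultdict(int)
    for c in contacts:
        if c.chrom1 == c.chrom2:
            if abs(c.pos2 - c.pos1) < min_cis_distance:
                continue  # too short to trust; drop it
            cis[c.cell] += 1
        else:
            trans[c.cell] += 1

    stats = {}
    for cell in set(cis) | set(trans):
        n_cis, n_trans = cis[cell], trans[cell]
        ratio = n_cis / n_trans if n_trans else float("inf")
        stats[cell] = (n_cis + n_trans, n_cis, n_trans, ratio)
    return stats


# Toy input, for illustration only.
contacts = [
    Contact("cellA", "chr1", 100, "chr1", 50_000),
    Contact("cellA", "chr1", 100, "chr1", 300),  # 200 bp apart: filtered out
    Contact("cellA", "chr1", 100, "chr2", 900),
    Contact("cellA", "chr3", 5_000, "chr3", 90_000),
    Contact("cellB", "chr1", 10, "chr5", 20),
]
stats = contact_stats(contacts, min_cis_distance=1_000)
for cell, row in sorted(stats.items()):
    print(cell, row)
```

A cell with no trans contacts gets an infinite ratio here; a real QC step would more likely flag such a cell than rank it.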
Award ID(s): 2239350
PAR ID: 10569240
Author(s) / Creator(s):
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name: Advanced Science
Volume: 12
Issue: 9
ISSN: 2198-3844
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract. Motivation: Single-cell Hi-C (scHi-C) technologies have significantly advanced our understanding of 3D genome organization. However, scHi-C data are often sparse and noisy, leading to substantial computational challenges in downstream analyses. Results: In this study, we introduce SHICEDO, a novel deep-learning model specifically designed to enhance scHi-C contact matrices by imputing missing or sparsely captured chromatin contacts through a generative adversarial framework. SHICEDO leverages the unique structural characteristics of scHi-C matrices to derive customized features that enable effective data enhancement. Additionally, the model incorporates a channel-wise attention mechanism to mitigate the over-smoothing commonly associated with scHi-C enhancement methods. Through simulations and real-data applications, we demonstrate that SHICEDO outperforms state-of-the-art methods, achieving superior quantitative and qualitative results. Moreover, SHICEDO enhances key structural features in scHi-C data, enabling more precise delineation of chromatin structures such as A/B compartments, TAD-like domains, and chromatin loops. Availability and implementation: SHICEDO is publicly available at https://github.com/wmalab/SHICEDO.
  2. Abstract. Motivation: Single-cell Hi-C (scHi-C) data provide critical insights into chromatin interactions at the level of individual cells, uncovering unique 3D genome structures. However, scHi-C datasets are sparse and noisy, complicating efforts to accurately reconstruct high-resolution chromosomal structures. In this study, we present ScUnicorn, a novel blind super-resolution framework for scHi-C data enhancement. Unlike traditional super-resolution approaches, which rely on downsampling, predefined degradation ratios, or constant assumptions about the input data to reconstruct high-resolution interaction matrices, ScUnicorn uses an iterative degradation kernel optimization process. As a result, our approach more reliably preserves critical biological patterns and minimizes noise. Additionally, we propose 3DUnicorn, a maximum likelihood algorithm that leverages the enhanced scHi-C data to infer precise 3D chromosomal structures. Results: Our evaluation demonstrates that ScUnicorn outperforms state-of-the-art methods in terms of Peak Signal-to-Noise Ratio, Structural Similarity Index Measure, and GenomeDisco scores. Moreover, the structures reconstructed by 3DUnicorn align closely with experimental 3D-FISH data, underscoring their biological relevance. Together, ScUnicorn and 3DUnicorn provide a robust framework for advancing genomic research by improving scHi-C data fidelity and enabling accurate 3D genome structure reconstruction. Availability and implementation: The Unicorn implementation is publicly accessible at https://github.com/OluwadareLab/Unicorn.
  3. Abstract. Background: Epigenomic profiling assays such as ChIP-seq have been widely used to map genome-wide enrichment profiles of chromatin-associated proteins and posttranslational histone modifications. Sequencing depth is a key parameter in experimental design and quality control. However, because sequencing depth requirements vary across experimental conditions, it can be challenging to determine the optimal depth, particularly for projects involving multiple targets or cell types. Results: We developed the peaksat R package to provide target read depth estimates for epigenomic experiments based on the analysis of peak saturation curves. We applied peaksat to establish the distinct read depth requirements of ChIP-seq studies of histone modifications in different cell lines. Using peaksat, we were able to estimate the target read depth required per library to obtain high-quality peak calls for downstream analysis. In addition, peaksat was applied to other sequence-enrichment methods, including CUT&RUN and ATAC-seq. Conclusion: peaksat helps researchers decide whether their sequencing data were generated to a depth adequate to yield sufficient meaningful peaks and, if not, how many more reads would be required per library. peaksat is applicable to other sequencing-based methods whose analysis includes peak calling.
  4. Abstract. Coptotermes formosanus Shiraki and Coptotermes gestroi (Wasmann) (Blattoidea: Rhinotermitidae) are invasive subterranean termite pest species with a major global economic impact. However, descriptions of the mutualistic protist communities harbored in their respective hindguts remain fragmentary. The C. formosanus hindgut has long been considered to harbor three protist species, Pseudotrichonympha grassii (Trichonymphida), Holomastigotoides hartmanni, and Cononympha (Spirotrichonympha) leidyi (Spirotrichonymphida), but molecular data have suggested that the diversity may be higher. Meanwhile, the C. gestroi community remains undescribed except for Pseudotrichonympha leei. To complete the characterization of these communities, hindguts of workers from both termite species were investigated using single‐cell PCR, microscopy, cell counts, and 18S rRNA amplicon sequencing. The two hosts were found to harbor intriguingly parallel protist communities, each consisting of one Pseudotrichonympha species, two Holomastigotoides species, and two Cononympha species. All protist species were unique to their respective hosts, which last shared a common ancestor ~18 MYA. The relative abundances of protist species in each hindgut differed remarkably between cell count data and 18S rRNA profiles, calling for caution in interpreting species abundances from amplicon data. This study will enable future research on C. formosanus and C. gestroi hybrids, which provide a unique opportunity to study protist community inheritance, compatibility, and potential contribution to hybrid vigor.
  5. Abstract. In recent years, the integration of single‐cell multi‐omics data has provided a more comprehensive understanding of cell functions and internal regulatory mechanisms than any single omics layer alone, but it still faces many challenges, such as variance across omics, sparsity, cell heterogeneity, and confounding factors. The cell cycle is known to act as a confounder when analyzing other factors in single‐cell RNA‐seq data, but its effect on integrated single‐cell multi‐omics data is unclear. Here, a cell cycle‐aware network (CCAN) is developed to remove cell cycle effects from integrated single‐cell multi‐omics data while preserving cell type‐specific variation. This is the first computational model to study cell‐cycle effects in the integration of single‐cell multi‐omics data. Validations on several benchmark datasets show the outstanding performance of CCAN in a variety of downstream analyses and applications, including removing cell cycle effects and batch effects from scRNA‐seq datasets generated with different protocols, integrating paired and unpaired scRNA‐seq and scATAC‐seq data, accurately transferring cell type labels from scRNA‐seq to scATAC‐seq data, and characterizing the differentiation of hematopoietic stem cells into different lineages in integrated differentiation data.
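The SHICEDO abstract (item 1) mentions a channel-wise attention mechanism but does not describe its architecture. As a rough illustration of what per-channel reweighting means, here is the generic squeeze-and-excitation pattern in NumPy. This is an assumption about the general idea, not SHICEDO's code: the shapes, the reduction ratio, and the random weights (which a real model would learn) are all hypothetical.

```python
import numpy as np


def channel_attention(features: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation style channel reweighting (generic sketch).

    features: (C, H, W) stack of feature maps for one contact-matrix patch.
    w1: (C // r, C) and w2: (C, C // r) weights of two small dense layers.
    Returns the features with each channel scaled by a gate in (0, 1).
    """
    squeezed = features.mean(axis=(1, 2))        # (C,) global average pool per channel
    hidden = np.maximum(w1 @ squeezed, 0.0)      # ReLU bottleneck, (C // r,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))  # sigmoid: one weight per channel
    return features * gate[:, None, None]        # broadcast the gate over H and W


rng = np.random.default_rng(0)
feats = rng.random((8, 16, 16))   # C = 8 channels on a 16 x 16 patch
w1 = rng.normal(size=(2, 8))      # reduction ratio r = 4 (arbitrary choice)
w2 = rng.normal(size=(8, 2))
out = channel_attention(feats, w1, w2)
print(out.shape)  # prints (8, 16, 16)
```

Because the gate is a single scalar per channel, the spatial pattern inside each feature map is untouched; only the relative emphasis between channels changes.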
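ScUnicorn (item 2) is evaluated partly by Peak Signal-to-Noise Ratio, defined as 10 · log10(MAX² / MSE), where MAX is the data range and MSE is the mean squared error between the reference and the enhanced matrix. A minimal pure-Python version follows. It assumes matrices already scaled to [0, 1] (hence the default `data_range=1.0`) and is not the authors' evaluation script.

```python
import math
from typing import Sequence


def psnr(
    reference: Sequence[Sequence[float]],
    estimate: Sequence[Sequence[float]],
    data_range: float = 1.0,
) -> float:
    """Peak signal-to-noise ratio (dB) between two equally shaped matrices."""
    if len(reference) != len(estimate) or any(
        len(r) != len(e) for r, e in zip(reference, estimate)
    ):
        raise ValueError("matrices must have the same shape")
    sq_err = [(a - b) ** 2 for r, e in zip(reference, estimate) for a, b in zip(r, e)]
    mse = sum(sq_err) / len(sq_err)
    if mse == 0:
        return math.inf  # identical matrices: PSNR is unbounded
    return 10.0 * math.log10(data_range ** 2 / mse)


reference = [[1.0, 0.5], [0.5, 1.0]]
estimate = [[0.5, 0.5], [0.5, 1.0]]
print(round(psnr(reference, estimate), 2))  # prints 12.04
```

Higher is better. Because PSNR depends on `data_range`, scores are only comparable when all methods use the same normalization.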
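The peaksat abstract (item 3) estimates target read depth from peak saturation curves but does not state the model it fits. One common way to reason about saturation, assumed here and not necessarily peaksat's approach, is a hyperbolic curve P(d) = P_max · d / (K + d). In this model, reaching a fraction f of P_max requires d = K · f / (1 − f) reads. The sketch fits the curve by linear least squares on the reciprocal form 1/P = (K / P_max) · (1/d) + 1/P_max. The function names and the toy, noise-free data are hypothetical.

```python
from typing import Sequence, Tuple


def fit_saturation(depths: Sequence[float], peaks: Sequence[float]) -> Tuple[float, float]:
    """Fit P(d) = p_max * d / (k + d) via least squares on 1/P versus 1/d.

    Returns (p_max, k). Assumed model, not peaksat's documented method.
    """
    xs = [1.0 / d for d in depths]
    ys = [1.0 / p for p in peaks]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx               # estimates k / p_max
    intercept = my - slope * mx     # estimates 1 / p_max
    if intercept <= 0:
        raise ValueError("curve shows no saturation; cannot estimate p_max")
    p_max = 1.0 / intercept
    return p_max, slope * p_max


def depth_for_fraction(k: float, fraction: float) -> float:
    """Reads needed to reach `fraction` of the asymptotic peak count."""
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be in (0, 1)")
    return k * fraction / (1.0 - fraction)


# Synthetic, noise-free subsampling points (illustration only).
depths = [5e6, 10e6, 20e6, 40e6]
peaks = [50_000 * d / (20e6 + d) for d in depths]
p_max, k = fit_saturation(depths, peaks)
target = depth_for_fraction(k, 0.9)
current = depths[-1]
print(f"target {target:.3g} reads; {max(0.0, target - current):.3g} more needed")
```

The reciprocal fit is easy to write but gives low-depth points outsized weight. With real, noisy peak counts, a nonlinear least-squares fit of the same curve is usually more robust.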
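For contrast with CCAN (item 5), the simplest widely used way to remove cell-cycle effects from single-cell data is to regress per-cell cell-cycle scores (for example, S and G2M phase scores) out of every feature and keep the residuals. The NumPy sketch below shows that baseline. It is a swapped-in, classical technique, not CCAN. It assumes the scores were computed beforehand, and the simulated data are purely illustrative. Unlike the network described in the abstract, this linear baseline will also remove any cell-type signal that happens to correlate with the scores.

```python
import numpy as np


def regress_out(X: np.ndarray, covariates: np.ndarray) -> np.ndarray:
    """Remove the linear effect of per-cell covariates from each feature.

    X: (n_cells, n_features) normalized data.
    covariates: (n_cells, n_covariates), e.g. S and G2M scores (assumed given).
    Returns OLS residuals (fit with an intercept) plus each feature's mean.
    """
    design = np.column_stack([np.ones(len(X)), covariates])  # intercept + scores
    beta, *_ = np.linalg.lstsq(design, X, rcond=None)         # (1 + k, n_features)
    residuals = X - design @ beta                              # orthogonal to design
    return residuals + X.mean(axis=0)                          # restore feature means


rng = np.random.default_rng(1)
scores = rng.normal(size=(200, 2))                             # simulated S, G2M scores
X = rng.normal(size=(200, 5)) + scores @ rng.normal(size=(2, 5))
clean = regress_out(X, scores)
```

After the correction, every feature is uncorrelated with the supplied scores by construction. Whether that is desirable depends on how strongly cell cycle and cell identity are entangled in the dataset.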