skip to main content

Attention:

The NSF Public Access Repository (PAR) system and access will be unavailable from 8:00 PM ET on Friday, March 21 until 8:00 AM ET on Saturday, March 22 due to maintenance. We apologize for the inconvenience.


Title: scHiCDiff: detecting differential chromatin interactions in single-cell Hi-C data
Abstract Summary

Here, we presented the scHiCDiff software tool that provides both nonparametric tests and parametirc models to detect differential chromatin interactions (DCIs) from single-cell Hi-C data. We thoroughly evaluated the scHiCDiff methods on both simulated and real data. Our results demonstrated that scHiCDiff, especially the zero-inflated negative binomial model option, can effectively detect reliable and consistent single-cell DCIs between two conditions, thereby facilitating the study of cell type-specific variations of chromatin structures at the single-cell level.

Availability and implementation

scHiCDiff is implemented in R and freely available at GitHub (https://github.com/wmalab/scHiCDiff).

 
more » « less
Award ID(s):
2215705 1751317
PAR ID:
10470742
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
39
Issue:
10
ISSN:
1367-4811
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    B-type lamins are critical nuclear envelope proteins that interact with the three-dimensional genomic architecture. However, identifying the direct roles of B-lamins on dynamic genome organization has been challenging as their joint depletion severely impacts cell viability. To overcome this, we engineered mammalian cells to rapidly and completely degrade endogenous B-type lamins using Auxin-inducible degron technology.

    Results

    Using live-cell Dual Partial Wave Spectroscopic (Dual-PWS) microscopy, Stochastic Optical Reconstruction Microscopy (STORM), in situ Hi-C, CRISPR-Sirius, and fluorescence in situ hybridization (FISH), we demonstrate that lamin B1 and lamin B2 are critical structural components of the nuclear periphery that create a repressive compartment for peripheral-associated genes. Lamin B1 and lamin B2 depletion minimally alters higher-order chromatin folding but disrupts cell morphology, significantly increases chromatin mobility, redistributes both constitutive and facultative heterochromatin, and induces differential gene expression both within and near lamin-associated domain (LAD) boundaries. Critically, we demonstrate that chromatin territories expand as upregulated genes within LADs radially shift inwards. Our results indicate that the mechanism of action of B-type lamins comes from their role in constraining chromatin motion and spatial positioning of gene-specific loci, heterochromatin, and chromatin domains.

    Conclusions

    Our findings suggest that, while B-type lamin degradation does not significantly change genome topology, it has major implications for three-dimensional chromatin conformation at the single-cell level both at the lamina-associated periphery and the non-LAD-associated nuclear interior with concomitant genome-wide transcriptional changes. This raises intriguing questions about the individual and overlapping roles of lamin B1 and lamin B2 in cellular function and disease.

     
    more » « less
  2. Abstract Motivation

    Genome-wide maps of epigenetic modifications are powerful resources for non-coding genome annotation. Maps of multiple epigenetics marks have been integrated into cell or tissue type-specific chromatin state annotations for many cell or tissue types. With the increasing availability of multiple chromatin state maps for biologically similar samples, there is a need for methods that can effectively summarize the information about chromatin state annotations within groups of samples and identify differences across groups of samples at a high resolution.

    Results

    We developed CSREP, which takes as input chromatin state annotations for a group of samples. CSREP then probabilistically estimates the state at each genomic position and derives a representative chromatin state map for the group. CSREP uses an ensemble of multi-class logistic regression classifiers that predict the chromatin state assignment of each sample given the state maps from all other samples. The difference in CSREP’s probability assignments for the two groups can be used to identify genomic locations with differential chromatin state assignments. Using groups of chromatin state maps of a diverse set of cell and tissue types, we demonstrate the advantages of using CSREP to summarize chromatin state maps and identify biologically relevant differences between groups at a high resolution.

    Availability and implementation

    The CSREP source code and generated data are available at http://github.com/ernstlab/csrep.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  3. Abstract Motivation

    The three dimensional organization of chromosomes within the cell nucleus is highly regulated. It is known that CCCTC-binding factor (CTCF) is an important architectural protein to mediate long-range chromatin loops. Recent studies have shown that the majority of CTCF binding motif pairs at chromatin loop anchor regions are in convergent orientation. However, it remains unknown whether the genomic context at the sequence level can determine if a convergent CTCF motif pair is able to form a chromatin loop.

    Results

    In this article, we directly ask whether and what sequence-based features (other than the motif itself) may be important to establish CTCF-mediated chromatin loops. We found that motif conservation measured by ‘branch-of-origin’ that accounts for motif turn-over in evolution is an important feature. We developed a new machine learning algorithm called CTCF-MP based on word2vec to demonstrate that sequence-based features alone have the capability to predict if a pair of convergent CTCF motifs would form a loop. Together with functional genomic signals from CTCF ChIP-seq and DNase-seq, CTCF-MP is able to make highly accurate predictions on whether a convergent CTCF motif pair would form a loop in a single cell type and also across different cell types. Our work represents an important step further to understand the sequence determinants that may guide the formation of complex chromatin architectures.

    Availability and implementation

    The source code of CTCF-MP can be accessed at: https://github.com/ma-compbio/CTCF-MP

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  4. Abstract

    Genome-wide profiling of chromatin accessibility by DNase-seq or ATAC-seq has been widely used to identify regulatory DNA elements and transcription factor binding sites. However, enzymatic DNA cleavage exhibits intrinsic sequence biases that confound chromatin accessibility profiling data analysis. Existing computational tools are limited in their ability to account for such intrinsic biases and not designed for analyzing single-cell data. Here, we present Simplex Encoded Linear Model for Accessible Chromatin (SELMA), a computational method for systematic estimation of intrinsic cleavage biases from genomic chromatin accessibility profiling data. We demonstrate that SELMA yields accurate and robust bias estimation from both bulk and single-cell DNase-seq and ATAC-seq data. SELMA can utilize internal mitochondrial DNA data to improve bias estimation. We show that transcription factor binding inference from DNase footprints can be improved by incorporating estimated biases using SELMA. Furthermore, we show strong effects of intrinsic biases in single-cell ATAC-seq data, and develop the first single-cell ATAC-seq intrinsic bias correction model to improve cell clustering. SELMA can enhance the performance of existing bioinformatics tools and improve the analysis of both bulk and single-cell chromatin accessibility sequencing data.

     
    more » « less
  5. Abstract Motivation

    Gene regulatory networks define regulatory relationships between transcription factors and target genes within a biological system, and reconstructing them is essential for understanding cellular growth and function. Methods for inferring and reconstructing networks from genomics data have evolved rapidly over the last decade in response to advances in sequencing technology and machine learning. The scale of data collection has increased dramatically; the largest genome-wide gene expression datasets have grown from thousands of measurements to millions of single cells, and new technologies are on the horizon to increase to tens of millions of cells and above.

    Results

    In this work, we present the Inferelator 3.0, which has been significantly updated to integrate data from distinct cell types to learn context-specific regulatory networks and aggregate them into a shared regulatory network, while retaining the functionality of the previous versions. The Inferelator is able to integrate the largest single-cell datasets and learn cell-type-specific gene regulatory networks. Compared to other network inference methods, the Inferelator learns new and informative Saccharomyces cerevisiae networks from single-cell gene expression data, measured by recovery of a known gold standard. We demonstrate its scaling capabilities by learning networks for multiple distinct neuronal and glial cell types in the developing Mus musculus brain at E18 from a large (1.3 million) single-cell gene expression dataset with paired single-cell chromatin accessibility data.

    Availability and implementation

    The inferelator software is available on GitHub (https://github.com/flatironinstitute/inferelator) under the MIT license and has been released as python packages with associated documentation (https://inferelator.readthedocs.io/).

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less