skip to main content

Title: A Markov random field model-based approach for differentially expressed gene detection from single-cell RNA-seq data

The development of single-cell RNA-sequencing (scRNA-seq) technologies has offered insights into complex biological systems at the single-cell resolution. In particular, these techniques facilitate the identifications of genes showing cell-type-specific differential expressions (DE). In this paper, we introduce MARBLES, a novel statistical model for cross-condition DE gene detection from scRNA-seq data. MARBLES employs a Markov Random Field model to borrow information across similar cell types and utilizes cell-type-specific pseudobulk count to account for sample-level variability. Our simulation results showed that MARBLES is more powerful than existing methods to detect DE genes with an appropriate control of false positive rate. Applications of MARBLES to real data identified novel disease-related DE genes and biological pathways from both a single-cell lipopolysaccharide mouse dataset with 24 381 cells and 11 076 genes and a Parkinson’s disease human data set with 76 212 cells and 15 891 genes. Overall, MARBLES is a powerful tool to identify cell-type-specific DE genes across conditions from scRNA-seq data.

; ; ; ;
Publication Date:
Journal Name:
Briefings in Bioinformatics
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    In recent years, the explosive growth of spatial technologies has enabled the characterization of spatial heterogeneity of tissue architectures. Compared to traditional sequencing, spatial transcriptomics reserves the spatial information of each captured location and provides novel insights into diverse spatially related biological contexts. Even though two spatial transcriptomics databases exist, they provide limited analytical information. Information such as spatial heterogeneity of genes and cells, cell-cell communication activities in space, and the cell type compositions in the microenvironment are critical clues to unveil the mechanism of tumorigenesis and embryo differentiation. Therefore, we constructed a new spatial transcriptomics database, named SPASCER (, designed to help understand the heterogeneity of tissue organizations, region-specific microenvironment, and intercellular interactions across tissue architectures at multiple levels. SPASCER contains datasets from 43 studies, including 1082 sub-datasets from 16 organ types across four species. scRNA-seq was integrated to deconvolve/map spatial transcriptomics, and processed with spatial cell-cell interaction, gene pattern and pathway enrichment analysis. Cell–cell interactions and gene regulation network of scRNA-seq from matched spatial transcriptomics were performed as well. The application of SPASCER will provide new insights into tissue architecture and a solid foundation for the mechanistic understanding of many biological processes in healthy and diseased tissues.

  2. Abstract

    Integrating single-cell RNA sequencing (scRNA-seq) data with genotypes obtained from DNA sequencing studies facilitates the detection of functional genetic variants underlying cell type specific gene expression variation. Unfortunately, most existing scRNA-seq studies do not come with DNA sequencing data; thus, being able to call single nucleotide variants (SNVs) from scRNA-seq data alone can provide crucial and complementary information, detection of functional SNVs, maximizing the potential of existing scRNA-seq studies. Here, we perform extensive analyses to evaluate the utility of two SNV calling pipelines (GATK and Monovar), originally designed for SNV calling in either bulk or single cell DNA sequencing data. In both pipelines, we examined various parameter settings to determine the accuracy of the final SNV call set and provide practical recommendations for applied analysts. We found that combining all reads from the single cells and following GATK Best Practices resulted in the highest number of SNVs identified with a high concordance. In individual single cells, Monovar resulted in better quality SNVs even though none of the pipelines analysed is capable of calling a reasonable number of SNVs with high accuracy. In addition, we found that SNV calling quality varies across different functional genomic regions. Our results open doorsmore »for novel ways to leverage the use of scRNA-seq for the future investigation of SNV function.

    « less
  3. Abstract Background

    Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions.


    We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses.


    We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensionalmore »representations of transcriptional states.

    « less
  4. Abstract

    Genetic mutations to the Lamin A/C gene (LMNA) can cause heart disease, but the mechanisms making cardiac tissues uniquely vulnerable to the mutations remain largely unknown. Further, patients withLMNAmutations have highly variable presentation of heart disease progression and type.In vitropatient-specific experiments could provide a powerful platform for studying this phenomenon, but the use of induced pluripotent stem cell-derived cardiomyocytes (iPSC-CM) introduces heterogeneity in maturity and function thus complicating the interpretation of the results of any single experiment. We hypothesized that integrating single cell RNA sequencing (scRNA-seq) with analysis of the tissue architecture and contractile function would elucidate some of the probable mechanisms. To test this, we investigated five iPSC-CM lines, three controls and two patients with a (c.357-2A>G) mutation. The patient iPSC-CM tissues had significantly weaker stress generation potential than control iPSC-CM tissues demonstrating the viability of ourin vitroapproach. Through scRNA-seq, differentially expressed genes between control and patient lines were identified. Some of these genes, linked to quantitative structural and functional changes, were cardiac specific, explaining the targeted nature of the disease progression seen in patients. The results of this work demonstrate the utility of combiningin vitrotools in exploring heart disease mechanics.

  5. null (Ed.)
    Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can handle single cells in terms of different cell (sub)types at a high resolution, identification of the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust method to handle the inter-subject variation that often brings severe confounding effects for the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modality distribution of the cells’ activation status, and it utilizes a Boolean matrix factorization approach on the discretized expression status to robustly derive functional modules. CIBS is empowered by a novel fast Boolean Matrix Factorization method, namely PFAST, to increase the computational feasibility on large scale scRNA-seq data. Application of CIBS on two scRNA-seq datasets collected from cancer tumor micro-environment successfully identified subgroupsmore »of cancer cells with distinct expression patterns of epithelial-mesenchymal transition and extracellular matrix marker genes, which was not revealed by the existing cell clustering analysis tools. The identified cell groups were significantly associated with the clinically confirmed lymph-node invasion and metastasis events across different patients. Index Terms—Cell clustering analysis, Data denoising, Boolean matrix factorization, Cancer microenvirionment, Metastasis.« less