The development of single-cell RNA-sequencing (scRNA-seq) technologies has offered insights into complex biological systems at single-cell resolution. In particular, these techniques facilitate the identification of genes showing cell-type-specific differential expression (DE). In this paper, we introduce MARBLES, a novel statistical model for cross-condition DE gene detection from scRNA-seq data. MARBLES employs a Markov Random Field model to borrow information across similar cell types and utilizes cell-type-specific pseudobulk counts to account for sample-level variability. Our simulation results showed that MARBLES is more powerful than existing methods at detecting DE genes while appropriately controlling the false positive rate. Applications of MARBLES to real data identified novel disease-related DE genes and biological pathways from both a single-cell lipopolysaccharide mouse dataset with 24 381 cells and 11 076 genes and a human Parkinson’s disease dataset with 76 212 cells and 15 891 genes. Overall, MARBLES is a powerful tool to identify cell-type-specific DE genes across conditions from scRNA-seq data.
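The cell-type-specific pseudobulk counts mentioned above can be illustrated with a minimal sketch. It assumes a dense cells-by-genes count matrix and per-cell sample and cell-type labels; the function name `pseudobulk` and its arguments are hypothetical and are not taken from the MARBLES software.

```python
import numpy as np

def pseudobulk(counts, sample_ids, cell_types):
    """Sum a cells-by-genes count matrix within each (sample, cell type) pair.

    Hypothetical helper for illustration only; not the MARBLES implementation.
    Returns the sorted list of (sample, cell type) keys and the matrix of
    summed counts, one row per key.
    """
    counts = np.asarray(counts)
    pairs = list(zip(sample_ids, cell_types))
    keys = sorted(set(pairs))
    row_of = {key: i for i, key in enumerate(keys)}
    # Map every cell to the row of its (sample, cell type) group.
    rows = np.array([row_of[pair] for pair in pairs])
    out = np.zeros((len(keys), counts.shape[1]), dtype=counts.dtype)
    # Unbuffered accumulation, so cells sharing a group are all added.
    np.add.at(out, rows, counts)
    return keys, out
```

Each output row is one sample's total count for one cell type, so variability between rows of the same cell type reflects sample-level rather than cell-level variation.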
The advancement of single-cell RNA-sequencing (scRNA-seq) technology has enabled the direct inference of co-expression in specific cell types, facilitating our understanding of cell-type-specific biological functions. For this task, large variations in sequencing depth and measurement errors in scRNA-seq data present two significant challenges, and they have not been adequately addressed by existing methods. We propose a statistical approach, CS-CORE, for estimating and testing cell-type-specific co-expression that explicitly models sequencing depth variations and measurement errors in scRNA-seq data. Systematic evaluations show that most existing methods suffered from inflated false positives as well as biased co-expression estimates and clustering analysis, whereas CS-CORE gave accurate estimates in these experiments. When applied to scRNA-seq data from postmortem brain samples from Alzheimer’s disease patients/controls and blood samples from COVID-19 patients/controls, CS-CORE identified cell-type-specific co-expressions and differential co-expressions that were more reproducible and/or more enriched for relevant biological pathways than those inferred from existing methods.
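To make the sequencing-depth problem concrete, the following is a minimal sketch of a depth-aware correlation estimate. It is not the CS-CORE implementation: it assumes, for illustration, that the observed count of gene j in cell i is Poisson with mean s_i z_ij, where s_i is a known sequencing depth and z_ij is the underlying expression level, and it uses simple unweighted moment estimators. The function name `depth_aware_corr` is hypothetical.

```python
import numpy as np

def depth_aware_corr(x1, x2, s):
    """Moment-based correlation of underlying expression for two genes.

    Illustrative sketch under an ASSUMED model x_ij ~ Poisson(s_i * z_ij);
    not the CS-CORE software. x1, x2 are per-cell counts, s is per-cell depth.
    """
    x1, x2, s = (np.asarray(a, dtype=float) for a in (x1, x2, s))
    s2 = s ** 2
    s4_sum = np.sum(s ** 4)
    # E[x_ij] = s_i * mu_j, so fit mu_j by least squares through the origin.
    mu1 = np.sum(s * x1) / np.sum(s2)
    mu2 = np.sum(s * x2) / np.sum(s2)
    r1 = x1 - s * mu1
    r2 = x2 - s * mu2
    # Var(x_ij) = s_i * mu_j + s_i^2 * sigma_jj: subtract the Poisson part.
    var1 = np.sum(s2 * (r1 ** 2 - s * mu1)) / s4_sum
    var2 = np.sum(s2 * (r2 ** 2 - s * mu2)) / s4_sum
    # E[r_i1 * r_i2] = s_i^2 * sigma_12.
    cov = np.sum(s2 * r1 * r2) / s4_sum
    if var1 <= 0 or var2 <= 0:
        return float("nan")  # variance estimate unusable, correlation undefined
    return cov / np.sqrt(var1 * var2)
```

Under this assumed model, two genes whose underlying expression levels are independent still show a positive Pearson correlation in raw counts whenever depth varies across cells, because both counts scale with the same s_i. The estimator above removes that shared scaling and the Poisson noise before forming the correlation.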
- Award ID(s):
- 2329296
- PAR ID:
- 10440620
- Publisher / Repository:
- Nature Publishing Group
- Date Published:
- Journal Name:
- Nature Communications
- Volume:
- 14
- Issue:
- 1
- ISSN:
- 2041-1723
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Spatial barcoding-based transcriptomic (ST) data require deconvolution for cellular-level downstream analysis. Here we present SDePER, a hybrid machine learning and regression method to deconvolve ST data using reference single-cell RNA sequencing (scRNA-seq) data. SDePER tackles platform effects between ST and scRNA-seq data, ensuring a linear relationship between them while addressing sparsity and spatial correlations in cell types across capture spots. SDePER estimates cell-type proportions, enabling enhanced resolution tissue mapping by imputing cell-type compositions and gene expressions at unmeasured locations. Applications to simulated data and four real datasets showed SDePER’s superior accuracy and robustness over existing methods.
-
Cell–cell interactions (CCI) play significant roles in manipulating biological functions of cells. Analyzing the differences in CCI between healthy and diseased conditions of a biological system yields greater insight than analyzing either condition alone. There has been a recent and rapid growth of methods to infer CCI from single-cell RNA-sequencing (scRNA-seq), revealing complex CCI networks at a previously inaccessible scale. However, the majority of current CCI analyses from scRNA-seq data focus on direct comparisons between individual CCI networks of individual samples from patients, rather than “group-level” comparisons between sample groups of patients comprising different conditions. To illustrate new biological features among different disease statuses, we investigated the diversity of key network features on groups of CCI networks, as defined by different disease statuses. We considered three levels of network features: node level, as defined by cell type; node-to-node level; and network level. By applying these analyses to a large-scale single-cell RNA-sequencing dataset of coronavirus disease 2019 (COVID-19), we observe biologically meaningful patterns aligned with the progression and subsequent convalescence of COVID-19.
-
When analyzing scRNA-seq data with clustering algorithms, annotating the clusters with cell types is an essential step toward biological interpretation of the data. Annotations can be performed manually using known cell type marker genes. Annotations can also be automated using knowledge-driven or data-driven machine learning algorithms. The majority of cell type annotation algorithms are designed to predict cell types for individual cells in a new dataset. Since biological interpretation of scRNA-seq data is often made on cell clusters rather than individual cells, several algorithms have been developed to annotate cell clusters. In this study, we compared five cell type annotation algorithms, Azimuth, SingleR, Garnett, scCATCH, and SCSA, which cover the spectrum of knowledge-driven and data-driven approaches to annotate either individual cells or cell clusters. We applied these five algorithms to two scRNA-seq datasets of peripheral blood mononuclear cell (PBMC) samples from COVID-19 patients and healthy controls, and evaluated their annotation performance. From this comparison, we observed that methods for annotating individual cells outperformed methods for annotating cell clusters. We applied the cell-based annotation algorithm Azimuth to the two scRNA-seq datasets to examine the immune response during COVID-19 infection. Both datasets presented significant depletion of plasmacytoid dendritic cells (pDCs), where differential expression in this cell type and pathway analysis revealed strong activation of the type I interferon signaling pathway in response to the infection.
-
Numerous single‐cell transcriptomic datasets from identical tissues or cell lines are generated from different laboratories or single‐cell RNA sequencing (scRNA‐seq) protocols. The denoising of these datasets to eliminate batch effects is crucial for data integration, ensuring accurate interpretation and comprehensive analysis of biological questions. Although many scRNA‐seq data integration methods exist, most are inefficient and/or not conducive to downstream analysis. Here, DeepBID, a novel deep learning‐based method for batch effect correction, non‐linear dimensionality reduction, embedding, and cell clustering concurrently, is introduced. DeepBID utilizes a negative binomial‐based autoencoder with dual Kullback–Leibler divergence loss functions, aligning cell points from different batches within a consistent low‐dimensional latent space and progressively mitigating batch effects through iterative clustering. Extensive validation on multiple‐batch scRNA‐seq datasets demonstrates that DeepBID surpasses existing tools in removing batch effects and achieving superior clustering accuracy. When integrating multiple scRNA‐seq datasets from patients with Alzheimer's disease, DeepBID significantly improves cell clustering, effectively annotating unidentified cells, and detecting cell‐specific differentially expressed genes.