skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Deconvolution analysis of cell‐type expression from bulk tissues by integrating with single‐cell expression reference
Abstract To understand phenotypic variations and key factors which affect disease susceptibility of complex traits, it is important to decipher cell‐type tissue compositions. To study cellular compositions of bulk tissue samples, one can evaluate cellular abundances and cell‐type‐specific gene expression patterns from the tissue transcriptome profiles. We develop both fixed and mixed models to reconstruct cellular expression fractions for bulk‐profiled samples by using reference single‐cell (sc) RNA‐sequencing (RNA‐seq) reference data. In benchmark evaluations of estimating cellular expression fractions, the mixed‐effect models provide similar results as an elegant machine learning algorithm named cell‐type identification by estimating relative subsets of RNA transcripts (CIBERSORTx), which is a well‐known and reliable procedure to reconstruct cell‐type abundances and cell‐type‐specific gene expression profiles. In real data analysis, the mixed‐effect models outperform or perform similarly as CIBERSORTx. The mixed models perform better than the fixed models in both benchmark evaluations and data analysis. In simulation studies, we show that if the heterogeneity exists in scRNA‐seq data, it is better to use mixed models with heterogeneous mean and variance–covariance. As a byproduct, the mixed models provide fractions of covariance between subject‐specific gene expression and cell types to measure their correlations. The proposed mixed models provide a complementary tool to dissect bulk tissues using scRNA‐seq data.  more » « less
Award ID(s):
1915904
PAR ID:
10380261
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Genetic Epidemiology
Volume:
46
Issue:
8
ISSN:
0741-0395
Page Range / eLocation ID:
p. 615-628
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) or single-nucleus RNA-seq (snRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is significant technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and snRNA-seq data, Bisque replicates previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. We further propose an additional mode of operation that merely requires a set of known marker genes. 
    more » « less
  2. Abstract The advancement of single cell RNA-sequencing (scRNA-seq) technology has enabled the direct inference of co-expressions in specific cell types, facilitating our understanding of cell-type-specific biological functions. For this task, the high sequencing depth variations and measurement errors in scRNA-seq data present two significant challenges, and they have not been adequately addressed by existing methods. We propose a statistical approach, CS-CORE, for estimating and testing cell-type-specific co-expressions, that explicitly models sequencing depth variations and measurement errors in scRNA-seq data. Systematic evaluations show that most existing methods suffered from inflated false positives as well as biased co-expression estimates and clustering analysis, whereas CS-CORE gave accurate estimates in these experiments. When applied to scRNA-seq data from postmortem brain samples from Alzheimer’s disease patients/controls and blood samples from COVID-19 patients/controls, CS-CORE identified cell-type-specific co-expressions and differential co-expressions that were more reproducible and/or more enriched for relevant biological pathways than those inferred from existing methods. 
    more » « less
  3. Single-cell RNA-sequencing (scRNA-seq) enables high throughput measurement of RNA expression in individual cells. Due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard analysis methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc combines network-regularized non-negative matrix factorization with a procedure for handling zero inflation in transcript count matrices. The matrix factorization results in a low-dimensional representation of the transcript count matrix, which imputes gene abundance for both zero and non-zero entries and can be used to cluster cells. The network regularization leverages prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close in the low-dimensional representation. We show that netNMF-sc outperforms existing methods on simulated and real scRNA-seq data, with increasing advantage at higher dropout rates (e.g. above 60%). Furthermore, we show that the results from netNMF-sc -- including estimation of gene-gene covariance -- are robust to choice of network, with more representative networks leading to greater performance gains. 
    more » « less
  4. Abstract The performance of computational methods and software to identify differentially expressed features in single‐cell RNA‐sequencing (scRNA‐seq) has been shown to be influenced by several factors, including the choice of the normalization method used and the choice of the experimental platform (or library preparation protocol) to profile gene expression in individual cells. Currently, it is up to the practitioner to choose the most appropriate differential expression (DE) method out of over 100 DE tools available to date, each relying on their own assumptions to model scRNA‐seq expression features. To model the technological variability in cross‐platform scRNA‐seq data, here we propose to use Tweedie generalized linear models that can flexibly capture a large dynamic range of observed scRNA‐seq expression profiles across experimental platforms induced by platform‐ and gene‐specific statistical properties such as heavy tails, sparsity, and gene expression distributions. We also propose a zero‐inflated Tweedie model that allows zero probability mass to exceed a traditional Tweedie distribution to model zero‐inflated scRNA‐seq data with excessive zero counts. Using both synthetic and published plate‐ and droplet‐based scRNA‐seq datasets, we perform a systematic benchmark evaluation of more than 10 representative DE methods and demonstrate that our method (Tweedieverse) outperforms the state‐of‐the‐art DE approaches across experimental platforms in terms of statistical power and false discovery rate control. Our open‐source software (R/Bioconductor package) is available athttps://github.com/himelmallick/Tweedieverse. 
    more » « less
  5. Multi-modal single cell RNA assays capture RNA content as well as other data modalities, such as spatial cell position or the electrophysiological properties of cells. Compared to dedicated scRNA-seq assays however, they may unintentionally capture RNA from multiple adjacent cells, exhibit lower RNA sequencing depth compared to scRNA-seq, or lack genome-wide RNA measurements. We present scProjection, a method for mapping individual multi-modal RNA measurements to deeply sequenced scRNA-seq atlases to extract cell type-specific, single cell gene expression profiles. We demonstrate several use cases of scProjection, including the identification of spatial motifs from spatial transcriptome assays, distinguishing RNA contributions from neighboring cells in both spatial and multi-modal single cell assays, and imputing expression measurements of un-measured genes from gene markers. scProjection therefore combines the advantages of both multi-modal and scRNA-seq assays to yield precise multi-modal measurements of single cells. 
    more » « less