
Title: DWEN: A novel method for accurate estimation of cell type compositions from bulk data samples
Advances in single-cell RNA sequencing (scRNA-seq) technologies have allowed us to study the heterogeneity of cell populations. The cell compositions of tissues from different hosts may vary greatly, reflecting the condition of the hosts from which the samples are collected. However, the high sequencing cost and the lack of fresh tissues make single-cell approaches less appealing. In many cases, it is practically impossible to generate single-cell data for a large number of subjects, which makes it challenging to monitor changes in cell type compositions across diseases. Here we introduce a novel approach, named Deconvolution using Weighted Elastic Net (DWEN), that allows researchers to accurately estimate cell type compositions from bulk data samples without the need to generate single-cell data. It also allows bulk data collected from rare conditions to be re-analyzed to extract more in-depth, cell-type-level insights. The approach consists of two modules: the first constructs a cell type signature matrix from single-cell data, and the second estimates the cell type compositions of the input bulk samples. In an extensive analysis using 20 datasets generated from scRNA-seq data of different human tissues, we demonstrate that DWEN outperforms current state-of-the-art methods in estimating the cell type compositions of bulk samples.
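The abstract names DWEN's two modules but not its exact objective, gene-weighting scheme, or solver. The sketch below is therefore a minimal illustration under stated assumptions, not the authors' implementation. Module 1 averages single-cell profiles within each cell type to form a signature matrix. Module 2 fits non-negative cell type proportions by minimizing a weighted least-squares loss with an elastic-net penalty via coordinate descent, then rescales them to sum to one. The function names `build_signature` and `estimate_composition`, the uniform default gene weights, the penalty settings `alpha` and `l1_ratio`, and the toy data are all hypothetical.

```python
from collections import defaultdict


def build_signature(cells, labels):
    """Module 1 sketch: mean expression of each gene within each cell type.

    cells  -- list of per-cell expression vectors (same gene order in each)
    labels -- cell-type label for each cell
    Returns (cell_types, signature), where signature[g][k] is the mean
    expression of gene g in cell type cell_types[k].
    """
    n_genes = len(cells[0])
    sums = defaultdict(lambda: [0.0] * n_genes)
    counts = defaultdict(int)
    for expr, label in zip(cells, labels):
        acc = sums[label]
        for g, value in enumerate(expr):
            acc[g] += value
        counts[label] += 1
    cell_types = sorted(sums)
    signature = [[sums[t][g] / counts[t] for t in cell_types] for g in range(n_genes)]
    return cell_types, signature


def estimate_composition(bulk, signature, weights=None, alpha=1e-3,
                         l1_ratio=0.5, n_iter=500):
    """Module 2 sketch: non-negative weighted elastic net, coordinate descent.

    Minimizes
        0.5 * sum_g w_g * (b_g - sum_k S_gk x_k)^2
        + alpha * (l1_ratio * sum_k x_k + 0.5 * (1 - l1_ratio) * sum_k x_k^2)
    subject to x_k >= 0 (so |x_k| = x_k), then rescales x to sum to one.
    """
    n_genes, n_types = len(signature), len(signature[0])
    # Hypothetical default: uniform gene weights (DWEN's scheme is not given).
    w = list(weights) if weights is not None else [1.0] * n_genes
    l1, l2 = alpha * l1_ratio, alpha * (1.0 - l1_ratio)
    x = [0.0] * n_types
    residual = list(bulk)  # b - S x, with x = 0 at the start
    for _ in range(n_iter):
        for k in range(n_types):
            col = [signature[g][k] for g in range(n_genes)]
            z = sum(w[g] * col[g] ** 2 for g in range(n_genes))
            if z == 0.0:
                continue  # no gene informs cell type k; x_k stays at 0
            # Weighted correlation of column k with the partial residual.
            rho = sum(w[g] * col[g] * (residual[g] + col[g] * x[k])
                      for g in range(n_genes))
            new_xk = max(0.0, rho - l1) / (z + l2)  # soft-threshold, clipped at 0
            delta = new_xk - x[k]
            for g in range(n_genes):
                residual[g] -= col[g] * delta
            x[k] = new_xk
    total = sum(x)
    return [xk / total for xk in x] if total > 0 else x


# Toy example with hypothetical cell types "A" and "B" over three genes.
cells = [[10.0, 0.0, 1.0], [12.0, 0.0, 1.0],
         [0.0, 8.0, 1.0], [0.0, 10.0, 1.0]]
labels = ["A", "A", "B", "B"]
types, S = build_signature(cells, labels)
bulk = [0.3 * S[g][0] + 0.7 * S[g][1] for g in range(len(S))]
proportions = estimate_composition(bulk, S, alpha=1e-6)
print(types, [round(p, 3) for p in proportions])  # approximately [0.3, 0.7]
```

Here the simulated bulk sample is an exact 30/70 mixture of the two signature profiles, so with a very small penalty the estimate recovers those proportions. Averaging raw expression is the simplest possible signature; in practice one would usually normalize expression and restrict the fit to informative marker genes first, and the abstract does not say how DWEN selects genes or sets its weights.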
Award ID(s):
2203236, 2141660, 2001385
PAR ID:
10409686
Journal Name:
2022 14th International Conference on Knowledge and Systems Engineering (KSE)
Page Range / eLocation ID:
1 to 6
Sponsoring Org:
National Science Foundation
More Like this
  1. To understand phenotypic variations and the key factors that affect disease susceptibility in complex traits, it is important to decipher the cell‐type compositions of tissues. To study the cellular composition of bulk tissue samples, one can evaluate cellular abundances and cell‐type‐specific gene expression patterns from tissue transcriptome profiles. We develop both fixed and mixed models to reconstruct cellular expression fractions for bulk‐profiled samples using single‐cell (sc) RNA‐sequencing (RNA‐seq) reference data. In benchmark evaluations of estimating cellular expression fractions, the mixed‐effect models provide results similar to those of an elegant machine learning algorithm, cell‐type identification by estimating relative subsets of RNA transcripts (CIBERSORTx), which is a well‐known and reliable procedure for reconstructing cell‐type abundances and cell‐type‐specific gene expression profiles. In real data analysis, the mixed‐effect models outperform or perform similarly to CIBERSORTx. The mixed models perform better than the fixed models in both benchmark evaluations and data analysis. In simulation studies, we show that if heterogeneity exists in scRNA‐seq data, it is better to use mixed models with heterogeneous mean and variance–covariance. As a byproduct, the mixed models provide fractions of covariance between subject‐specific gene expression and cell types to measure their correlations. The proposed mixed models provide a complementary tool to dissect bulk tissues using scRNA‐seq data.
  2. Spatially resolved gene expression profiling provides insight into tissue organization and cell–cell crosstalk; however, sequencing-based spatial transcriptomics (ST) lacks single-cell resolution. Current ST analysis methods require single-cell RNA sequencing data as a reference for rigorous interpretation of cell states, mostly do not use associated histology images, and are not capable of inferring shared neighborhoods across multiple tissues. Here we present Starfysh, a computational toolbox using a deep generative model that incorporates archetypal analysis and any known cell type markers to characterize known or new tissue-specific cell states without a single-cell reference. Starfysh improves the characterization of spatial dynamics in complex tissues using histology images and enables the comparison of niches as spatial hubs across tissues. Integrative analysis of primary estrogen receptor (ER)-positive breast cancer, triple-negative breast cancer (TNBC) and metaplastic breast cancer (MBC) tissues led to the identification of spatial hubs with patient- and disease-specific cell type compositions and revealed metabolic reprogramming shaping immunosuppressive hubs in aggressive MBC.
  3. Single-cell RNA sequencing (scRNA-seq) is a recent technology that allows the expression of all genes to be measured in each individual cell contained in a sample. Information at the single-cell level has been shown to be extremely useful in many areas. However, performing single-cell experiments is expensive. Although cellular deconvolution cannot provide the same comprehensive information as single-cell experiments, it can extract cell-type information from bulk RNA data, and therefore it allows researchers to conduct studies at cell-type resolution from existing bulk datasets. For these reasons, great effort has been made to develop methods for cellular deconvolution. The large number of available methods, the coding skills they require, inadequate documentation, and the lack of performance assessment all make it extremely difficult for life scientists to choose a suitable method for their experiment. This paper aims to fill this gap by providing a comprehensive review of 53 deconvolution methods with respect to their methodology, applications, performance, and outstanding challenges. More importantly, the article presents a benchmark of all 53 methods using 283 cell types from 30 tissues of 63 individuals. We also provide an R package named DeconBenchmark that allows readers to execute and benchmark the reviewed methods (https://github.com/tinnlab/DeconBenchmark).
  4. Background: Low back pain is a leading cause of disability worldwide and is frequently attributed to intervertebral disc (IVD) degeneration. Though the contributions of the adjacent cartilage endplates (CEP) to IVD degeneration are well documented, the phenotype and functions of the resident CEP cells are critically understudied. To better characterize CEP cell phenotype and possible mechanisms of CEP degeneration, bulk and single-cell RNA sequencing of non-degenerated and degenerated CEP cells were performed. Methods: Human lumbar CEP cells from degenerated (Thompson grade ≥ 4) and non-degenerated (Thompson grade ≤ 2) discs were expanded for bulk (N = 4 non-degenerated, N = 4 degenerated) and single-cell (N = 1 non-degenerated, N = 1 degenerated) RNA sequencing. Genes identified from bulk RNA sequencing were categorized by function, and their expression in non-degenerated and degenerated CEP cells was compared. A PubMed literature review was also performed to determine which genes had previously been identified and studied in the CEP, IVD, and other cartilaginous tissues. For single-cell RNA sequencing, distinct cell clusters were resolved using unsupervised clustering and functional annotation. Differential gene expression analysis and Gene Ontology enrichment analysis were used, respectively, to compare gene expression and functional enrichment between cell clusters, as well as between non-degenerated and degenerated CEP samples. Results: Bulk RNA sequencing revealed that 38 genes were significantly upregulated and 15 genes were significantly downregulated in degenerated CEP cells relative to non-degenerated cells (|fold change| ≥ 1.5). Of these, only 2 genes had previously been studied in CEP cells, and 31 had previously been studied in the IVD and other cartilaginous tissues. Single-cell RNA sequencing revealed 11 unique cell clusters, including multiple chondrocyte and progenitor subpopulations with distinct gene expression and functional profiles. Analysis of genes in the bulk RNA sequencing dataset showed that progenitor cell clusters from both samples were enriched in “non-degenerated” genes but not “degenerated” genes. For both bulk and single-cell analyses, gene expression and pathway enrichment analyses highlighted several pathways that may regulate CEP degeneration, including transcriptional regulation, translational regulation, intracellular transport, and mitochondrial dysfunction. Conclusions: This thorough analysis using RNA sequencing methods highlighted numerous differences between non-degenerated and degenerated CEP cells, the phenotypic heterogeneity of CEP cells, and several pathways of interest that may be relevant to CEP degeneration.
  5. As the circadian clock regulates fundamental biological processes, disrupted clocks are often observed in patients and diseased tissues. Determining the circadian time of the patient or the tissue of focus is essential in circadian medicine and research. Here we present tauFisher, a computational pipeline that accurately predicts circadian time from a single transcriptomic sample by finding correlations between rhythmic genes within the sample. We demonstrate tauFisher’s performance in adding timestamps to both bulk and single-cell transcriptomic samples collected from multiple tissue types and experimental settings. Applying tauFisher at the cell-type level to a single-cell RNA-seq dataset collected from mouse dermal skin suggests that greater circadian phase heterogeneity may explain the dampened rhythm of collective core clock gene expression in dermal immune cells compared with dermal fibroblasts. Given its robustness and generalizability across assay platforms, experimental setups, and tissue types, as well as its potential application in single-cell RNA-seq data analysis, tauFisher is a promising tool for circadian medicine and research.