This content will become publicly available on October 5, 2023
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name: Genome Research
- Sponsoring Org: National Science Foundation
More Like this
When analyzing scRNA-seq data with clustering algorithms, annotating the clusters with cell types is an essential step toward biological interpretation of the data. Annotations can be performed manually using known cell type marker genes. Annotations can also be automated using knowledge-driven or data-driven machine learning algorithms. The majority of cell type annotation algorithms are designed to predict cell types for individual cells in a new dataset. Since biological interpretation of scRNA-seq data is often made on cell clusters rather than individual cells, several algorithms have been developed to annotate cell clusters. In this study, we compared five cell type annotation algorithms, Azimuth, SingleR, Garnett, scCATCH, and SCSA, which cover the spectrum of knowledge-driven and data-driven approaches to annotate either individual cells or cell clusters. We applied these five algorithms to two scRNA-seq datasets of peripheral blood mononuclear cell (PBMC) samples from COVID-19 patients and healthy controls, and evaluated their annotation performance. From this comparison, we observed that methods for annotating individual cells outperformed methods for annotating cell clusters. We applied the cell-based annotation algorithm Azimuth to the two scRNA-seq datasets to examine the immune response during COVID-19 infection. Both datasets presented significant depletion of plasmacytoid dendritic cells (pDCs), where differential expression …
Single cell RNA-sequencing (scRNA-seq) technology enables comprehensive transcriptomic profiling of thousands of cells with distinct phenotypic and physiological states in a complex tissue. Substantial efforts have been made to characterize single cells of distinct identities from scRNA-seq data, including various cell clustering techniques. While existing approaches can handle single cells in terms of different cell (sub)types at a high resolution, identification of the functional variability within the same cell type remains unsolved. In addition, there is a lack of robust methods to handle the inter-subject variation that often brings severe confounding effects for the functional clustering of single cells. In this study, we developed a novel data denoising and cell clustering approach, namely CIBS, to provide biologically explainable functional classification for scRNA-seq data. CIBS is based on a systems biology model of transcriptional regulation that assumes a multi-modality distribution of the cells’ activation status, and it utilizes a Boolean matrix factorization approach on the discretized expression status to robustly derive functional modules. CIBS is empowered by a novel fast Boolean Matrix Factorization method, namely PFAST, to increase the computational feasibility on large scale scRNA-seq data. Application of CIBS on two scRNA-seq datasets collected from the cancer tumor microenvironment successfully identified subgroups …
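To make the idea of Boolean matrix factorization on discretized expression concrete, here is a minimal Python sketch. It is not the CIBS or PFAST implementation, which the abstract does not detail; the threshold of 1.0, the toy matrix, and the function names `boolean_product` and `reconstruction_error` are all assumptions chosen for illustration.

```python
import numpy as np

def boolean_product(U, V):
    """Boolean matrix product: entry (i, j) is OR over k of (U[i, k] AND V[k, j])."""
    return (U.astype(int) @ V.astype(int)) > 0

def reconstruction_error(X, U, V):
    """Number of entries where the Boolean product of U and V disagrees with X."""
    return int(np.sum(X != boolean_product(U, V)))

# Toy expression matrix (cells x genes); values are made up for illustration.
expr = np.array([
    [5.0, 4.0, 0.0, 0.0],
    [6.0, 3.0, 0.0, 0.1],
    [0.0, 0.0, 7.0, 8.0],
    [4.0, 5.0, 6.0, 9.0],
])

# Discretize to an on/off status with a hypothetical threshold.
X = expr > 1.0

# Hand-picked rank-2 factors: U assigns cells to modules, V assigns genes to modules.
U = np.array([[1, 0], [1, 0], [0, 1], [1, 1]], dtype=bool)
V = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=bool)

error = reconstruction_error(X, U, V)
print(error)
```

In this toy case the last cell belongs to both modules, which a Boolean factorization can express through the OR, whereas an ordinary real-valued product would have to add the two modules' contributions.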
Mathelier, Anthony (Ed.) Abstract. Motivation: Recent breakthroughs of single-cell RNA sequencing (scRNA-seq) technologies offer an exciting opportunity to identify heterogeneous cell types in complex tissues. However, the unavoidable biological noise and technical artifacts in scRNA-seq data as well as the high dimensionality of expression vectors make the problem highly challenging. Consequently, although numerous tools have been developed, their accuracy remains to be improved. Results: Here, we introduce a novel clustering algorithm and tool RCSL (Rank Constrained Similarity Learning) to accurately identify various cell types using scRNA-seq data from a complex tissue. RCSL considers both local similarity and global similarity among the cells to discern the subtle differences among cells of the same type as well as larger differences among cells of different types. RCSL uses Spearman’s rank correlations of a cell’s expression vector with those of other cells to measure its global similarity, and adaptively learns neighbor representation of a cell as its local similarity. The overall similarity of a cell to other cells is a linear combination of its global similarity and local similarity. RCSL automatically estimates the number of cell types defined in the similarity matrix, and identifies them by constructing a block-diagonal matrix, such that its distance to the similarity …
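The combination of global and local similarity described above can be sketched roughly as follows. This is an illustration, not the RCSL code: the global part uses Spearman correlation as stated in the abstract, but the local part is a simple k-nearest-neighbor stand-in for RCSL's adaptively learned neighbor representation, and the names `weight` and `k` and the toy data are assumptions.

```python
import numpy as np

def spearman_similarity(expr):
    """Cells x cells Spearman correlation for a cells x genes matrix."""
    # Rank genes within each cell (ordinal ranks; ties are broken by position).
    ranks = np.argsort(np.argsort(expr, axis=1), axis=1).astype(float)
    # Pearson correlation of the ranks is the Spearman correlation when there are no ties.
    return np.corrcoef(ranks)

def knn_local_similarity(expr, k=2):
    """Stand-in local similarity: weight 1/k on each cell's k nearest cells, symmetrized."""
    dist = np.linalg.norm(expr[:, None, :] - expr[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a cell is not its own neighbor
    n = expr.shape[0]
    nearest = np.argsort(dist, axis=1)[:, :k]
    local = np.zeros((n, n))
    local[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0 / k
    return (local + local.T) / 2.0

def combined_similarity(expr, weight=0.5, k=2):
    """Linear combination of global (Spearman) and local (kNN) similarity."""
    return weight * spearman_similarity(expr) + (1.0 - weight) * knn_local_similarity(expr, k)

# Toy data: two simulated "cell types", three noisy cells each, 20 genes.
rng = np.random.default_rng(0)
type_a, type_b = rng.normal(size=20), rng.normal(size=20)
expr = np.vstack(
    [type_a + 0.1 * rng.normal(size=20) for _ in range(3)]
    + [type_b + 0.1 * rng.normal(size=20) for _ in range(3)]
)
sim = combined_similarity(expr)
```

With two well-separated groups, `sim` comes out close to block-diagonal, which is the structure the abstract says RCSL then looks for.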
iDESC: identifying differential expression in single-cell RNA sequencing data with multiple subjects
Single-cell RNA sequencing (scRNA-seq) technology has enabled assessment of transcriptome-wide changes at single-cell resolution. Due to the heterogeneity in environmental exposure and genetic background across subjects, subject effect is the major source of variation in scRNA-seq data with multiple subjects, which severely confounds cell type specific differential expression (DE) analysis. Moreover, dropout events are prevalent in scRNA-seq data, leading to an excessive number of zeros in the data, which further aggravates the challenge in DE analysis.
We developed iDESC to detect cell type specific DE genes between two groups of subjects in scRNA-seq data. iDESC uses a zero-inflated negative binomial mixed model to consider both subject effect and dropouts. The prevalence of dropout events (dropout rate) was demonstrated to be dependent on gene expression level, which is modeled by pooling information across genes. Subject effect is modeled as a random effect in the log-mean of the negative binomial component. We evaluated and compared the performance of iDESC with eleven existing DE analysis methods. Using simulated data, we demonstrated that iDESC had well-controlled type I error and higher power compared to the existing methods. Applications of those methods with well-controlled type I error to three real scRNA-seq datasets from the …
iDESC was able to achieve more accurate and robust DE analysis results by separating subject effect from disease effect with consideration of dropouts to identify DE genes, suggesting the importance of considering subject effect and dropouts in the DE analysis of scRNA-seq data with multiple subjects.
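As a rough illustration of the model family named above, the following Python sketch evaluates a zero-inflated negative binomial probability for one cell, with a subject-level effect added to the log-mean. It is not the iDESC implementation: the parameter names, the example values, and the fixed dropout rate are assumptions, and a real mixed model would integrate over the random subject effect rather than plug in a single value.

```python
import math

def nb_log_pmf(y, mu, size):
    """Negative binomial log-pmf with mean `mu` and dispersion parameter `size`."""
    return (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mu))
        + y * math.log(mu / (size + mu))
    )

def zinb_pmf(y, mu, size, dropout):
    """Zero-inflated NB: with probability `dropout` the count is a forced zero."""
    nb = math.exp(nb_log_pmf(y, mu, size))
    return dropout * (1.0 if y == 0 else 0.0) + (1.0 - dropout) * nb

def cell_mean(intercept, group_effect, group, subject_effect):
    """Log-linear mean: the subject effect enters the log-mean additively."""
    return math.exp(intercept + group_effect * group + subject_effect)

# Hypothetical values: baseline 1.0, group (disease) effect 0.5, one subject's effect 0.3.
mu = cell_mean(intercept=1.0, group_effect=0.5, group=1, subject_effect=0.3)
p_zero = zinb_pmf(0, mu, size=2.0, dropout=0.2)
total = sum(zinb_pmf(y, mu, size=2.0, dropout=0.2) for y in range(500))
```

Separating `subject_effect` from `group_effect` in the log-mean is what lets a model of this kind attribute between-subject variation to the subjects rather than to the group comparison.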
Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions.
We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses.
We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional …
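A reduced-rank regression with a Poisson likelihood can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not the authors' model: it assumes a log link and a coefficient matrix factored as `A @ B` with rank at most `r`, and all names, dimensions, and simulated data are hypothetical.

```python
import numpy as np
from math import lgamma

def poisson_rrr_loglik(Y, X, A, B):
    """Poisson log-likelihood with log-rates X @ A @ B.

    Y: n x q counts, X: n x p covariates, A: p x r, B: r x q.
    The p x q coefficient matrix A @ B has rank at most r.
    """
    eta = X @ A @ B
    log_factorial = np.vectorize(lgamma)(Y + 1.0)
    return float(np.sum(Y * eta - np.exp(eta) - log_factorial))

# Simulate counts from a rank-2 coefficient matrix (all values are made up).
rng = np.random.default_rng(1)
n, p, q, r = 50, 4, 30, 2
X = rng.normal(size=(n, p))
A_true = rng.normal(scale=0.5, size=(p, r))
B_true = rng.normal(scale=0.5, size=(r, q))
Y = rng.poisson(np.exp(X @ A_true @ B_true))

ll_true = poisson_rrr_loglik(Y, X, A_true, B_true)
ll_null = poisson_rrr_loglik(Y, X, np.zeros((p, r)), B_true)  # no covariate effect
```

The factorization needs `r * (p + q)` parameters instead of `p * q`, which is the source of the computational and statistical savings when many genes share a few underlying patterns of association.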