When analyzing scRNA-seq data with clustering algorithms, annotating the clusters with cell types is an essential step toward biological interpretation of the data. Annotations can be performed manually using known cell type marker genes, or automated using knowledge-driven or data-driven machine learning algorithms. The majority of cell type annotation algorithms are designed to predict cell types for individual cells in a new dataset. Since biological interpretation of scRNA-seq data is often made on cell clusters rather than individual cells, several algorithms have been developed to annotate cell clusters. In this study, we compared five cell type annotation algorithms, Azimuth, SingleR, Garnett, scCATCH, and SCSA, which cover the spectrum of knowledge-driven and data-driven approaches to annotate either individual cells or cell clusters. We applied these five algorithms to two scRNA-seq datasets of peripheral blood mononuclear cell (PBMC) samples from COVID-19 patients and healthy controls, and evaluated their annotation performance. From this comparison, we observed that methods for annotating individual cells outperformed methods for annotating cell clusters. We then applied the cell-based annotation algorithm Azimuth to the two scRNA-seq datasets to examine the immune response during COVID-19 infection. Both datasets showed significant depletion of plasmacytoid dendritic cells (pDCs); differential expression and pathway analysis in this cell type revealed strong activation of the type I interferon signaling pathway in response to the infection.
Exploring the Unknown: How Can We Improve Single-cell RNAseq Cell Type Annotations in Non-model Organisms?
Synopsis Single-cell RNA sequencing (scRNAseq) is a powerful tool to describe cell types in multicellular organisms across the animal kingdom. In standard scRNAseq analysis pipelines, clusters of cells with similar transcriptional signatures are given cell type labels based on marker genes that infer specialized known characteristics. Since these analyses are designed for model organisms, such as humans and mice, problems arise when attempting to label cell types of distantly related, non-model species that have unique or divergent cell types. Consequently, this leads to limited discovery of novel species-specific cell types and potential mis-annotation of cell types in non-model species while using scRNAseq. To address this problem, we discuss recently published approaches that help annotate scRNAseq clusters for any non-model organism. We first suggest that annotating with an evolutionary context of cell lineages will aid in the discovery of novel cell types and provide a marker-free approach to compare cell types across distantly related species. Secondly, machine learning has greatly improved bioinformatic analyses, so we highlight some open-source programs that use reference-free approaches to annotate cell clusters. Lastly, we propose the use of unannotated genes as potential cell markers for non-model organisms, as many do not have fully annotated genomes and these data are often disregarded. Improving single-cell annotations will aid the discovery of novel cell types and enhance our understanding of non-model organisms at a cellular level. By unifying approaches to annotate cell types in non-model organisms, we can increase the confidence of cell annotation label transfer and the flexibility to discover novel cell types.
- Award ID(s): 2128071
- PAR ID: 10556181
- Publisher / Repository: Oxford University Press
- Journal Name: Integrative And Comparative Biology
- Volume: 64
- Issue: 5
- ISSN: 1540-7063
- Format(s): Medium: X
- Size(s): p. 1291-1299
- Sponsoring Org: National Science Foundation
More Like this
-
Single-cell RNA sequencing (scRNAseq) is rapidly advancing our understanding of cellular composition within complex tissues and organisms. A major limitation in most scRNAseq analysis pipelines is the reliance on manual annotations to determine cell identities, which are time-consuming, subjective, and require expertise. Given the surge in cell sequencing, supervised methods, especially deep learning models, have been developed for automatic cell type identification (ACTI), which achieve high accuracy and scalability. However, all existing deep learning frameworks for ACTI lack interpretability and are used as “black-box” models. We present N-ACT (Neural-Attention for Cell Type identification): the first-of-its-kind interpretable deep neural network for ACTI utilizing neural attention to detect salient genes for use in cell type identification. We compare N-ACT to conventional annotation methods on two previously manually annotated data sets, demonstrating that N-ACT accurately identifies marker genes and cell types in an unsupervised manner, while performing comparably on multiple data sets to the current state-of-the-art model in traditional supervised ACTI.
-
Abstract Determining the repertoire of a microbe's molecular functions is a central question in microbial biology. Modern techniques achieve this goal by comparing microbial genetic material against reference databases of functionally annotated genes/proteins or known taxonomic markers such as 16S rRNA. Here, we describe a novel approach to exploring bacterial functional repertoires without reference databases. Our Fusion scheme establishes functional relationships between bacteria and assigns organisms to Fusion-taxa that differ from otherwise defined taxonomic clades. Three key findings of our work stand out. First, bacterial functional comparisons outperform marker genes in assigning taxonomic clades. Fusion profiles are also better for this task than other functional annotation schemes. Second, Fusion-taxa are robust to addition of novel organisms and are, arguably, able to capture the environment-driven bacterial diversity. Finally, our alignment-free nucleic acid-based Siamese Neural Network model, created using Fusion functions, enables finding shared functionality of very distant, possibly structurally different, microbial homologs. Our work can thus help annotate functional repertoires of bacterial organisms and further guide our understanding of microbial communities.
-
With the advent of single-cell RNA sequencing (scRNA-seq) technologies, there has been a spike in studies involving scRNA-seq of several tissues across diverse species including Drosophila. Although a few databases exist for users to query genes of interest within the scRNA-seq studies, search tools that enable users to find orthologous genes and their cell type-specific expression patterns across species are limited. Here, we built a new search database, DRscDB (https://www.flyrnai.org/tools/single_cell/web/), to address this need. DRscDB serves as a comprehensive repository for published scRNA-seq datasets for Drosophila and relevant datasets from human and other model organisms. DRscDB is based on manual curation of Drosophila scRNA-seq studies of various tissue types and their corresponding analogous tissues in vertebrates including zebrafish, mouse, and human. Of note, our search database provides most of the literature-derived marker genes, thus preserving the original analysis of the published scRNA-seq datasets. Finally, DRscDB serves as a web-based user interface that allows users to mine gene expression data from scRNA-seq studies and perform cell cluster enrichment analyses pertaining to various scRNA-seq studies, both within and across species.
-
Abstract Adult pluripotent stem cells are found in diverse animals, including cnidarians, acoels, and planarians, and confer remarkable abilities such as whole-body regeneration. The mechanisms by which these pluripotent stem cells orchestrate the replacement of all lost cell types, however, remain poorly understood. Underlying heterogeneity within the stem cell populations of these animals is often obscured when focusing on certain tissue types or life history stages, which tend to have indistinguishable spatial expression patterns of stem cell marker genes. Here, we focus on the adult pluripotent stem cells (i-cells) of Hydractinia symbiolongicarpus, a colonial marine cnidarian with distinct polyp types and stolonal tissue. Recently, a single-cell expression atlas was generated for H. symbiolongicarpus which revealed two distinct clusters with i-cell signatures, potentially representing heterogeneity within this species’ stem cell population. Considering this finding, we investigated eight new putative stem cell marker genes from the atlas, including five expressed in both i-cell clusters (Pcna, Nop58, Mcm4, Ubr7, and Uhrf1) and three expressed in one cluster or the other (Pter, FoxQ2-like, and Zcwpw1). We characterized their expression patterns in various contexts (feeding and sexual polyps, juvenile feeding polyps, stolon, and during feeding polyp head regeneration), revealing context-dependent gene expression patterns and a transcriptionally dynamic i-cell population. We uncover previously unknown differences within the i-cell population of Hydractinia and demonstrate that its colonial nature serves as an excellent system for investigating and visualizing heterogeneity in pluripotent stem cells.