Single-cell technologies characterize complex cell populations across multiple data modalities at unprecedented scale and resolution. Multi-omic data for single cell gene expression, in situ hybridization, or single cell chromatin states are increasingly available across diverse tissue types. When isolating specific cell types from a sample of dissociated cells or performing in situ sequencing in collections of heterogeneous cells, one challenging task is to select a small set of informative markers that robustly enable the identification and discrimination of specific cell types or cell states as precisely as possible. Given single cell RNA-seq data and a set of cellular labels to discriminate, scGeneFit selects gene markers that jointly optimize cell label recovery using label-aware compressive classification methods. This results in a substantially more robust and less redundant set of markers than existing methods, most of which identify markers that separate each cell label from the rest. When applied to a data set with a hierarchy of cell types as labels, the markers found by our method improve recovery of the cell type hierarchy with fewer markers than existing methods, using a computationally efficient and principled optimization.
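As a rough illustration of selecting markers jointly rather than one label at a time, the sketch below uses a simple greedy heuristic. It is not scGeneFit's optimization, whose formulation is not given in this abstract. The function name `select_markers`, the `margin` parameter, and the toy expression values are all hypothetical.

```python
from itertools import combinations


def select_markers(cells, labels, num_markers, margin=1.0):
    """Greedy stand-in for joint marker selection (NOT the scGeneFit method).

    cells: list of expression vectors, one per cell (same length each).
    labels: list of cell labels, aligned with `cells`.
    Returns indices of genes chosen so that cells with different labels
    end up at least `margin` apart (squared distance) where possible.
    """
    n_genes = len(cells[0])

    # Per-gene squared differences for every pair of differently labeled cells.
    pairs = [
        [(a[g] - b[g]) ** 2 for g in range(n_genes)]
        for (a, la), (b, lb) in combinations(list(zip(cells, labels)), 2)
        if la != lb
    ]

    selected = []
    # Squared distance of each pair using only the genes selected so far.
    separation = [0.0] * len(pairs)

    for _ in range(min(num_markers, n_genes)):
        def shortfall_if_added(gene):
            # Total amount by which pairs would still fall short of the margin.
            return sum(
                max(0.0, margin - (s + p[gene]))
                for s, p in zip(separation, pairs)
            )

        candidates = [g for g in range(n_genes) if g not in selected]
        best = min(candidates, key=shortfall_if_added)  # ties: lowest index
        selected.append(best)
        separation = [s + p[best] for s, p in zip(separation, pairs)]

    return selected


# Hypothetical toy data: 4 genes, 3 labels.
# Gene 0 separates A from the rest, gene 1 separates B from the rest,
# gene 2 duplicates gene 0, and gene 3 is constant.
toy_cells = [
    [1.0, 0.0, 1.0, 0.5], [1.0, 0.0, 1.0, 0.5],  # label A
    [0.0, 1.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5],  # label B
    [0.0, 0.0, 0.0, 0.5], [0.0, 0.0, 0.0, 0.5],  # label C
]
toy_labels = ["A", "A", "B", "B", "C", "C"]

print(select_markers(toy_cells, toy_labels, 2))  # prints [0, 1]
```

Because every pair of differently labeled cells is scored together, the duplicate gene 2 adds nothing once gene 0 is chosen and is passed over in favor of gene 1. A one-vs-rest ranking could instead rank genes 0 and 2 equally as markers for label A.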
- Publication Date:
- NSF-PAR ID:
- Journal Name: Nature Communications
- Publisher: Nature Publishing Group
- Sponsoring Org: National Science Foundation
More Like this
Obeid, I. (Ed.) The Neural Engineering Data Consortium (NEDC) is developing the Temple University Digital Pathology Corpus (TUDP), an open source database of high-resolution images from scanned pathology samples, as part of its National Science Foundation-funded Major Research Instrumentation grant titled “MRI: High Performance Digital Pathology Using Big Data and Machine Learning”. The long-term goal of this project is to release one million images. We have currently scanned over 100,000 images and are in the process of annotating breast tissue data for our first official corpus release, v1.0.0. This release contains 3,505 annotated images of breast tissue including 74 patients with cancerous diagnoses (out of a total of 296 patients). In this poster, we will present an analysis of this corpus and discuss the challenges we have faced in efficiently producing high-quality annotations of breast tissue. It is well known that state-of-the-art algorithms in machine learning require vast amounts of data. Fields such as speech recognition, image recognition and text processing are able to deliver impressive performance with complex deep learning models because they have developed large corpora to support training of extremely high-dimensional models (e.g., billions of parameters). Other fields that do not …
A Markov random field model-based approach for differentially expressed gene detection from single-cell RNA-seq data
The development of single-cell RNA-sequencing (scRNA-seq) technologies has offered insights into complex biological systems at single-cell resolution. In particular, these techniques facilitate the identification of genes showing cell-type-specific differential expression (DE). In this paper, we introduce MARBLES, a novel statistical model for cross-condition DE gene detection from scRNA-seq data. MARBLES employs a Markov Random Field model to borrow information across similar cell types and utilizes cell-type-specific pseudobulk counts to account for sample-level variability. Our simulation results showed that MARBLES is more powerful than existing methods at detecting DE genes, with appropriate control of the false positive rate. Applications of MARBLES to real data identified novel disease-related DE genes and biological pathways from both a single-cell lipopolysaccharide mouse data set with 24,381 cells and 11,076 genes and a Parkinson’s disease human data set with 76,212 cells and 15,891 genes. Overall, MARBLES is a powerful tool to identify cell-type-specific DE genes across conditions from scRNA-seq data.
Accurate estimation of cell composition in bulk expression through robust integration of single-cell information
We present Bisque, a tool for estimating cell type proportions in bulk expression. Bisque implements a regression-based approach that utilizes single-cell RNA-seq (scRNA-seq) or single-nucleus RNA-seq (snRNA-seq) data to generate a reference expression profile and learn gene-specific bulk expression transformations to robustly decompose RNA-seq data. These transformations significantly improve decomposition performance compared to existing methods when there is significant technical variation in the generation of the reference profile and observed bulk expression. Importantly, compared to existing methods, our approach is extremely efficient, making it suitable for the analysis of large genomic datasets that are becoming ubiquitous. When applied to subcutaneous adipose and dorsolateral prefrontal cortex expression datasets with both bulk RNA-seq and snRNA-seq data, Bisque replicates previously reported associations between cell type proportions and measured phenotypes across abundant and rare cell types. We further propose an additional mode of operation that merely requires a set of known marker genes.
Single-cell RNA transcriptome analysis of CNS immune cells reveals CXCL16/CXCR6 as maintenance factors for tissue-resident T cells that drive synapse elimination
Emerging RNA viruses that target the central nervous system (CNS) lead to cognitive sequelae in survivors. Studies in humans and mice infected with West Nile virus (WNV), a re-emerging RNA virus associated with learning and memory deficits, revealed microglial-mediated synapse elimination within the hippocampus. Moreover, CNS-resident memory T (TRM) cells activate microglia, limiting synapse recovery and inducing spatial learning defects in WNV-recovered mice. The signals involved in T cell-microglia interactions are unknown.
Here, we examined immune cells within the murine WNV-recovered forebrain using single-cell RNA sequencing to identify putative ligand-receptor pairs involved in intercellular communication between T cells and microglia. Clustering and differential gene analyses were followed by protein validation and genetic and antibody-based approaches utilizing an established murine model of WNV recovery in which microglia and complement promote ongoing hippocampal synaptic loss.
Transcriptomic profiling of host immune cells at 25 days post-infection in mice revealed a shift in forebrain homeostatic microglia to activated subpopulations with transcriptional signatures that have previously been observed in studies of neurodegenerative diseases. Importantly, CXCL16/CXCR6, a chemokine signaling pathway involved in TRM cell biology, was identified as critically regulating the number of CXCR6-expressing CD8+ TRM cells within the WNV-recovered forebrain. We demonstrate that CXCL16 is highly …
We provide a comprehensive assessment of the role of CXCL16/CXCR6 as an interaction link between microglia and CD8+ T cells that maintains forebrain TRM cells, microglial and astrocyte activation, and ongoing synapse elimination in virally recovered animals. We also show that therapeutic targeting of CXCL16 in mice during recovery may reduce CNS CD8+ TRM cells.
Comparisons of cell proliferation and cell death from tornaria larva to juvenile worm in the hemichordate Schizocardium californicum
There is a wide range of developmental strategies in animal phyla, but most insights into adult body plan formation come from direct-developing species. For indirect-developing species, there are distinct larval and adult body plans that are linked together by metamorphosis. Some outstanding questions in the development of indirect-developing organisms include the extent to which larval tissue undergoes cell death during the process of metamorphosis and when and where the tissue that will give rise to the adult originates. How do the processes of cell division and cell death redesign the body plans of indirect developers? In this study, we present patterns of cell proliferation and cell death during larval body plan development, metamorphosis, and adult body plan formation in the hemichordate Schizocardium californicum (Cameron and Perez in Zootaxa 3569:79–88, 2012) to answer these questions.
Results: We identified distinct patterns of cell proliferation between larval and adult body plan formation of S. californicum. We found that some adult tissues proliferate during the late larval phase prior to the start of overt metamorphosis. In addition, using an irradiation and transcriptomic approach, we describe a genetic signature of proliferative cells that is shared across the life history states, as well as markers that are unique to …
Conclusions: Cell proliferation during the development of S. californicum has distinct patterns in the formation of larval and adult body plans. However, cell death is very limited in larvae and begins during the onset of metamorphosis and into early juvenile development in specific domains. The populations of cells that proliferated and gave rise to the larvae and juveniles have a genetic signature that suggests a heterogeneous pool of proliferative progenitors, rather than a set-aside population of pluripotent cells. Taken together, we propose that the gradual morphological transformation of S. californicum is mirrored at the cellular level and may be more representative of the development strategies that characterize metamorphosis in many metazoan animals.