
Title: Inferring spatial and signaling relationships between cells from single cell transcriptomic data

Single-cell RNA sequencing (scRNA-seq) provides transcriptome-wide measurements for individual cells; however, crucial spatial information is lost when cells are dissociated from the tissue. We present SpaOTsc, a method that relies on structured optimal transport to recover spatial properties of scRNA-seq data by utilizing spatial measurements of a relatively small number of genes. SpaOTsc first establishes a spatial metric for the individual cells in the scRNA-seq data, based on a map connecting them to the spatial measurements. Cell–cell communication is then inferred by “optimally transporting” signal senders to target signal receivers in space. Using partial information decomposition, we next compute the intercellular gene–gene information flow to estimate spatial regulatory relationships between genes across cells. We use four datasets to cross-validate spatial gene expression predictions and to compare the inferred communications with known cell–cell communications. SpaOTsc also has broader applications, both in integrating non-spatial single-cell measurements with spatial data and, applied directly to spatial single-cell transcriptomics data, in reconstructing spatial cellular dynamics in tissues.
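To make the transport step concrete, the following minimal Python/NumPy sketch maps scRNA-seq cells onto spatial spots. It is a simplification, not SpaOTsc itself: SpaOTsc uses structured optimal transport, whereas this sketch swaps in plain entropic optimal transport (Sinkhorn iterations) with uniform marginals and a cost of one minus the Pearson correlation over the shared genes. The helper names, the toy data, the regularization `eps`, and the cell–cell distance formula at the end are illustrative assumptions.

```python
import numpy as np

def correlation_cost(X_sc, X_sp):
    """1 - Pearson correlation between every cell (rows of X_sc) and every
    spatial spot (rows of X_sp), computed over the shared genes (columns)."""
    Xc = X_sc - X_sc.mean(axis=1, keepdims=True)
    Yc = X_sp - X_sp.mean(axis=1, keepdims=True)
    Xc = Xc / np.linalg.norm(Xc, axis=1, keepdims=True)
    Yc = Yc / np.linalg.norm(Yc, axis=1, keepdims=True)
    return 1.0 - Xc @ Yc.T

def sinkhorn(cost, a, b, eps=0.5, n_iter=1000):
    """Entropic OT: returns a plan P whose row sums are a and column sums are b."""
    K = np.exp(-cost / eps)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)      # rescale to match row marginals
        v = b / (K.T @ u)    # rescale to match column marginals
    return u[:, None] * K * v[None, :]

# Toy data: 4 scRNA-seq cells and 3 spatial spots measured on 4 shared genes.
X_sc = np.array([[4, 1, 0, 0], [0, 4, 2, 0], [0, 0, 1, 4], [5, 0, 0, 1]], dtype=float)
X_sp = np.array([[5, 1, 0, 0], [0, 5, 1, 0], [0, 0, 1, 5]], dtype=float)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])  # spot positions

a = np.full(4, 1 / 4)  # uniform mass per cell
b = np.full(3, 1 / 3)  # uniform mass per spot
P = sinkhorn(correlation_cost(X_sc, X_sp), a, b)

# Hypothetical cell-cell spatial metric: expected spot distance under the
# row-normalized plan, D_cells[i, j] = sum_kl Pn[i, k] * D_spot[k, l] * Pn[j, l].
D_spot = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Pn = P / P.sum(axis=1, keepdims=True)
D_cells = Pn @ D_spot @ Pn.T
print(np.round(D_cells, 2))
```

Each row of the row-normalized plan `Pn` is a distribution over spatial locations for one cell, so averaging spot-to-spot distances under two such distributions gives one plausible, hypothetical way to define a spatial metric between scRNA-seq cells. A smaller `eps` gives a sharper plan but slower Sinkhorn convergence.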

Journal Name: Nature Communications
Publisher: Nature Publishing Group
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract. Motivation: Single-cell RNA sequencing (scRNA-seq) captures whole-transcriptome information of individual cells. While scRNA-seq measures thousands of genes, researchers are often interested in only dozens to hundreds of genes for closer study. A natural question, then, is how to select those informative genes from scRNA-seq data. Moreover, single-cell targeted gene profiling technologies are gaining popularity for their low cost, high sensitivity and extra (e.g. spatial) information; however, they typically can measure only up to a few hundred genes. Another challenging question, then, is how to select genes for targeted gene profiling based on existing scRNA-seq data. Results: Here, we develop the single-cell Projective Non-negative Matrix Factorization (scPNMF) method to select informative genes from scRNA-seq data in an unsupervised way. Compared with existing gene selection methods, scPNMF has two advantages. First, the informative genes it selects better distinguish cell types. Second, it enables the alignment of new targeted gene profiling data with reference data in a low-dimensional space, facilitating the prediction of cell types in the new data. Technically, scPNMF modifies the PNMF algorithm for gene selection by changing the initialization and adding a basis selection step, which selects informative bases to distinguish cell types. We demonstrate that scPNMF outperforms state-of-the-art gene selection methods on diverse scRNA-seq datasets. Moreover, we show that scPNMF can guide the design of targeted gene profiling experiments and the cell-type annotation of targeted gene profiling data. Availability and implementation: The R package is open-access and available at The data used in this work are available at Zenodo: Supplementary information: Supplementary data are available at Bioinformatics online.
  2. Multi-modal single-cell RNA assays capture RNA content as well as other data modalities, such as the spatial position or the electrophysiological properties of cells. Compared with dedicated scRNA-seq assays, however, they may unintentionally capture RNA from multiple adjacent cells, have lower sequencing depth, or lack genome-wide RNA measurements. We present scProjection, a method for mapping individual multi-modal RNA measurements to deeply sequenced scRNA-seq atlases to extract cell type-specific, single-cell gene expression profiles. We demonstrate several use cases of scProjection, including identifying spatial motifs in spatial transcriptome assays, distinguishing RNA contributions from neighboring cells in both spatial and multi-modal single-cell assays, and imputing the expression of unmeasured genes from gene markers. scProjection therefore combines the advantages of multi-modal and scRNA-seq assays to yield precise multi-modal measurements of single cells.
  3. Abstract. Background:

    Single-cell RNA-sequencing (scRNA-seq) technologies allow for the study of gene expression in individual cells. Often, it is of interest to understand how transcriptional activity is associated with cell-specific covariates, such as cell type, genotype, or measures of cell health. Traditional approaches for this type of association mapping assume independence between the outcome variables (or genes), and perform a separate regression for each. However, these methods are computationally costly and ignore the substantial correlation structure of gene expression. Furthermore, count-based scRNA-seq data pose challenges for traditional models based on Gaussian assumptions.


    We aim to resolve these issues by developing a reduced-rank regression model that identifies low-dimensional linear associations between a large number of cell-specific covariates and high-dimensional gene expression readouts. Our probabilistic model uses a Poisson likelihood in order to account for the unique structure of scRNA-seq counts. We demonstrate the performance of our model using simulations, and we apply our model to a scRNA-seq dataset, a spatial gene expression dataset, and a bulk RNA-seq dataset to show its behavior in three distinct analyses.


    We show that our statistical modeling approach, which is based on reduced-rank regression, captures associations between gene expression and cell- and sample-specific covariates by leveraging low-dimensional representations of transcriptional states.
  4. Abstract

    The development of single-cell RNA-sequencing (scRNA-seq) technologies has offered insights into complex biological systems at single-cell resolution. In particular, these techniques facilitate the identification of genes showing cell-type-specific differential expression (DE). In this paper, we introduce MARBLES, a novel statistical model for cross-condition DE gene detection from scRNA-seq data. MARBLES employs a Markov Random Field model to borrow information across similar cell types and uses cell-type-specific pseudobulk counts to account for sample-level variability. Our simulation results show that MARBLES is more powerful than existing methods at detecting DE genes while appropriately controlling the false positive rate. Applications of MARBLES to real data identified novel disease-related DE genes and biological pathways in both a single-cell lipopolysaccharide mouse dataset with 24 381 cells and 11 076 genes and a human Parkinson’s disease dataset with 76 212 cells and 15 891 genes. Overall, MARBLES is a powerful tool for identifying cell-type-specific DE genes across conditions from scRNA-seq data.

  5. Single-cell RNA-sequencing (scRNA-seq) enables high-throughput measurement of RNA expression in individual cells. Due to technical limitations, scRNA-seq data often contain zero counts for many transcripts in individual cells. These zero counts, or dropout events, complicate the analysis of scRNA-seq data using standard methods developed for bulk RNA-seq data. Current scRNA-seq analysis methods typically overcome dropout by combining information across cells, leveraging the observation that cells generally occupy a small number of RNA expression states. We introduce netNMF-sc, an algorithm for scRNA-seq analysis that leverages information across both cells and genes. netNMF-sc combines network-regularized non-negative matrix factorization with a procedure for handling zero inflation in transcript count matrices. The matrix factorization yields a low-dimensional representation of the transcript count matrix, which imputes gene abundance for both zero and non-zero entries and can be used to cluster cells. The network regularization leverages prior knowledge of gene-gene interactions, encouraging pairs of genes with known interactions to be close in the low-dimensional representation. We show that netNMF-sc outperforms existing methods on simulated and real scRNA-seq data, with an increasing advantage at higher dropout rates (e.g. above 60%). Furthermore, we show that the results from netNMF-sc (including the estimation of gene-gene covariance) are robust to the choice of network, with more representative networks leading to greater performance gains.
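For the scPNMF abstract (item 1 above), the core factorization can be illustrated with plain projective NMF, which seeks a nonnegative W such that X ≈ W W^T X. This minimal Python/NumPy sketch uses the standard multiplicative update for the Frobenius loss ||X - W W^T X||_F^2 with random initialization; it omits scPNMF's modified initialization and basis-selection step, and the helper `top_genes` (union of the top-weighted genes per basis) is a hypothetical stand-in for scPNMF's gene selection.

```python
import numpy as np

def pnmf(X, k, n_iter=200, seed=0, tiny=1e-12):
    """Projective NMF: nonnegative W (genes x k) such that X ~ W @ W.T @ X.
    X is a nonnegative genes x cells matrix (e.g. log-transformed counts)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], k))   # random init (not scPNMF's initialization)
    A = X @ X.T                       # genes x genes
    for _ in range(n_iter):
        AW = A @ W
        # Multiplicative update for the Frobenius loss ||X - W W^T X||_F^2
        W *= 2.0 * AW / (W @ (W.T @ AW) + AW @ (W.T @ W) + tiny)
        W /= np.linalg.norm(W, 2)     # rescale by the largest singular value
    return W

def top_genes(W, n_per_basis):
    """Hypothetical selection rule: union of the top-weighted genes of each basis."""
    chosen = []
    for j in range(W.shape[1]):
        for g in np.argsort(W[:, j])[::-1][:n_per_basis]:
            if int(g) not in chosen:
                chosen.append(int(g))
    return chosen

rng = np.random.default_rng(1)
counts = rng.poisson(2.0, size=(50, 30))  # toy data: 50 genes x 30 cells
X = np.log1p(counts)
W = pnmf(X, k=3)
genes = top_genes(W, n_per_basis=5)
print(genes)
```

Because the reconstruction W W^T X is a projection of the data onto the span of W's columns, each column of W acts directly as a set of gene weights, which is what makes the factorization convenient for gene selection.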
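For the reduced-rank regression abstract (item 3 above), the idea of a low-rank coefficient matrix linking many covariates to many genes can be illustrated with classical Gaussian reduced-rank regression. This is a swapped-in simplification, not the authors' Poisson model: it fits ordinary least squares to log-transformed toy counts and then projects the fitted values onto their top singular directions. The function name and toy data are hypothetical.

```python
import numpy as np

def reduced_rank_regression(X, Y, rank):
    """Gaussian reduced-rank regression of Y (cells x genes) on X (cells x covariates).
    Returns a covariates x genes coefficient matrix of rank <= `rank`."""
    B_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)            # full-rank OLS fit
    _, _, Vt = np.linalg.svd(X @ B_ols, full_matrices=False)  # SVD of fitted values
    V = Vt[:rank].T                                           # genes x rank
    return B_ols @ V @ V.T                                    # project onto top directions

rng = np.random.default_rng(0)
n_cells, n_cov, n_genes = 200, 5, 40
X = rng.normal(size=(n_cells, n_cov))                         # cell-level covariates
B_true = 0.2 * rng.normal(size=(n_cov, 2)) @ rng.normal(size=(2, n_genes))  # rank-2 truth
Y = np.log1p(rng.poisson(np.exp(X @ B_true)))                 # toy log counts
B_rr = reduced_rank_regression(X, Y, rank=2)
print(np.linalg.matrix_rank(B_rr))
```

Constraining the coefficient matrix to low rank means every gene's response is a combination of a few shared latent directions, which is how such a model exploits the correlation structure of gene expression instead of fitting each gene separately.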
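For the MARBLES abstract (item 4 above), the cell-type-specific pseudobulk step can be sketched as summing raw counts within each (sample, cell type) group. This minimal Python sketch shows only that aggregation, under the common sum-of-counts definition, which is an assumption about MARBLES's exact details; the Markov Random Field model is not shown, and the function name and toy labels are hypothetical.

```python
import numpy as np

def pseudobulk(counts, samples, cell_types):
    """Sum raw counts (cells x genes) within each (sample, cell type) group.
    Returns the sorted group keys and a groups x genes count matrix."""
    keys = sorted(set(zip(samples, cell_types)))
    index = {key: i for i, key in enumerate(keys)}
    out = np.zeros((len(keys), counts.shape[1]), dtype=counts.dtype)
    for row, key in zip(counts, zip(samples, cell_types)):
        out[index[key]] += row
    return keys, out

counts = np.array([[1, 0, 2], [3, 1, 0], [0, 2, 2], [1, 1, 1]])  # 4 cells x 3 genes
samples = ["s1", "s1", "s2", "s2"]
cell_types = ["T", "T", "B", "T"]
keys, pb = pseudobulk(counts, samples, cell_types)
print(keys)
print(pb)
```

Aggregating to one profile per sample and cell type treats biological replicates, rather than individual cells, as the unit of variation, which is why pseudobulk counts help account for sample-level variability.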
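For the netNMF-sc abstract (item 5 above), network regularization can be sketched as NMF with a graph-Laplacian penalty that pulls the factors of connected genes together. This is a minimal graph-regularized NMF with standard multiplicative updates, not netNMF-sc's implementation: in particular, the mask that simply ignores every zero entry is a crude stand-in for netNMF-sc's zero-inflation procedure, and the toy chain network, `lam`, and the function name are hypothetical.

```python
import numpy as np

def net_nmf(X, A, k, lam=1.0, n_iter=300, seed=0, tiny=1e-12):
    """Masked NMF X ~ W @ H with penalty lam * trace(W.T @ L @ W), L = D - A.
    X: nonnegative genes x cells matrix; A: symmetric genes x genes adjacency."""
    rng = np.random.default_rng(seed)
    M = (X > 0).astype(float)      # assumption: every zero is treated as missing
    D = np.diag(A.sum(axis=1))     # degree matrix
    W = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    MX = M * X
    for _ in range(n_iter):
        # Graph term: A @ W pulls neighbors together, D @ W balances it (L = D - A)
        W *= (MX @ H.T + lam * (A @ W)) / ((M * (W @ H)) @ H.T + lam * (D @ W) + tiny)
        H *= (W.T @ MX) / (W.T @ (M * (W @ H)) + tiny)
    return W, H

rng = np.random.default_rng(2)
X = np.log1p(rng.poisson(1.5, size=(20, 15)).astype(float))  # toy genes x cells
A = np.zeros((20, 20))
for g in range(19):                # hypothetical chain network: gene g -- gene g+1
    A[g, g + 1] = A[g + 1, g] = 1.0
W, H = net_nmf(X, A, k=3)
imputed = W @ H                    # low-rank estimate for zero and nonzero entries
print(imputed.shape)
```

Because the masked entries do not contribute to the loss, the product `W @ H` provides values for them from the low-rank structure, while the penalty lam * trace(W^T L W) = (lam / 2) * sum over edges of ||w_g - w_h||^2 keeps interacting genes close in the low-dimensional representation.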