A prominent trend in single-cell transcriptomics is providing spatial context alongside a characterization of each cell’s molecular state. This typically requires targeting an a priori selection of genes, often covering less than 1% of the genome, and a key question is how to optimally determine the small gene panel. We address this challenge by introducing a flexible deep learning framework, PERSIST, to identify informative gene targets for spatial transcriptomics studies by leveraging reference scRNA-seq data. Using datasets spanning different brain regions, species, and scRNA-seq technologies, we show that PERSIST reliably identifies panels that provide more accurate prediction of the genome-wide expression profile, thereby capturing more information with fewer genes. PERSIST can be adapted to specific biological goals, and we demonstrate that PERSIST’s binarization of gene expression levels enables models trained on scRNA-seq data to generalize with to spatial transcriptomics data, despite the complex shift between these technologies.
In recent years, the explosive growth of spatial technologies has enabled the characterization of spatial heterogeneity of tissue architectures. Compared to traditional sequencing, spatial transcriptomics reserves the spatial information of each captured location and provides novel insights into diverse spatially related biological contexts. Even though two spatial transcriptomics databases exist, they provide limited analytical information. Information such as spatial heterogeneity of genes and cells, cell-cell communication activities in space, and the cell type compositions in the microenvironment are critical clues to unveil the mechanism of tumorigenesis and embryo differentiation. Therefore, we constructed a new spatial transcriptomics database, named SPASCER (https://ccsm.uth.edu/SPASCER), designed to help understand the heterogeneity of tissue organizations, region-specific microenvironment, and intercellular interactions across tissue architectures at multiple levels. SPASCER contains datasets from 43 studies, including 1082 sub-datasets from 16 organ types across four species. scRNA-seq was integrated to deconvolve/map spatial transcriptomics, and processed with spatial cell-cell interaction, gene pattern and pathway enrichment analysis. Cell–cell interactions and gene regulation network of scRNA-seq from matched spatial transcriptomics were performed as well. The application of SPASCER will provide new insights into tissue architecture and a solid foundation for the mechanistic understanding of many biological processes in healthy and diseased tissues.
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Nucleic Acids Research
- Page Range or eLocation-ID:
- p. D1138-D1149
- Oxford University Press
- Sponsoring Org:
- National Science Foundation
More Like this
Single-cell RNA sequencing (scRNA-seq) provides details for individual cells; however, crucial spatial information is often lost. We present SpaOTsc, a method relying on structured optimal transport to recover spatial properties of scRNA-seq data by utilizing spatial measurements of a relatively small number of genes. A spatial metric for individual cells in scRNA-seq data is first established based on a map connecting it with the spatial measurements. The cell–cell communications are then obtained by “optimally transporting” signal senders to target signal receivers in space. Using partial information decomposition, we next compute the intercellular gene–gene information flow to estimate the spatial regulations between genes across cells. Four datasets are employed for cross-validation of spatial gene expression prediction and comparison to known cell–cell communications. SpaOTsc has broader applications, both in integrating non-spatial single-cell measurements with spatial data, and directly in spatial single-cell transcriptomics data to reconstruct spatial cellular dynamics in tissues.
The metabolic heterogeneity and metabolic interplay between cells are known as significant contributors to disease treatment resistance. However, with the lack of a mature high-throughput single-cell metabolomics technology, we are yet to establish systematic understanding of the intra-tissue metabolic heterogeneity and cooperative mechanisms. To mitigate this knowledge gap, we developed a novel computational method, namely, single-cell flux estimation analysis (scFEA), to infer the cell-wise fluxome from single-cell RNA-sequencing (scRNA-seq) data. scFEA is empowered by a systematically reconstructed human metabolic map as a factor graph, a novel probabilistic model to leverage the flux balance constraints on scRNA-seq data, and a novel graph neural network–based optimization solver. The intricate information cascade from transcriptome to metabolome was captured using multilayer neural networks to capitulate the nonlinear dependency between enzymatic gene expressions and reaction rates. We experimentally validated scFEA by generating an scRNA-seq data set with matched metabolomics data on cells of perturbed oxygen and genetic conditions. Application of scFEA on this data set showed the consistency between predicted flux and the observed variation of metabolite abundance in the matched metabolomics data. We also applied scFEA on five publicly available scRNA-seq and spatial transcriptomics data sets and identified context- and cell group–specific metabolic variations.more »
The analysis of spatially resolved transcriptome enables the understanding of the spatial interactions between the cellular environment and transcriptional regulation. In particular, the characterization of the gene–gene co-expression at distinct spatial locations or cell types in the tissue enables delineation of spatial co-regulatory patterns as opposed to standard differential single gene analyses. To enhance the ability and potential of spatial transcriptomics technologies to drive biological discovery, we develop a statistical framework to detect gene co-expression patterns in a spatially structured tissue consisting of different clusters in the form of cell classes or tissue domains.
We develop SpaceX (spatially dependent gene co-expression network), a Bayesian methodology to identify both shared and cluster-specific co-expression network across genes. SpaceX uses an over-dispersed spatial Poisson model coupled with a high-dimensional factor model which is based on a dimension reduction technique for computational efficiency. We show via simulations, accuracy gains in co-expression network estimation and structure by accounting for (increasing) spatial correlation and appropriate noise distributions. In-depth analysis of two spatial transcriptomics datasets in mouse hypothalamus and human breast cancer using SpaceX, detected multiple hub genes which are related to cognitive abilities for the hypothalamus data and multiple cancer genes (e.g. collagen family) frommore »
Availability and implementation
The SpaceX R-package is available at github.com/bayesrx/SpaceX.
Supplementary data are available at Bioinformatics online.
Spatial transcriptomics (ST) technologies are rapidly becoming the extension of single-cell RNA sequencing (scRNAseq), holding the potential of profiling gene expression at a single-cell resolution while maintaining cellular compositions within a tissue. Having both expression profiles and tissue organization enables researchers to better understand cellular interactions and heterogeneity, providing insight into complex biological processes that would not be possible with traditional sequencing technologies. Data generated by ST technologies are inherently noisy, high-dimensional, sparse, and multi-modal (including histological images, count matrices, etc.), thus requiring specialized computational tools for accurate and robust analysis. However, many ST studies currently utilize traditional scRNAseq tools, which are inadequate for analyzing complex ST datasets. On the other hand, many of the existing ST-specific methods are built upon traditional statistical or machine learning frameworks, which have shown to be sub-optimal in many applications due to the scale, multi-modality, and limitations of spatially resolved data (such as spatial resolution, sensitivity, and gene coverage). Given these intricacies, researchers have developed deep learning (DL)-based models to alleviate ST-specific challenges. These methods include new state-of-the-art models in alignment, spatial reconstruction, and spatial clustering, among others. However, DL models for ST analysis are nascent and remain largely underexplored. In this review, wemore »