
Title: Characterizing spatial gene expression heterogeneity in spatially resolved single-cell transcriptomic data with nonuniform cellular densities
Recent technological advances have enabled spatially resolved measurements of expression profiles for hundreds to thousands of genes in fixed tissues at single-cell resolution. However, scalable computational methods that account for the inherent 3D spatial organization of cell types and the nonuniform cellular densities within tissues are still lacking. To address this, we developed MERINGUE, a computational framework based on spatial autocorrelation and cross-correlation analysis that uses spatially resolved transcriptomic data to identify genes with spatially heterogeneous expression patterns, infer putative cell–cell communication, and perform spatially informed cell clustering in 2D and 3D in a density-agnostic manner. We applied MERINGUE to a variety of spatially resolved transcriptomic data sets, including multiplexed error-robust fluorescence in situ hybridization (MERFISH), spatial transcriptomics, Slide-seq, and aligned in situ hybridization (ISH) data. We anticipate that such statistical analysis of spatially resolved transcriptomic data will improve our understanding of the interplay between cell state and spatial organization in tissue development and disease.
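MERINGUE's core statistic is spatial autocorrelation. As a minimal, hedged illustration (not MERINGUE's own R implementation), the Python sketch below computes Moran's I, the standard spatial autocorrelation statistic, for one gene, and a bivariate Moran-style cross-correlation for a pair of genes, over a spatial neighbor graph. Two assumptions to flag: the neighbor graph here is a symmetric k-nearest-neighbor graph, whereas MERINGUE, as we understand it, derives neighbors from a Voronoi tessellation; and the cross-correlation formula is a common bivariate generalization that may differ in detail from MERINGUE's index. The helper names `knn_adjacency`, `morans_i` and `spatial_cross_cor` are hypothetical. Defining neighbors by adjacency rather than by a fixed distance radius gives each cell a comparable number of neighbors whether it sits in a dense or a sparse region, which is the intuition behind a density-agnostic analysis.

```python
import numpy as np


def knn_adjacency(coords: np.ndarray, k: int = 6) -> np.ndarray:
    """Symmetric binary k-nearest-neighbor weight matrix.

    ASSUMPTION: a kNN graph stands in for MERINGUE's Voronoi-based adjacency
    to keep this sketch dependency-free. Works for (n, 2) or (n, 3) coords.
    """
    coords = np.asarray(coords, dtype=float)
    n = coords.shape[0]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)            # a cell is never its own neighbor
    nearest = np.argsort(dist, axis=1)[:, :k]
    w = np.zeros((n, n))
    w[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    return np.maximum(w, w.T)                 # i ~ j if either lists the other


def morans_i(x: np.ndarray, w: np.ndarray) -> float:
    """Moran's I: > 0 clustered, about -1/(n-1) random, < 0 dispersed."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    n = z.size
    return float((n / w.sum()) * (z @ w @ z) / (z @ z))


def spatial_cross_cor(x: np.ndarray, y: np.ndarray, w: np.ndarray) -> float:
    """Bivariate Moran-style index: does x in a cell track y in its neighbors?"""
    zx = np.asarray(x, dtype=float) - np.mean(x)
    zy = np.asarray(y, dtype=float) - np.mean(y)
    n = zx.size
    return float((n / w.sum()) * (zx @ w @ zy) / np.sqrt((zx @ zx) * (zy @ zy)))


if __name__ == "__main__":
    # Toy 10 x 10 grid of "cells"; real data would use measured 2D/3D centroids.
    ii, jj = np.meshgrid(np.arange(10), np.arange(10), indexing="ij")
    coords = np.column_stack([ii.ravel(), jj.ravel()])
    w = knn_adjacency(coords, k=4)

    gradient = jj.ravel().astype(float)              # expression rises left to right
    checker = ((ii + jj) % 2).ravel().astype(float)  # alternating pattern
    shuffled = np.random.default_rng(0).permutation(gradient)

    print(f"gradient: {morans_i(gradient, w):.2f}")  # strongly positive
    print(f"checker:  {morans_i(checker, w):.2f}")   # negative
    print(f"shuffled: {morans_i(shuffled, w):.2f}")  # close to zero
```

A positive I flags a spatially clustered gene, random placement gives values near −1/(n − 1), and a negative value indicates a dispersed, checkerboard-like pattern. Significance testing (analytical or permutation p-values) is omitted from this sketch.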
Journal Name:
Genome Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Motivation

    Gene regulatory networks (GRNs) in a cell provide the tight feedback needed to synchronize cell actions. However, genes in a cell also take input from, and provide signals to, neighboring cells. These cell–cell interactions (CCIs) and the GRNs deeply influence each other. Many computational methods have been developed for GRN inference in cells. More recently, methods have been proposed to infer CCIs from single-cell gene expression data, with or without information on the cells' spatial locations. In reality, however, the two processes do not occur in isolation and are subject to spatial constraints. Despite this, no existing method infers GRNs and CCIs within the same model.

    Results

    We propose CLARIFY, a tool that takes GRNs as input and uses them, together with spatially resolved gene expression data, to infer CCIs while simultaneously outputting refined cell-specific GRNs. CLARIFY uses a novel multi-level graph autoencoder that mimics cellular networks at a higher level and cell-specific GRNs at a deeper level. We applied CLARIFY to two real spatial transcriptomic datasets, one generated with seqFISH and the other with MERFISH, and also tested it on simulated datasets from scMultiSim. We compared the quality of the predicted GRNs and CCIs with that of state-of-the-art baseline methods that infer either only GRNs or only CCIs. The results show that CLARIFY consistently outperforms the baselines on commonly used evaluation metrics. Our results point to the importance of co-inferring CCIs and GRNs and to the value of layered graph neural networks as an inference tool for biological networks.

    Availability and implementation

    The source code and data are available at

  2. Spatially resolved scRNA-seq (sp-scRNA-seq) technologies make it possible to comprehensively profile gene expression patterns in their tissue context. However, the development of computational methods lags behind advances in these technologies, which limits how fully their potential can be realized. In this study, we develop a deep learning approach for clustering sp-scRNA-seq data, named Deep Spatially constrained Single-cell Clustering (DSSC). In this model, we integrate the spatial information of cells into the clustering process in two steps: (1) the spatial information is encoded using a graphical neural network model, and (2) cell-to-cell constraints are built from the spatial expression patterns of marker genes and added to the model to guide the clustering process. Deep embedded clustering is then performed on the bottleneck layer of the autoencoder by minimizing a Kullback–Leibler (KL) divergence, jointly with learning the feature representation. DSSC is the first model that can use information from both spatial coordinates and marker genes to guide cell/spot clustering. Extensive experiments on both simulated and real data sets show that DSSC significantly improves clustering performance compared with state-of-the-art methods. It performs robustly across data sets with various cell type/tissue organizations and/or cell type/tissue spatial dependencies. We conclude that DSSC is a promising tool for clustering sp-scRNA-seq data.
  3. Seeds, which provide a major source of calories for humans, are a unique stage of a flowering plant’s lifecycle. During seed germination, the embryo rapidly reactivates and goes through major developmental transitions to become a seedling. This requires extensive and complex spatiotemporal coordination of cell and tissue activity. Existing gene expression profiling methods, such as laser capture microdissection followed by RNA-seq and single-cell RNA-seq, suffer from either low throughput or the loss of spatial information about the cells analysed. Spatial transcriptomics methods combine high-throughput gene expression analysis with the ability to record the spatial location of each individual region analysed. We developed a spatial transcriptomics workflow for germinating barley grain to better understand the spatiotemporal control of gene expression within individual seed cell types. More than 14,000 genes were differentially regulated across 0, 1, 3, 6 and 24 hours after imbibition. This approach enabled us to observe that many functional categories displayed specific spatial expression patterns that could be resolved at a sub-tissue level. Individual members of the aquaporin gene family, which are important for water and ion transport, had specific spatial expression patterns over time, as did genes related to cell wall modification, membrane transport and transcription factors. Using spatial autocorrelation algorithms, we identified auxin transport genes whose expression became increasingly focused within subdomains of the embryo over germination time, suggesting a role in establishing the embryo axis. Together, our data provide an unprecedented spatially resolved cellular map of barley grain germination and identify specific genes to target in functional genomics studies of cell-restricted processes in tissues during germination. The data can be viewed at
  4. Abstract

    Spatially resolved genomic technologies have allowed us to study the physical organization of cells and tissues, and promise an understanding of local interactions between cells. However, it remains difficult to precisely align spatial observations across slices, samples, scales, individuals and technologies. Here, we propose a probabilistic model that aligns spatially resolved samples onto a known or unknown common coordinate system (CCS) with respect to phenotypic readouts (for example, gene expression). Our method, Gaussian Process Spatial Alignment (GPSA), consists of a two-layer Gaussian process: the first layer maps observed samples’ spatial locations onto a CCS, and the second layer maps from the CCS to the observed readouts. Our approach enables complex downstream spatially aware analyses that are impossible or inaccurate with unaligned data, including analysis of variance, creation of a dense three-dimensional (3D) atlas from sparse two-dimensional (2D) slices, and association tests across data modalities.

  5. The mammalian brain consists of an intricate tapestry of cell types whose diversity, crucial for function, arises from both differential gene expression and circuit-specific anatomy. Yet retrieving high-content gene expression information while retaining 3D positional anatomy at cellular resolution has been difficult, limiting integrative understanding of brain structure and function. Here we introduce and apply a technology for 3D intact-tissue RNA sequencing, termed STARmap (Spatially-resolved Transcript Amplicon Readout Mapping), which integrates highly specific signal amplification, novel hydrogel-tissue chemistry, and an error-reduction sequencing process. The capabilities of STARmap were tested by mapping 160 to 1,020 distinct genes simultaneously in sections of mouse brain at single-cell resolution with unprecedented efficiency, accuracy and reproducibility. These experiments led to the discovery of multiple new neocortical cell types with previously undescribed gene markers and spatial patterns of organization, through comparison of the molecularly defined architectures of sensory versus cognitive neocortex and through quantification of the expression of activity-regulated genes as a function of stimulation condition, spatial position, and cell typology. By adapting STARmap to thick tissue blocks, we observed and confirmed a novel molecularly defined gradient distribution of excitatory neuron subtypes across cubic-millimeter-scale volumes (>30,000 cells), and discovered a short-range 3D pattern of self-clustering shared by many inhibitory neuron subtypes that was accurately identifiable with a 3D STARmap approach.
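The DSSC entry (item 2) clusters cells by minimizing a Kullback–Leibler divergence on the autoencoder bottleneck. The sketch below computes that objective in NumPy, assuming the standard deep embedded clustering (DEC) formulation of Xie et al. (2016): a Student's t soft assignment Q of each embedded cell to each centroid, a sharpened target distribution P, and the loss KL(P ‖ Q). The embeddings, centroids and function names (`soft_assign`, `target_distribution`, `kl_divergence`) are hypothetical. DSSC's graph-based encoder, its marker-gene constraints and its training loop (which would backpropagate this loss through the encoder, for example with an autodiff framework) are not shown, and its exact loss may differ.

```python
import numpy as np


def soft_assign(z: np.ndarray, centroids: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Student's t similarity Q between embedded cells z (n, d) and centroids (k, d)."""
    sq_dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)          # each row sums to 1


def target_distribution(q: np.ndarray) -> np.ndarray:
    """Sharpened target P: square Q, divide by soft cluster frequency, renormalize."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_divergence(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """Clustering loss KL(P || Q), summed over cells and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-D bottleneck embeddings for two groups of cells.
    z = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(3.0, 0.3, (20, 2))])
    centroids = np.array([[0.0, 0.0], [3.0, 3.0]])
    q = soft_assign(z, centroids)
    p = target_distribution(q)
    print("labels:", q.argmax(axis=1))
    print("KL(P||Q):", round(kl_divergence(p, q), 4))
```

In DEC-style training, P is recomputed periodically and held fixed while the encoder and centroids are updated to pull Q toward it, which progressively sharpens confident assignments.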