skip to main content

This content will become publicly available on December 1, 2024

Title: nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes
Abstract Feature selection to identify spatially variable genes or other biologically informative genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We demonstrate the performance of our method using experimental data from several technological platforms and simulations. A software implementation is available at .  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Nature Communications
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent technology breakthroughs in spatially resolved transcriptomics (SRT) have enabled the comprehensive molecular characterization of cells whilst preserving their spatial and gene expression contexts. One of the fundamental questions in analyzing SRT data is the identification of spatially variable genes whose expressions display spatially correlated patterns. Existing approaches are built upon either the Gaussian process-based model, which relies onad hockernels, or the energy-based Ising model, which requires gene expression to be measured on a lattice grid. To overcome these potential limitations, we developed a generalized energy-based framework to model gene expression measured from imaging-based SRT platforms, accommodating the irregular spatial distribution of measured cells. Our Bayesian model applies a zero-inflated negative binomial mixture model to dichotomize the raw count data, reducing noise. Additionally, we incorporate a geostatistical mark interaction model with a generalized energy function, where the interaction parameter is used to identify the spatial pattern. Auxiliary variable MCMC algorithms were employed to sample from the posterior distribution with an intractable normalizing constant. We demonstrated the strength of our method on both simulated and real data. Our simulation study showed that our method captured various spatial patterns with high accuracy; moreover, analysis of a seqFISH dataset and a STARmap dataset established that our proposed method is able to identify genes with novel and strong spatial patterns.

    more » « less
  2. A recent technology breakthrough in spatial molecular profiling (SMP) has enabled the comprehensive molecular characterizations of single cells while preserving spatial information. It provides new opportunities to delineate how cells from different origins form tissues with distinctive structures and functions. One immediate question in SMP data analysis is to identify genes whose expressions exhibit spatially correlated patterns, called spatially variable (SV) genes. Most current methods to identify SV genes are built upon the geostatistical model with Gaussian process to capture the spatial patterns. However, the Gaussian process models rely on ad hoc kernels that could limit the models' ability to identify complex spatial patterns. In order to overcome this challenge and capture more types of spatial patterns, we introduce a Bayesian approach to identify SV genes via a modified Ising model. The key idea is to use the energy interaction parameter of the Ising model to characterize spatial expression patterns. We use auxiliary variable Markov chain Monte Carlo algorithms to sample from the posterior distribution with an intractable normalizing constant in the model. Simulation studies using both simulated and synthetic data showed that the energy‐based modeling approach led to higher accuracy in detecting SV genes than those kernel‐based methods. When applied to two real spatial transcriptomics (ST) datasets, the proposed method discovered novel spatial patterns that shed light on the biological mechanisms. In summary, the proposed method presents a new perspective for analyzing ST data.

    more » « less
  3. Recent technological advances have enabled spatially resolved measurements of expression profiles for hundreds to thousands of genes in fixed tissues at single-cell resolution. However, scalable computational analysis methods able to take into consideration the inherent 3D spatial organization of cell types and nonuniform cellular densities within tissues are still lacking. To address this, we developed MERINGUE, a computational framework based on spatial autocorrelation and cross-correlation analysis to identify genes with spatially heterogeneous expression patterns, infer putative cell–cell communication, and perform spatially informed cell clustering in 2D and 3D in a density-agnostic manner using spatially resolved transcriptomic data. We applied MERINGUE to a variety of spatially resolved transcriptomic data sets including multiplexed error-robust fluorescence in situ hybridization (MERFISH), spatial transcriptomics, Slide-seq, and aligned in situ hybridization (ISH) data. We anticipate that such statistical analysis of spatially resolved transcriptomic data will facilitate our understanding of the interplay between cell state and spatial organization in tissue development and disease. 
    more » « less
  4. Spatial population genetic data often exhibits ‘isolation-by-distance,’ where genetic similarity tends to decrease as individuals become more geographically distant. The rate at which genetic similarity decays with distance is often spatially heterogeneous due to variable population processes like genetic drift, gene flow, and natural selection. Petkova et al., 2016 developed a statistical method called Estimating Effective Migration Surfaces (EEMS) for visualizing spatially heterogeneous isolation-by-distance on a geographic map. While EEMS is a powerful tool for depicting spatial population structure, it can suffer from slow runtimes. Here, we develop a related method called Fast Estimation of Effective Migration Surfaces (FEEMS). FEEMS uses a Gaussian Markov Random Field model in a penalized likelihood framework that allows for efficient optimization and output of effective migration surfaces. Further, the efficient optimization facilitates the inference of migration parameters per edge in the graph, rather than per node (as in EEMS). With simulations, we show conditions under which FEEMS can accurately recover effective migration surfaces with complex gene-flow histories, including those with anisotropy. We apply FEEMS to population genetic data from North American gray wolves and show it performs favorably in comparison to EEMS, with solutions obtained orders of magnitude faster. Overall, FEEMS expands the ability of users to quickly visualize and interpret spatial structure in their data. 
    more » « less
  5. Martelli, Pier Luigi (Ed.)
    Abstract Motivation Clustering spatial-resolved gene expression is an essential analysis to reveal gene activities in the underlying morphological context by their functional roles. However, conventional clustering analysis does not consider gene expression co-localizations in tissue for detecting spatial expression patterns or functional relationships among the genes for biological interpretation in the spatial context. In this article, we present a convolutional neural network (CNN) regularized by the graph of protein–protein interaction (PPI) network to cluster spatially resolved gene expression. This method improves the coherence of spatial patterns and provides biological interpretation of the gene clusters in the spatial context by exploiting the spatial localization by convolution and gene functional relationships by graph-Laplacian regularization. Results In this study, we tested clustering the spatially variable genes or all expressed genes in the transcriptome in 22 Visium spatial transcriptomics datasets of different tissue sections publicly available from 10× Genomics and spatialLIBD. The results demonstrate that the PPI-regularized CNN constantly detects gene clusters with coherent spatial patterns and significantly enriched by gene functions with the state-of-the-art performance. Additional case studies on mouse kidney tissue and human breast cancer tissue suggest that the PPI-regularized CNN also detects spatially co-expressed genes to define the corresponding morphological context in the tissue with valuable insights. Availability and implementation Source code is available at Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less