
Title: Alignment of spatial genomics data using deep Gaussian processes

Spatially resolved genomic technologies have allowed us to study the physical organization of cells and tissues, and promise an understanding of local interactions between cells. However, it remains difficult to precisely align spatial observations across slices, samples, scales, individuals and technologies. Here, we propose a probabilistic model that aligns spatially-resolved samples onto a known or unknown common coordinate system (CCS) with respect to phenotypic readouts (for example, gene expression). Our method, Gaussian Process Spatial Alignment (GPSA), consists of a two-layer Gaussian process: the first layer maps observed samples’ spatial locations onto a CCS, and the second layer maps from the CCS to the observed readouts. Our approach enables complex downstream spatially aware analyses that are impossible or inaccurate with unaligned data, including an analysis of variance, creation of a dense three-dimensional (3D) atlas from sparse two-dimensional (2D) slices or association tests across data modalities.
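The two-layer structure described above can be illustrated with a small generative sketch. This is not the GPSA implementation: the kernel choice (squared exponential), the identity-plus-perturbation form of the warp, and every name and hyperparameter below (`rbf_kernel`, `sample_gp`, `simulate_two_layer`, the lengthscales and variances) are assumptions made for illustration only. It samples a warp from a first Gaussian process to move observed locations into a common coordinate system, then samples readouts from a second Gaussian process defined over those warped coordinates.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return variance * np.exp(-0.5 * np.maximum(sq_dist, 0.0) / lengthscale**2)


def sample_gp(x, n_outputs, rng, lengthscale, variance, jitter=1e-5):
    """Draw n_outputs independent zero-mean GP functions evaluated at x."""
    cov = rbf_kernel(x, x, lengthscale, variance) + jitter * np.eye(len(x))
    chol = np.linalg.cholesky(cov)
    return chol @ rng.standard_normal((len(x), n_outputs))


def simulate_two_layer(x_observed, n_features, rng, noise_sd=0.1):
    """Generate readouts from a two-layer GP (illustrative assumptions only)."""
    # Layer 1: observed locations -> common coordinates.
    # Assumed form: identity mean plus a small, smooth GP perturbation.
    warp = sample_gp(
        x_observed, x_observed.shape[1], rng, lengthscale=2.0, variance=0.05
    )
    x_common = x_observed + warp

    # Layer 2: common coordinates -> readouts (e.g. gene expression),
    # one independent GP per feature, plus Gaussian observation noise.
    latent = sample_gp(x_common, n_features, rng, lengthscale=1.0, variance=1.0)
    readouts = latent + noise_sd * rng.standard_normal(latent.shape)
    return x_common, readouts


rng = np.random.default_rng(0)
axis = np.linspace(0.0, 5.0, 8)
grid = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)  # 64 spots in 2D
x_common, y = simulate_two_layer(grid, n_features=3, rng=rng)
print(x_common.shape, y.shape)
```

Alignment runs this picture in reverse: given locations and readouts from several samples, one would infer the first-layer warps so that a single second-layer function explains all samples. That inference step is not shown here.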

Publisher / Repository:
Nature Publishing Group
Journal Name:
Nature Methods
Page Range / eLocation ID:
p. 1379-1387
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Spatial genomic technologies characterize the relationship between the structural organization of cells and their cellular state. Despite the availability of various spatial transcriptomic and proteomic profiling platforms, these experiments remain costly and labor-intensive. Traditionally, tissue slicing for spatial sequencing involves parallel axis-aligned sections, often yielding redundant or correlated information. We propose structured batch experimental design, a method that improves the cost efficiency of spatial genomics experiments by profiling tissue slices that are maximally informative, while recognizing the destructive nature of the process. Applied to two spatial genomics studies—one to construct a spatially-resolved genomic atlas of a tissue and another to localize a region of interest in a tissue, such as a tumor—our approach collects more informative samples using fewer slices compared to traditional slicing strategies. This methodology offers a foundation for developing robust and cost-efficient design strategies, allowing spatial genomics studies to be deployed by smaller, resource-constrained labs.

  2. Spatially resolved scRNA-seq (sp-scRNA-seq) technologies provide the potential to comprehensively profile gene expression patterns in tissue context. However, the development of computational methods lags behind the advances in these technologies, which limits the fulfillment of their potential. In this study, we develop a deep learning approach for clustering sp-scRNA-seq data, named Deep Spatially constrained Single-cell Clustering (DSSC). In this model, we integrate the spatial information of cells into the clustering process in two steps: (1) the spatial information is encoded by using a graphical neural network model, and (2) cell-to-cell constraints are built based on the spatial expression pattern of the marker genes and added in the model to guide the clustering process. Then, a deep embedding clustering is performed on the bottleneck layer of autoencoder by Kullback–Leibler (KL) divergence along with the learning of feature representation. DSSC is the first model that can use information from both spatial coordinates and marker genes to guide cell/spot clustering. Extensive experiments on both simulated and real data sets show that DSSC boosts clustering performance significantly compared with the state-of-the-art methods. It has robust performance across different data sets with various cell type/tissue organization and/or cell type/tissue spatial dependency. We conclude that DSSC is a promising tool for clustering sp-scRNA-seq data. 
  3. Abstract Feature selection to identify spatially variable genes or other biologically informative genes is a key step during analyses of spatially-resolved transcriptomics data. Here, we propose nnSVG, a scalable approach to identify spatially variable genes based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We demonstrate the performance of our method using experimental data from several technological platforms and simulations. A software implementation is available at . 
  4. Abstract Plant cells communicate information for the regulation of development and responses to external stresses. A key form of this communication is transcriptional regulation, accomplished via complex gene networks operating both locally and systemically. To fully understand how genes are regulated across plant tissues and organs, high resolution, multi-dimensional spatial transcriptional data must be acquired and placed within a cellular and organismal context. Spatial transcriptomics (ST) typically provides a two-dimensional spatial analysis of gene expression of tissue sections that can be stacked to render three-dimensional data. For example, X-ray and light-sheet microscopy provide sub-micron scale volumetric imaging of cellular morphology of tissues, organs, or potentially entire organisms. Linking these technologies could substantially advance transcriptomics in plant biology and other fields. Here, we review advances in ST and 3D microscopy approaches and describe how these technologies could be combined to provide high resolution, spatially organized plant tissue transcript mapping. 
  5. Abstract

    We present a Bayesian hierarchical space‐time stochastic weather generator (BayGEN) to generate daily precipitation and minimum and maximum temperatures. BayGEN employs a hierarchical framework with data, process, and parameter layers. In the data layer, precipitation occurrence at each site is modeled using probit regression using a spatially distributed latent Gaussian process; precipitation amounts are modeled as gamma random variables; and minimum and maximum temperatures are modeled as realizations from Gaussian processes. The latent Gaussian process that drives the precipitation occurrence process is modeled in the process layer. In the parameter layer, the model parameters of the data and process layers are modeled as spatially distributed Gaussian processes, consequently enabling the simulation of daily weather at arbitrary (unobserved) locations or on a regular grid. All model parameters are endowed with weakly informative prior distributions. The No‐U‐Turn sampler, an adaptive form of Hamiltonian Monte Carlo, is used to maximize the model likelihood function and obtain posterior samples of each parameter. Posterior samples of the model parameters propagate uncertainty to the weather simulations, an important feature that makes BayGEN unique compared to traditional weather generators. We demonstrate the utility of BayGEN with application to daily weather generation in a basin of the Argentine Pampas. Furthermore, we evaluate the implications of crop yield by driving a crop simulation model with weather simulations from BayGEN and an equivalent non‐Bayesian weather generator.
