skip to main content

Attention:

The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 11:00 PM ET on Thursday, June 13 until 2:00 AM ET on Friday, June 14 due to maintenance. We apologize for the inconvenience.


Title: SpaceX: gene co-expression network estimation for spatial transcriptomics
Abstract Motivation

The analysis of spatially resolved transcriptome enables the understanding of the spatial interactions between the cellular environment and transcriptional regulation. In particular, the characterization of the gene–gene co-expression at distinct spatial locations or cell types in the tissue enables delineation of spatial co-regulatory patterns as opposed to standard differential single gene analyses. To enhance the ability and potential of spatial transcriptomics technologies to drive biological discovery, we develop a statistical framework to detect gene co-expression patterns in a spatially structured tissue consisting of different clusters in the form of cell classes or tissue domains.

Results

We develop SpaceX (spatially dependent gene co-expression network), a Bayesian methodology to identify both shared and cluster-specific co-expression network across genes. SpaceX uses an over-dispersed spatial Poisson model coupled with a high-dimensional factor model which is based on a dimension reduction technique for computational efficiency. We show via simulations, accuracy gains in co-expression network estimation and structure by accounting for (increasing) spatial correlation and appropriate noise distributions. In-depth analysis of two spatial transcriptomics datasets in mouse hypothalamus and human breast cancer using SpaceX, detected multiple hub genes which are related to cognitive abilities for the hypothalamus data and multiple cancer genes (e.g. collagen family) from the tumor region for the breast cancer data.

Availability and implementation

The SpaceX R-package is available at github.com/bayesrx/SpaceX.

Supplementary information

Supplementary data are available at Bioinformatics online.

 
more » « less
NSF-PAR ID:
10380259
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Bioinformatics
Volume:
38
Issue:
22
ISSN:
1367-4803
Page Range / eLocation ID:
p. 5033-5041
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Martelli, Pier Luigi (Ed.)
    Abstract Motivation Clustering spatial-resolved gene expression is an essential analysis to reveal gene activities in the underlying morphological context by their functional roles. However, conventional clustering analysis does not consider gene expression co-localizations in tissue for detecting spatial expression patterns or functional relationships among the genes for biological interpretation in the spatial context. In this article, we present a convolutional neural network (CNN) regularized by the graph of protein–protein interaction (PPI) network to cluster spatially resolved gene expression. This method improves the coherence of spatial patterns and provides biological interpretation of the gene clusters in the spatial context by exploiting the spatial localization by convolution and gene functional relationships by graph-Laplacian regularization. Results In this study, we tested clustering the spatially variable genes or all expressed genes in the transcriptome in 22 Visium spatial transcriptomics datasets of different tissue sections publicly available from 10× Genomics and spatialLIBD. The results demonstrate that the PPI-regularized CNN constantly detects gene clusters with coherent spatial patterns and significantly enriched by gene functions with the state-of-the-art performance. Additional case studies on mouse kidney tissue and human breast cancer tissue suggest that the PPI-regularized CNN also detects spatially co-expressed genes to define the corresponding morphological context in the tissue with valuable insights. Availability and implementation Source code is available at https://github.com/kuanglab/CNN-PReg. Supplementary information Supplementary data are available at Bioinformatics online. 
    more » « less
  2. Background

    Gene co‐expression and differential co‐expression analysis has been increasingly used to study co‐functional and co‐regulatory biological mechanisms from large scale transcriptomics data sets.

    Methods

    In this study, we develop a nonparametric approach to identify hub genes and modules in a large co‐expression network with low computational and memory cost, namely MRHCA.

    Results

    We have applied the method to simulated transcriptomics data sets and demonstrated MRHCA can accurately identify hub genes and estimate size of co‐expression modules. With applying MRHCA and differential co‐expression analysis toE. coliand TCGA cancer data, we have identified significant condition specific activated genes inE. coliand distinct gene expression regulatory mechanisms between the cancer types with high copy number variation and small somatic mutations.

    Conclusion

    Our analysis has demonstrated MRHCA can (i) deal with large association networks, (ii) rigorously assess statistical significance for hubs and module sizes, (iii) identify co‐expression modules with low associations, (iv) detect small and significant modules, and (v) allow genes to be present in more than one modules, compared with existing methods.

     
    more » « less
  3. Abstract

    Recent technologies such asspatial transcriptomics, enable the measurement of gene expressions at the single-cell level along with the spatial locations of these cells in the tissue. Spatial clustering of the cells provides valuable insights into the understanding of the functional organization of the tissue. However, most such clustering methods involve some dimension reduction that leads to a loss of the inherent dependency structure among genes at any spatial location in the tissue. This destroys valuable insights of gene co-expression patterns apart from possibly impacting spatial clustering performance. In spatial transcriptomics, the matrix-variate gene expression data, along with spatial coordinates of the single cells, provides information on both gene expression dependencies and cell spatial dependencies through its row and column covariances. In this work, we propose a joint Bayesian approach to simultaneously estimate these gene and spatial cell correlations. These estimates provide data summaries for downstream analyses. We illustrate our method with simulations and analysis of several real spatial transcriptomic datasets. Our work elucidates gene co-expression networks as well as clear spatial clustering patterns of the cells. Furthermore, our analysis reveals that downstream spatial-differential analysis may aid in the discovery of unknown cell types from known marker genes.

     
    more » « less
  4. Abstract Motivation

    Tumor tissue samples often contain an unknown fraction of stromal cells. This problem is widely known as tumor purity heterogeneity (TPH) was recently recognized as a severe issue in omics studies. Specifically, if TPH is ignored when inferring co-expression networks, edges are likely to be estimated among genes with mean shift between non-tumor- and tumor cells rather than among gene pairs interacting with each other in tumor cells. To address this issue, we propose Tumor Specific Net (TSNet), a new method which constructs tumor-cell specific gene/protein co-expression networks based on gene/protein expression profiles of tumor tissues. TSNet treats the observed expression profile as a mixture of expressions from different cell types and explicitly models tumor purity percentage in each tumor sample.

    Results

    Using extensive synthetic data experiments, we demonstrate that TSNet outperforms a standard graphical model which does not account for TPH. We then apply TSNet to estimate tumor specific gene co-expression networks based on TCGA ovarian cancer RNAseq data. We identify novel co-expression modules and hub structure specific to tumor cells.

    Availability and implementation

    R codes can be found at https://github.com/petraf01/TSNet.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

     
    more » « less
  5. Abstract Motivation

    The recent development of spatially resolved transcriptomics (SRT) technologies has facilitated research on gene expression in the spatial context. Annotating cell types is one crucial step for downstream analysis. However, many existing algorithms use an unsupervised strategy to assign cell types for SRT data. They first conduct clustering analysis and then aggregate cluster-level expression based on the clustering results. This workflow fails to leverage the marker gene information efficiently. On the other hand, other cell annotation methods designed for single-cell RNA-seq data utilize the cell-type marker genes information but fail to use spatial information in SRT data.

    Results

    We introduce a statistical spatial transcriptomics cell assignment model, SPAN, to annotate clusters of cells or spots into known types in SRT data with prior knowledge of predefined marker genes and spatial information. The SPAN model annotates cells or spots from SRT data using predefined overexpressed marker genes and combines a mixture model with a hidden Markov random field to model the spatial dependency between neighboring spots. We demonstrate the effectiveness of SPAN against spatial and nonspatial clustering algorithms through extensive simulation and real data experiments.

    Availability and implementation

    https://github.com/ChengZ352/SPAN.

     
    more » « less