Title: SpaceX: gene co-expression network estimation for spatial transcriptomics
Abstract

Motivation

The analysis of spatially resolved transcriptome enables the understanding of the spatial interactions between the cellular environment and transcriptional regulation. In particular, the characterization of the gene–gene co-expression at distinct spatial locations or cell types in the tissue enables delineation of spatial co-regulatory patterns as opposed to standard differential single gene analyses. To enhance the ability and potential of spatial transcriptomics technologies to drive biological discovery, we develop a statistical framework to detect gene co-expression patterns in a spatially structured tissue consisting of different clusters in the form of cell classes or tissue domains.

Results

We develop SpaceX (spatially dependent gene co-expression network), a Bayesian methodology to identify both shared and cluster-specific co-expression networks across genes. SpaceX uses an over-dispersed spatial Poisson model coupled with a high-dimensional factor model, which is based on a dimension reduction technique for computational efficiency. We show via simulations that accounting for (increasing) spatial correlation and appropriate noise distributions yields accuracy gains in co-expression network estimation and structure. In-depth analysis of two spatial transcriptomics datasets, in mouse hypothalamus and human breast cancer, using SpaceX detected multiple hub genes related to cognitive abilities for the hypothalamus data and multiple cancer genes (e.g. collagen family) from the tumor region for the breast cancer data.
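As a rough illustration of the kind of model described above, the following Python sketch simulates counts from an over-dispersed spatial Poisson model with a low-rank factor structure and derives a gene-gene correlation matrix from the factor loadings. It is not the SpaceX implementation and does not use the SpaceX R package: the function names, the exponential spatial kernel, the Gaussian over-dispersion term, and all parameter values are assumptions chosen only to make the idea concrete.

```python
import numpy as np


def simulate_counts(coords, loadings, noise_sd=0.3, spatial_sd=0.5,
                    length_scale=1.0, base_log_rate=1.0, seed=0):
    """Simulate spot-by-gene counts (hypothetical helper, illustration only).

    log rate = baseline + spatial effect + factors @ loadings.T + noise,
    and counts are Poisson draws given that rate.
    """
    rng = np.random.default_rng(seed)
    n_spots = coords.shape[0]
    n_genes, n_factors = loadings.shape

    # Exponential kernel on pairwise distances (an assumed choice of kernel).
    dist = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    kernel = spatial_sd ** 2 * np.exp(-dist / length_scale)
    kernel += 1e-6 * np.eye(n_spots)  # jitter for numerical stability
    chol = np.linalg.cholesky(kernel)

    # One spatially correlated effect per gene.
    spatial = chol @ rng.standard_normal((n_spots, n_genes))
    # Latent factors shared across genes induce co-expression.
    factors = rng.standard_normal((n_spots, n_factors))
    # Independent Gaussian noise on the log scale gives over-dispersion.
    noise = noise_sd * rng.standard_normal((n_spots, n_genes))

    log_rate = base_log_rate + spatial + factors @ loadings.T + noise
    return rng.poisson(np.exp(log_rate))


def coexpression_from_loadings(loadings, noise_sd):
    """Correlation matrix implied by the covariance L L^T + noise_sd^2 * I."""
    n_genes = loadings.shape[0]
    cov = loadings @ loadings.T + noise_sd ** 2 * np.eye(n_genes)
    sd = np.sqrt(np.diag(cov))
    return cov / np.outer(sd, sd)


if __name__ == "__main__":
    # A 4 x 4 grid of spots and four genes driven by two latent factors.
    coords = np.array([[i, j] for i in range(4) for j in range(4)], dtype=float)
    loadings = np.array([[0.8, 0.0], [0.7, 0.1], [0.0, 0.9], [-0.6, 0.2]])
    counts = simulate_counts(coords, loadings)
    corr = coexpression_from_loadings(loadings, noise_sd=0.3)
    print(counts.shape)
    print(np.round(corr, 2))
```

In this toy setup the number of factors is much smaller than the number of genes, which is the sense in which a factor model reduces dimension: the gene-gene covariance is described by the loadings rather than by every pairwise entry. Estimating the loadings from observed counts, which is the actual inference problem, is not attempted here.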

Availability and implementation

The SpaceX R-package is available at github.com/bayesrx/SpaceX.

Supplementary information

Supplementary data are available at Bioinformatics online.

NSF-PAR ID:
10380259
Journal Name:
Bioinformatics
Volume:
38
Issue:
22
Page Range or eLocation-ID:
p. 5033-5041
ISSN:
1367-4803
Publisher:
Oxford University Press
Sponsoring Org:
National Science Foundation
More Like this
  1. Martelli, Pier Luigi (Ed.)
    Abstract Motivation Clustering spatially resolved gene expression is an essential analysis to reveal gene activities in the underlying morphological context by their functional roles. However, conventional clustering analysis does not consider gene expression co-localizations in tissue for detecting spatial expression patterns or functional relationships among the genes for biological interpretation in the spatial context. In this article, we present a convolutional neural network (CNN) regularized by the graph of protein–protein interaction (PPI) network to cluster spatially resolved gene expression. This method improves the coherence of spatial patterns and provides biological interpretation of the gene clusters in the spatial context by exploiting the spatial localization by convolution and gene functional relationships by graph-Laplacian regularization. Results In this study, we tested clustering the spatially variable genes or all expressed genes in the transcriptome in 22 Visium spatial transcriptomics datasets of different tissue sections publicly available from 10× Genomics and spatialLIBD. The results demonstrate that the PPI-regularized CNN constantly detects gene clusters with coherent spatial patterns and significantly enriched by gene functions with the state-of-the-art performance. Additional case studies on mouse kidney tissue and human breast cancer tissue suggest that the PPI-regularized CNN also detects spatially co-expressed genes to define the corresponding morphological context in the tissue with valuable insights. Availability and implementation Source code is available at https://github.com/kuanglab/CNN-PReg. Supplementary information Supplementary data are available at Bioinformatics online.
  2. Abstract Motivation

    Tumor tissue samples often contain an unknown fraction of stromal cells. This problem, widely known as tumor purity heterogeneity (TPH), was recently recognized as a severe issue in omics studies. Specifically, if TPH is ignored when inferring co-expression networks, edges are likely to be estimated among genes with a mean shift between non-tumor and tumor cells rather than among gene pairs interacting with each other in tumor cells. To address this issue, we propose Tumor Specific Net (TSNet), a new method which constructs tumor-cell specific gene/protein co-expression networks based on gene/protein expression profiles of tumor tissues. TSNet treats the observed expression profile as a mixture of expressions from different cell types and explicitly models tumor purity percentage in each tumor sample.

    Results

    Using extensive synthetic data experiments, we demonstrate that TSNet outperforms a standard graphical model which does not account for TPH. We then apply TSNet to estimate tumor specific gene co-expression networks based on TCGA ovarian cancer RNAseq data. We identify novel co-expression modules and hub structure specific to tumor cells.

    Availability and implementation

    R codes can be found at https://github.com/petraf01/TSNet.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  3. Abstract Motivation

    Gene regulatory networks (GRNs) of the same organism can be different under different conditions, although the overall network structure may be similar. Understanding the difference in GRNs under different conditions is important to understand condition-specific gene regulation. When gene expression and other relevant data under two different conditions are available, they can be used by an existing network inference algorithm to estimate two GRNs separately, and then to identify the difference between the two GRNs. However, such an approach does not exploit the similarity in two GRNs, and may sacrifice inference accuracy.

    Results

    In this paper, we model GRNs with the structural equation model (SEM) that can integrate gene expression and genetic perturbation data, and develop an algorithm named fused sparse SEM (FSSEM) to jointly infer GRNs under two conditions and then identify the differences between the two GRNs. Computer simulations demonstrate that the FSSEM algorithm outperforms the approaches that estimate two GRNs separately. Analysis of a dataset of lung cancer and another dataset of gastric cancer with FSSEM inferred differential GRNs in cancer versus normal tissues, whose genes with largest network degrees have been reported to be implicated in tumorigenesis. The FSSEM algorithm provides a valuable tool for joint inference of two GRNs and identification of the differential GRN under two conditions.

    Availability and implementation

    The R package fssemR implementing the FSSEM algorithm is available at https://github.com/Ivis4ml/fssemR.git. It is also available on CRAN.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  4. Abstract Motivation

    High-throughput sequencing technologies, in particular RNA sequencing (RNA-seq), have become the basic practice for genomic studies in biomedical research. In addition to studying genes individually, for example, through differential expression analysis, investigating co-ordinated expression variations of genes may help reveal the underlying cellular mechanisms to derive better understanding and more effective prognosis and intervention strategies. Although there exists a variety of co-expression network based methods to analyze microarray data for this purpose, instead of blindly extending these methods for microarray data that may introduce unnecessary bias, it is crucial to develop methods well adapted to RNA-seq data to identify the functional modules of genes with similar expression patterns.

    Results

    We have developed a fully Bayesian covariate-dependent negative binomial factor analysis (dNBFA) method for RNA-seq count data, to capture coordinated gene expression changes while considering effects from covariates reflecting different influencing factors. Unlike existing co-expression network based methods, our proposed model does not require multiple ad-hoc choices on data processing, transformation, or co-expression measures, and can be directly applied to RNA-seq data. Furthermore, being capable of incorporating covariate information, the proposed method can tackle setups with complex confounding factors in different experiment designs. Finally, the natural model parameterization removes the need for a normalization preprocessing step, as commonly adopted to compensate for the effect of sequencing-depth variations. Efficient Bayesian inference of model parameters is derived by exploiting conditional conjugacy via novel data augmentation techniques. Experimental results on several real-world RNA-seq datasets on complex diseases suggest dNBFA as a powerful tool for discovering the gene modules with significant differential expression and meaningful biological insight.

    Availability and implementation

    dNBFA is implemented in R language and is available at https://github.com/siamakz/dNBFA.

  5. Seeds, which provide a major source of calories for humans, are a unique stage of a flowering plant’s lifecycle. During seed germination the embryo reactivates rapidly and goes through major developmental transitions to become a seedling. This requires extensive and complex spatiotemporal coordination of cell and tissue activity. Existing gene expression profiling methods, such as laser capture microdissection followed by RNA-seq and single-cell RNA-seq, suffer from either low throughput or the loss of spatial information about the cells analysed. Spatial transcriptomics methods couple high-throughput analysis of gene expression with the ability to record the spatial location of each individual region analysed. We developed a spatial transcriptomics workflow for germinating barley grain to better understand the spatiotemporal control of gene expression within individual seed cell types. More than 14,000 genes were differentially regulated across 0, 1, 3, 6 and 24 hours after imbibition. This approach enabled us to observe that many functional categories displayed specific spatial expression patterns that could be resolved at a sub-tissue level. Individual aquaporin gene family members, important for water and ion transport, had specific spatial expression patterns over time, as did genes related to cell wall modification, membrane transport and transcription factors. Using spatial autocorrelation algorithms, we were able to identify auxin transport genes that had increasingly focused expression within subdomains of the embryo over germination time, suggestive of a role in establishment of the embryo axis. Together, our data provide an unprecedented spatially resolved cellular map for barley grain germination and specific genes to target for functional genomics to define cellular restricted processes in tissues during germination. The data can be viewed at https://spatial.latrobe.edu.au/.