This content will become publicly available on December 2, 2022

Title: Detecting spatially co-expressed gene clusters with functional coherence by graph-regularized convolutional neural network
Abstract. Motivation: Clustering spatially resolved gene expression is an essential analysis for revealing gene activities in the underlying morphological context according to their functional roles. However, conventional clustering analysis considers neither the co-localization of gene expression in tissue, which is needed to detect spatial expression patterns, nor the functional relationships among genes, which are needed for biological interpretation in the spatial context. In this article, we present a convolutional neural network (CNN) regularized by the graph of a protein–protein interaction (PPI) network to cluster spatially resolved gene expression. This method improves the coherence of spatial patterns and provides biological interpretation of the gene clusters in the spatial context by exploiting spatial localization through convolution and gene functional relationships through graph-Laplacian regularization. Results: In this study, we tested clustering of either the spatially variable genes or all expressed genes in the transcriptome in 22 Visium spatial transcriptomics datasets of different tissue sections publicly available from 10× Genomics and spatialLIBD. The results demonstrate that the PPI-regularized CNN consistently detects gene clusters that have coherent spatial patterns and are significantly enriched for gene functions, with state-of-the-art performance. Additional case studies on mouse kidney tissue and human breast cancer tissue suggest that the PPI-regularized CNN also detects spatially co-expressed genes that define the corresponding morphological context in the tissue, with valuable insights. Availability and implementation: Source code is available at Supplementary information: Supplementary data are available at Bioinformatics online.
Martelli, Pier Luigi
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract. Motivation: Cancer heterogeneity is observed at multiple biological levels. To improve our understanding of these differences and their relevance in medicine, approaches to link organ- and tissue-level information from diagnostic images and cellular-level information from genomics are needed. However, these ‘radiogenomic’ studies often use linear or shallow models, depend on feature selection, or consider one gene at a time to map images to genes. Moreover, no study has systematically attempted to understand the molecular basis of imaging traits based on the interpretation of what the neural network has learned. These studies are thus limited in their ability to understand the transcriptomic drivers of imaging traits, which could provide additional context for determining clinical outcomes. Results: We present a neural network-based approach that takes high-dimensional gene expression data as input and performs non-linear mapping to an imaging trait. To interpret the models, we propose gene masking and gene saliency to extract learned relationships from radiogenomic neural networks. In glioblastoma patients, our models outperformed comparable classifiers (>0.10 AUC) and our interpretation methods were validated using a similar model to identify known relationships between genes and molecular subtypes. We found that tumor imaging traits had specific transcription patterns, e.g. edema and genes related to cellular invasion, and 10 radiogenomic traits were significantly predictive of survival. We demonstrate that neural networks can model transcriptomic heterogeneity to reflect differences in imaging and can be used to derive radiogenomic traits with clinical value. Availability and implementation: Contact Supplementary information: Supplementary data are available at Bioinformatics online.
  2. The mammalian brain consists of an intricate tapestry of cell types, with diversity crucial for function that arises from both differential gene expression and circuit-specific anatomy. Yet, retrieving high-content gene-expression information while retaining 3D positional anatomy at cellular resolution has been difficult, limiting integrative understanding of brain structure and function. Here we introduce and apply a technology for 3D intact-tissue RNA sequencing, termed STARmap (Spatially-resolved Transcript Amplicon Readout Mapping), which integrates highly-specific signal amplification, novel hydrogel-tissue chemistry, and an error-reduction sequencing process. The capabilities of STARmap were tested by mapping from 160 to 1,020 distinct genes simultaneously in sections of mouse brain at single-cell resolution with unprecedented efficiency, accuracy and reproducibility. These experiments led to the discovery of multiple new neocortical cell types, with gene markers and spatial patterns of organization not previously described, by comparison of the molecularly-defined architectures of sensory versus cognitive neocortex, and by quantification of expression of activity-regulated genes as a function of stimulation condition, spatial position, and cell typology. By adapting STARmap to thick tissue blocks, we observed and confirmed a novel molecularly-defined gradient distribution of excitatory neuron subtypes across cubic millimeter-scale volumes (>30,000 cells), and discovered a short-range 3D pattern of self-clustering shared by many inhibitory neuron subtypes that was accurately identifiable with a 3D STARmap approach.
  3. Valencia, Alfonso (Ed.)
    Abstract. Motivation: Protein function prediction, based on the patterns of connection in a protein–protein interaction (or association) network, is perhaps the most studied of the classical, fundamental inference problems for biological networks. A highly successful set of recent approaches uses random walk-based low-dimensional embeddings that tend to place functionally similar proteins into coherent spatial regions. However, these approaches lose valuable local graph structure from the network when considering only the embedding. We introduce GLIDER, a method that replaces a protein–protein interaction or association network with a new graph-based similarity network. GLIDER is based on a variant of our previous GLIDE method, which was designed to predict missing links in protein–protein association networks, capturing implicit local and global (i.e. embedding-based) graph properties. Results: GLIDER outperforms competing methods on the task of predicting GO functional labels in cross-validation on a heterogeneous collection of four human protein–protein association networks derived from the 2016 DREAM Disease Module Identification Challenge, and also on three different protein–protein association networks built from the STRING database. We show that this is due to the strong functional enrichment that is present in the local GLIDER neighborhood in multiple different types of protein–protein association networks. Furthermore, we introduce the GLIDER graph neighborhood as a way for biologists to visualize the local neighborhood of a disease gene. As an application, we look at the local GLIDER neighborhoods of a set of known Parkinson’s Disease GWAS genes, rediscover many genes with known involvement in Parkinson’s disease pathways, and suggest some new genes to study. Availability and implementation: All code is publicly available and can be accessed here: Supplementary information: Supplementary data are available at Bioinformatics online.
  4. The proper balance of gene expression is essential for cellular health, organismal development, and maintaining homeostasis. In response to complex internal and external signals, the cell needs to modulate gene expression to maintain proteostasis and establish cellular identity within its niche. On a genome level, single-celled prokaryotic microbes display clustering of co-expressed genes that are regulated as a polycistronic RNA. This phenomenon is largely absent from eukaryotic microbes, although there is extensive clustering of co-expressed genes as functional pairs spread throughout the genome in Saccharomyces cerevisiae. While initial analysis demonstrated conservation of clustering in divergent fungal lineages, a comprehensive analysis has yet to be performed. Here we report on the prevalence, conservation, and significance of the functional clustering of co-regulated genes within the opportunistic human pathogen, Candida albicans. Our analysis reveals that there is extensive clustering within this organism—although the identity of the gene pairs is unique compared with those found in S. cerevisiae—indicating that this genomic arrangement evolved after these microbes diverged evolutionarily, rather than being the result of an ancestral arrangement. We report a clustered arrangement in gene families that participate in diverse molecular functions and are not the result of a divergent orientation with a shared promoter. This arrangement coordinates the transcription of the clustered genes to their neighboring genes, with the clusters congregating to genomic loci that are conducive to transcriptional regulation at a distance.
  5. While recent strides have been made in understanding the biological process by which stony corals calcify, much remains to be revealed, including the ubiquity across taxa of specific biomolecules involved. Several proteins associated with this process have been identified through proteomic profiling of the skeletal organic matrix (SOM) extracted from three scleractinian species. However, the evolutionary history of this putative “biomineralization toolkit,” including the appearance of these proteins throughout metazoan evolution, remains to be resolved. Here we used a phylogenetic approach to examine the evolution of the known scleractinians’ SOM proteins across the Metazoa. Our analysis reveals an evolutionary process dominated by the co-option of genes that originated before the cnidarian diversification. Each one of the three species appears to express a unique set of the more ancient genes, representing the independent co-option of SOM proteins, as well as a substantial proportion of proteins that evolved independently. In addition, in some instances, the different species expressed multiple orthologous proteins sharing the same evolutionary history. Furthermore, the non-random clustering of multiple SOM proteins within scleractinian-specific branches suggests the conservation of protein function between distinct species for what we posit is part of the scleractinian “core biomineralization toolkit.” This “core set” contains proteins that are likely fundamental to the scleractinian biomineralization mechanism. From this analysis, we infer that the scleractinians’ ability to calcify was achieved primarily through multiple lineage-specific protein expansions, which resulted in a new functional role that was not present in the parent gene.