
Title: NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles
Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
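The contrast the abstract draws between correlation and mutual information can be illustrated with a small sketch. This is not the NetExtractor implementation: the GMM sub-population step is omitted, the data are synthetic, and the names `gene_a`, `gene_b`, the equal-width binning, and the choice of 8 bins are all assumptions made for illustration. It shows only that a non-monotonic bigenic relationship can have a Pearson coefficient near zero while a histogram-based MI estimate stays well above that of a shuffled control.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1] (illustrative binning)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant profile
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys, n_bins=8):
    """Histogram (plug-in) estimate of mutual information, in bits."""
    bx = equal_width_bins(xs, n_bins)
    by = equal_width_bins(ys, n_bins)
    n = len(xs)
    joint = Counter(zip(bx, by))
    px = Counter(bx)
    py = Counter(by)
    mi = 0.0
    for (i, j), c in joint.items():
        # p_ij * log2(p_ij / (p_i * p_j)), written in terms of raw counts
        mi += (c / n) * math.log2(c * n / (px[i] * py[j]))
    return mi


# Synthetic, hypothetical expression profiles: gene_b depends on gene_a
# through a non-monotonic (quadratic) relationship plus Gaussian noise.
rng = random.Random(0)
gene_a = [rng.uniform(-1, 1) for _ in range(2000)]
gene_b = [a * a + rng.gauss(0, 0.05) for a in gene_a]

# Shuffled control: same marginal distribution, dependence destroyed.
shuffled_b = gene_b[:]
rng.shuffle(shuffled_b)

r = pearson(gene_a, gene_b)
mi_dep = mutual_information(gene_a, gene_b)
mi_null = mutual_information(gene_a, shuffled_b)

print(f"Pearson r          : {r:.3f}")
print(f"MI (dependent pair): {mi_dep:.3f} bits")
print(f"MI (shuffled pair) : {mi_null:.3f} bits")
```

The plug-in estimator is biased upward for small samples and is sensitive to the bin count, so the shuffled control serves as a rough baseline rather than a formal significance test.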
Award ID(s):
1725573 1659300
Publication Date:
Journal Name:
G3: Genes|Genomes|Genetics
Page Range or eLocation-ID:
2953 to 2963
Sponsoring Org:
National Science Foundation
More Like this
  1. Gene co-expression networks (GCNs) are constructed from Gene Expression Matrices (GEMs) in a bottom-up approach where all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not be scalable to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context. In this report, we propose EdgeScaping, which constructs and analyzes the pairwise gene intensity network in a holistic, top-down approach where no edges are filtered. EdgeScaping uses a novel technique to convert traditional pairwise gene expression data to an image-based format. This conversion not only performs feature compression, making our algorithm highly scalable, but it also allows for exploring non-linear relationships between genes by leveraging deep learning image analysis algorithms. Using the learned embedded feature space we implement a fast, efficient algorithm to cluster the entire space of gene expression relationships while retaining gene expression intensity. Since EdgeScaping does not eliminate conventionally noisy edges, it extends the identification of co-expression relationships beyond classically correlated edges to facilitate the discovery of novel or unusual expression patterns within the network. We applied EdgeScaping to a human tumor GEM to identify sets of genes that exhibit conventional and non-conventional interdependent non-linear behavior associated with brain-specific tumor sub-types that would be eliminated in conventional bottom-up construction of GCNs. EdgeScaping source code is available under the MIT license.
  2. Abstract

    The human brain is a complex organ that consists of several regions, each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain’s structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested these gene sets on their relevance to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and identification of biomarker genes for brain-related phenotypes.

  3. Current standards for safe delivery of electrical stimulation to the central nervous system are based on foundational studies which examined post-mortem tissue for histological signs of damage. This set of observations and the subsequently proposed limits to safe stimulation, termed the “Shannon limits,” allow for a simple calculation (using charge per phase and charge density) to determine the intensity of electrical stimulation that can be delivered safely to brain tissue. In the three decades since the Shannon limits were reported, advances in molecular biology have allowed for more nuanced and detailed approaches to be used to expand current understanding of the physiological effects of stimulation. Here, we demonstrate the use of spatial transcriptomics (ST) in an exploratory investigation to assess the biological response to electrical stimulation in the brain. Electrical stimulation was delivered to the rat visual cortex with either acute or chronic electrode implantation procedures. To explore the influence of device type and stimulation parameters, we used carbon fiber ultramicroelectrode arrays (7 μm diameter) and microwire electrode arrays (50 μm diameter) delivering charge and charge density levels selected above and below reported tissue damage thresholds (range: 2–20 nC, 0.1–1 mC/cm²). Spatial transcriptomics was performed using Visium Spatial Gene Expression Slides (10x Genomics, Pleasanton, CA, United States), which enabled simultaneous immunohistochemistry and ST to directly compare traditional histological metrics to transcriptional profiles within each tissue sample. Our data give a first look at unique spatial patterns of gene expression that are related to cellular processes including inflammation, cell cycle progression, and neuronal plasticity. At the acute timepoint, an increase in inflammatory and plasticity related genes was observed surrounding a stimulating electrode compared to a craniotomy control. At the chronic timepoint, an increase in inflammatory and cell cycle progression related genes was observed both in the stimulating vs. non-stimulating microwire electrode comparison and in the stimulating microwire vs. carbon fiber comparison. Using the spatial aspect of this method as well as the within-sample link to traditional metrics of tissue damage, we demonstrate how these data may be analyzed and used to generate new hypotheses and inform safety standards for stimulation in cortex.
  4. Given the complex relationship between gene expression and phenotypic outcomes, computationally efficient approaches are needed to sift through large high-dimensional datasets in order to identify biologically relevant biomarkers. In this report, we describe a method of identifying the most salient biomarker genes in a dataset, which we call "candidate genes", by evaluating the ability of gene combinations to classify samples from a dataset, which we call "classification potential". Our algorithm, Gene Oracle, uses a neural network to test user-defined gene sets for polygenic classification potential and then uses a combinatorial approach to further decompose selected gene sets into candidate and non-candidate biomarker genes. We tested this algorithm on curated gene sets from the Molecular Signatures Database (MSigDB) quantified in RNAseq gene expression matrices obtained from The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data repositories. First, we identified which MSigDB Hallmark subsets have significant classification potential for both the TCGA and GTEx datasets. Then, we identified the most discriminatory candidate biomarker genes in each Hallmark gene set and provide evidence that the improved biomarker potential of these genes may be due to reduced functional complexity.
  5. Cherry, J M (Ed.)
    Abstract The mechanisms that coordinate cellular gene expression are highly complex and intricately interconnected. Thus, it is necessary to move beyond a fully reductionist approach to understanding genetic information flow and begin focusing on the networked connections between genes that organize cellular function. Continued advancements in computational hardware, coupled with the development of gene correlation network algorithms, provide the capacity to study networked interactions between genes rather than their isolated functions. For example, gene coexpression networks are used to construct gene relationship networks using linear metrics such as Spearman or Pearson correlation. Recently, there have been tools designed to deepen these analyses by differentiating between intrinsic vs extrinsic noise within gene expression values, identifying different modules based on tissue phenotype, and capturing potential nonlinear relationships. In this report, we introduce an algorithm with a novel application of image-based segmentation modalities utilizing blob detection techniques applied for detecting bigenic edges in a gene expression matrix. We applied this algorithm, called EdgeCrafting, to a bulk RNA-sequencing gene expression matrix comprising healthy kidney and cancerous kidney data. We then compared EdgeCrafting against four other RNA expression analysis techniques: Weighted Gene Correlation Network Analysis, Knowledge Independent Network Construction, NetExtractor, and differential gene expression analysis.