Title: Exploration into biomarker potential of region-specific brain gene co-expression networks
Abstract

The human brain is a complex organ that consists of several regions, each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain’s structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested the relevance of these gene sets to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and the identification of biomarker genes for brain-related phenotypes.
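
The comparison against random gene sets mentioned in the abstract can be illustrated with a small resampling sketch. The abstract does not say how the random sets were matched or which statistic was used, so everything here is an assumption for illustration: the function name `empirical_enrichment`, matching on set size only, the mean mutation rate as the statistic, and the synthetic rates.

```python
import random


def empirical_enrichment(mutation_rate, gene_set, n_random=1000, seed=0):
    """Compare the mean mutation rate of gene_set with size-matched random sets.

    mutation_rate: dict mapping gene name -> per-gene mutation rate
                   (hypothetical input, not the study's data).
    Returns (observed_mean, empirical_p), where empirical_p is the share of
    random sets whose mean is at least the observed mean (with +1 smoothing).
    """
    rng = random.Random(seed)
    genes = sorted(mutation_rate)
    observed = sum(mutation_rate[g] for g in gene_set) / len(gene_set)
    at_least = 0
    for _ in range(n_random):
        # Draw a random gene set of the same size (the only matching used here).
        sample = rng.sample(genes, len(gene_set))
        mean = sum(mutation_rate[g] for g in sample) / len(sample)
        if mean >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_random + 1)


# Synthetic data: 200 genes with a low rate, the first 10 deliberately elevated.
rates = {f"gene{i}": 0.01 for i in range(200)}
network_genes = [f"gene{i}" for i in range(10)]
for g in network_genes:
    rates[g] = 0.05

observed, p = empirical_enrichment(rates, network_genes)
print(observed, p)
```

A real analysis would likely match random genes on further properties such as gene length or expression level; this sketch matches on set size alone.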

 
Award ID(s):
1725573 1659300
NSF-PAR ID:
10198017
Publisher / Repository:
Nature Publishing Group
Journal Name:
Scientific Reports
Volume:
10
Issue:
1
ISSN:
2045-2322
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Renal cell carcinoma (RCC) subtypes are characterized by distinct molecular profiles. Using RNA expression profiles from 1,009 RCC samples, we constructed a condition-annotated gene coexpression network (GCN). The RCC GCN contains binary gene coexpression relationships (edges) specific to conditions including RCC subtype and tumor stage. As an application of this resource, we discovered RCC GCN edges and modules that were associated with genetic lesions in known RCC driver genes, including VHL, a common initiating clear cell RCC (ccRCC) genetic lesion, and PBRM1 and BAP1, which are early genetic lesions in the Braided Cancer River Model (BCRM). Since ccRCC tumors with PBRM1 mutations respond to targeted therapy differently than tumors with BAP1 mutations, we focused on ccRCC-specific edges associated with tumors that exhibit alternate mutation profiles: VHL-PBRM1 or VHL-BAP1. We found specific blends of molecular functions associated with these two mutation paths. Despite these mutation-associated edges having unique genes, they were enriched for the same immunological functions, suggesting a convergent functional role for alternate gene sets consistent with the BCRM. The condition-annotated RCC GCN described herein is a novel data mining resource for the assignment of polygenic biomarkers and their relationships to RCC tumors with specific molecular and mutational profiles.

     
  2.
    Gene co-expression networks (GCNs) are constructed from Gene Expression Matrices (GEMs) in a bottom-up approach where all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not be scalable to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context. In this report, we propose EdgeScaping, which constructs and analyzes the pairwise gene intensity network in a holistic, top-down approach where no edges are filtered. EdgeScaping uses a novel technique to convert traditional pairwise gene expression data to an image-based format. This conversion not only performs feature compression, making our algorithm highly scalable, but it also allows for exploring non-linear relationships between genes by leveraging deep learning image analysis algorithms. Using the learned embedded feature space, we implement a fast, efficient algorithm to cluster the entire space of gene expression relationships while retaining gene expression intensity. Since EdgeScaping does not eliminate conventionally noisy edges, it extends the identification of co-expression relationships beyond classically correlated edges to facilitate the discovery of novel or unusual expression patterns within the network. We applied EdgeScaping to a human tumor GEM to identify sets of genes that exhibit conventional and non-conventional interdependent non-linear behavior associated with brain-specific tumor sub-types that would be eliminated in conventional bottom-up construction of GCNs. EdgeScaping source code is available at https://github.com/bhusain/EdgeScaping under the MIT license.
  3. Abstract

    Background

    Sexually dimorphic mating behaviors differ between sexes and involve gonadal hormones and possibly sexually dimorphic gene expression in the brain. However, the associations among the brain, gonad, and sexual behavior in teleosts are still unclear. Here, we utilized germ cell-free tdrd12 knockout (KO) zebrafish and steroid synthesis enzyme cyp17a1-deficient zebrafish to investigate the differences and interplays in the brain–gonad–behavior axis, and the molecular control of brain dimorphism and male mating behaviors.

    Methods

    Tdrd12+/−;cyp17a1+/− double heterozygous parents were crossed to obtain tdrd12−/−;cyp17a1+/+ (tdrd12 KO), tdrd12+/+;cyp17a1−/− (cyp17a1 KO), and tdrd12−/−;cyp17a1−/− (double KO) homozygous progeny. Mating behaviors were compared using Viewpoint zebrafish tracking software, and sexual traits were thoroughly characterized based on anatomical and histological experiments in these KOs and wild types. The steroid hormone levels (testosterone, 11-ketotestosterone and 17β-estradiol) in the brains, gonads, and serum were measured using ELISA kits. To achieve a higher-resolution view of the differences in region-specific expression patterns of the brain, the brains of these KOs and of control male and female fish were dissected into three regions (the forebrain, midbrain, and hindbrain) for transcriptomic analysis.

    Results

    Qualitative analysis of mating behaviors demonstrated that tdrd12−/− fish behaved in the same manner as wild-type males to trigger oviposition behavior, while cyp17a1−/− and double knockout (KO) fish did not exhibit these behaviors. Based on the observation of sex characteristics, mating behaviors and hormone levels in these mutants, we found that the maintenance of secondary sex characteristics and male mating behavior did not depend on the presence of germ cells; rather, they depended mainly on the 11-ketotestosterone and testosterone levels secreted into the brain–gonad regulatory axis. RNA-seq analysis of different brain regions revealed that the brain transcript profile of tdrd12−/− fish was similar to that of wild-type males, especially in the forebrain and midbrain. However, the brain transcript profiles of cyp17a1−/− and double KO fish were distinct from those of wild-type males and were partially biased towards the expression pattern of the female brain. Our results revealed important candidate genes and signaling pathways, such as synaptic signaling/neurotransmission, MAPK signaling, and steroid hormone pathways, that shape brain dimorphism and modulate male mating behavior in zebrafish.

    Conclusions

    Our results provide comprehensive analyses and new insights regarding the endogenous interactions in the brain–gonad–behavior axis. Moreover, this study revealed crucial candidate genes and neural signaling pathways in different brain regions that are involved in modulating brain dimorphism and male mating behavior in zebrafish, which should significantly advance the understanding of the neuroendocrine and molecular mechanisms that modulate brain dimorphism and male mating behavior in zebrafish and other teleost fish.

  4. High-throughput technologies such as DNA microarrays and RNA-sequencing are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). In a GCN, nodes correspond to genes, and the weight of the connection between two nodes is a measure of similarity in the expression behavior of the two genes. In general, GCN construction and analysis includes three steps: 1) calculating a similarity value for each pair of genes; 2) using these similarity values to construct a fully connected weighted network; and 3) finding clusters of genes in the network, commonly called modules. The specific implementation of these three steps can significantly impact the final output and the downstream biological analysis. GCN construction is a well-studied topic. Existing algorithms rely on relatively simple statistical and mathematical tools to implement these steps. Currently, the WGCNA software package appears to be the most widely accepted standard. We hypothesize that the raw features provided by sequencing data can be leveraged to extract modules of higher quality. A novel preprocessing step of the gene expression data set is introduced that in effect calibrates the expression levels of individual genes before computing pairwise similarities. Further, the similarity is computed as an inner product of positive vectors. In experiments, this provides a significant improvement over WGCNA, as measured by aggregate p-values of the gene ontology term enrichment of the computed modules.
  5. Abstract

    Gene co-expression networks (GCNs) provide multiple benefits to molecular research including hypothesis generation and biomarker discovery. Transcriptome profiles serve as input for GCN construction and are derived from increasingly larger studies with samples across multiple experimental conditions, treatments, time points, genotypes, etc. Such experiments with larger numbers of variables confound discovery of true network edges, exclude edges and inhibit discovery of context (or condition) specific network edges. To demonstrate this problem, a 475-sample dataset is used to show that up to 97% of GCN edges can be misleading because correlations are false or incorrect. False and incorrect correlations can occur when tests are applied without ensuring assumptions are met, and pairwise gene expression may not meet test assumptions if the expression of at least one gene in the pairwise comparison is a function of multiple confounding variables. The ‘one-size-fits-all’ approach to GCN construction is therefore problematic for large, multivariable datasets. Recently, the Knowledge Independent Network Construction toolkit has been used in multiple studies to provide a dynamic approach to GCN construction that ensures statistical tests meet assumptions and confounding variables are addressed. Additionally, it can associate experimental context for each edge of the network resulting in context-specific GCNs (csGCNs). To help researchers recognize such challenges in GCN construction, and the creation of csGCNs, we provide a review of the workflow.
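
The claim in the last abstract that pooled correlations can mislead when a confounding variable is present can be shown with a toy example. This is not the Knowledge Independent Network Construction workflow; the two conditions, the expression values, and the helper `pearson` are invented for illustration only.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical expression of two genes in two conditions. Within each
# condition the genes are perfectly anti-correlated, but the second
# condition shifts both genes upward by the same amount.
cond1_a, cond1_b = [1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]
cond2_a, cond2_b = [11.0, 12.0, 13.0, 14.0], [14.0, 13.0, 12.0, 11.0]

within1 = pearson(cond1_a, cond1_b)
within2 = pearson(cond2_a, cond2_b)
pooled = pearson(cond1_a + cond2_a, cond1_b + cond2_b)

print(within1, within2, pooled)
```

Pooling the samples yields a strong positive correlation even though the relationship inside each condition is negative, which is the kind of edge a context-aware construction is meant to catch.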
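
The three GCN construction steps listed in item 4 above (pairwise similarity, fully connected weighted network, module detection) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any abstract in this list: the similarity is absolute Pearson correlation raised to a power, and modules are found as connected components over a hard threshold, a deliberately simple stand-in for the hierarchical clustering that packages such as WGCNA use. The function names, the `power` and `threshold` values, and the expression data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def build_gcn(expr, power=2):
    """Steps 1 and 2: pairwise similarity, then a fully connected weighted network."""
    genes = sorted(expr)
    weights = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            weights[(g, h)] = abs(pearson(expr[g], expr[h])) ** power
    return genes, weights


def find_modules(genes, weights, threshold=0.8):
    """Step 3: group genes linked by edges whose weight is >= threshold."""
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (g, h), w in weights.items():
        if w >= threshold:
            parent[find(g)] = find(h)
    modules = {}
    for g in genes:
        modules.setdefault(find(g), []).append(g)
    return sorted(sorted(m) for m in modules.values())


# Hypothetical expression matrix: gene -> expression across five samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [10.0, 2.5, 8.0, 4.0, 6.5],
}
genes, weights = build_gcn(expr)
modules = find_modules(genes, weights)
print(modules)
```

Each step is a separate function so that one of them, for example the similarity measure, can be swapped without touching the others, which mirrors the abstract's point that the implementation of each step affects the final modules.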