Title: NetExtractor: Extracting a Cerebellar Tissue Gene Regulatory Network Using Differentially Expressed High Mutual Information Binary RNA Profiles
Bigenic expression relationships are conventionally defined based on metrics such as Pearson or Spearman correlation that cannot typically detect latent, non-linear dependencies or require the relationship to be monotonic. Further, the combination of intrinsic and extrinsic noise as well as embedded relationships between sample sub-populations reduces the probability of extracting biologically relevant edges during the construction of gene co-expression networks (GCNs). In this report, we address these problems via our NetExtractor algorithm. NetExtractor examines all pairwise gene expression profiles first with Gaussian mixture models (GMMs) to identify sample sub-populations followed by mutual information (MI) analysis that is capable of detecting non-linear differential bigenic expression relationships. We applied NetExtractor to brain tissue RNA profiles from the Genotype-Tissue Expression (GTEx) project to obtain a brain tissue specific gene expression relationship network centered on cerebellar and cerebellar hemisphere enriched edges. We leveraged the PsychENCODE pre-frontal cortex (PFC) gene regulatory network (GRN) to construct a cerebellar cortex (cerebellar) GRN associated with transcriptionally active regions in cerebellar tissue. Thus, we demonstrate the utility of our NetExtractor approach to detect biologically relevant and novel non-linear binary gene relationships.
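
The contrast the abstract draws between correlation and mutual information can be illustrated with a small sketch. This is not the NetExtractor implementation: the GMM sub-population step is omitted, the histogram (plug-in) MI estimator and the bin count of 8 are assumptions chosen for simplicity, and `gene_a`, `gene_b`, and `gene_c` are synthetic, hypothetical profiles rather than GTEx data.

```python
import math
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def bin_index(value, lo, hi, bins):
    """Map a value in [lo, hi] to an equal-width bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    return min(int((value - lo) / (hi - lo) * bins), bins - 1)


def mutual_information(xs, ys, bins=8):
    """Histogram (plug-in) estimate of mutual information, in bits."""
    n = len(xs)
    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    bx = [bin_index(x, x_lo, x_hi, bins) for x in xs]
    by = [bin_index(y, y_lo, y_hi, bins) for y in ys]
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    mi = 0.0
    for (i, j), count in joint.items():
        p_ij = count / n
        # p_ij / (p_i * p_j) simplifies to count * n / (px[i] * py[j])
        mi += p_ij * math.log2(count * n / (px[i] * py[j]))
    return mi


# Synthetic, hypothetical expression profiles (not real data).
rng = random.Random(0)
gene_a = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
gene_b = [a * a + rng.gauss(0.0, 0.05) for a in gene_a]  # U-shaped dependence on gene_a
gene_c = [rng.uniform(-1.0, 1.0) for _ in range(2000)]   # independent of gene_a

r_ab = pearson(gene_a, gene_b)
mi_ab = mutual_information(gene_a, gene_b)
mi_ac = mutual_information(gene_a, gene_c)

print(f"Pearson r(a, b) = {r_ab:.3f}")
print(f"MI(a, b) = {mi_ab:.3f} bits")
print(f"MI(a, c) = {mi_ac:.3f} bits")
```

In this toy setting the U-shaped pair has a Pearson coefficient near zero but a clearly higher MI estimate than the independent pair, which is the kind of non-monotonic relationship a correlation filter would miss.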
Award ID(s):
1725573 1659300
NSF-PAR ID:
10201178
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
G3: Genes|Genomes|Genetics
Volume:
10
Issue:
9
ISSN:
2160-1836
Page Range / eLocation ID:
2953 to 2963
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Gene co-expression networks (GCNs) are constructed from Gene Expression Matrices (GEMs) in a bottom-up approach where all gene pairs are tested for correlation within the context of the input sample set. This approach is computationally intensive for many current GEMs and may not be scalable to millions of samples. Further, traditional GCNs do not detect non-linear relationships missed by correlation tests and do not place genetic relationships in a gene expression intensity context. In this report, we propose EdgeScaping, which constructs and analyzes the pairwise gene intensity network in a holistic, top-down approach where no edges are filtered. EdgeScaping uses a novel technique to convert traditional pairwise gene expression data to an image-based format. This conversion not only performs feature compression, making our algorithm highly scalable, but also allows for exploring non-linear relationships between genes by leveraging deep learning image analysis algorithms. Using the learned embedded feature space, we implement a fast, efficient algorithm to cluster the entire space of gene expression relationships while retaining gene expression intensity. Since EdgeScaping does not eliminate conventionally noisy edges, it extends the identification of co-expression relationships beyond classically correlated edges to facilitate the discovery of novel or unusual expression patterns within the network. We applied EdgeScaping to a human tumor GEM to identify sets of genes that exhibit conventional and non-conventional interdependent non-linear behavior associated with brain-specific tumor sub-types that would be eliminated in conventional bottom-up construction of GCNs. EdgeScaping source code is available at https://github.com/bhusain/EdgeScaping under the MIT license.
  2. Abstract

    The human brain is a complex organ that consists of several regions each with a unique gene expression pattern. Our intent in this study was to construct a gene co-expression network (GCN) for the normal brain using RNA expression profiles from the Genotype-Tissue Expression (GTEx) project. The brain GCN contains gene correlation relationships that are broadly present in the brain or specific to thirteen brain regions, which we later combined into six overarching brain mini-GCNs based on the brain’s structure. Using the expression profiles of brain region-specific GCN edges, we determined how well the brain region samples could be discriminated from each other, visually with t-SNE plots or quantitatively with the Gene Oracle deep learning classifier. Next, we tested these gene sets on their relevance to human tumors of brain and non-brain origin. Interestingly, we found that genes in the six brain mini-GCNs showed markedly higher mutation rates in tumors relative to matched sets of random genes. Further, we found that cortex genes subdivided Head and Neck Squamous Cell Carcinoma (HNSC) tumors and Pheochromocytoma and Paraganglioma (PCPG) tumors into distinct groups. The brain GCN and mini-GCNs are useful resources for the classification of brain regions and identification of biomarker genes for brain related phenotypes.

     
  3. Abstract

    Electroconvulsive therapy (ECT) is the most effective treatment for severe depression and works by applying an electric current through the brain. The applied current generates an electric field (E-field) and seizure activity, changing the brain’s functional organization. The E-field, which is determined by electrode placement (right unilateral or bitemporal) and pulse amplitude (600, 700, or 800 milliamperes), is associated with the ECT response. However, the neural mechanisms underlying the relationship between E-field, functional brain changes, and clinical outcomes of ECT are not well understood. Here, we investigated the relationships between whole-brain E-field (Ebrain, the 90th percentile of E-field magnitude in the brain), cerebro-cerebellar functional network connectivity (FNC), and clinical outcomes (cognitive performance and depression severity). A fully automated independent component analysis framework determined the FNC between the cerebro-cerebellar networks. We found a linear relationship between Ebrain and cognitive outcomes. The mediation analysis showed that the cerebellum to middle occipital gyrus (MOG)/posterior cingulate cortex (PCC) FNC mediated the effects of Ebrain on cognitive performance. In addition, there is a mediation effect through the cerebellum to parietal lobule FNC between Ebrain and antidepressant outcomes. The pair-wise t-tests further demonstrated that a larger Ebrain was associated with increased FNC between cerebellum and MOG and decreased FNC between cerebellum and PCC, which were linked with decreased cognitive performance. This study implies that an optimal E-field balancing the antidepressant and cognitive outcomes should be considered in relation to cerebro-cerebellar functional neuroplasticity.

     
  4. INTRODUCTION: Diverse phenotypes, including large brains relative to body size, group living, and vocal learning ability, have evolved multiple times throughout mammalian history. These shared phenotypes may have arisen repeatedly by means of common mechanisms discernible through genome comparisons.

    RATIONALE: Protein-coding sequence differences have failed to fully explain the evolution of multiple mammalian phenotypes. This suggests that these phenotypes have evolved at least in part through changes in gene expression, meaning that their differences across species may be caused by differences in genome sequence at enhancer regions that control gene expression in specific tissues and cell types. Yet the enhancers involved in phenotype evolution are largely unknown. Sequence conservation–based approaches for identifying such enhancers are limited because enhancer activity can be conserved even when the individual nucleotides within the sequence are poorly conserved. This is due to an overwhelming number of cases where nucleotides turn over at a high rate, but a similar combination of transcription factor binding sites and other sequence features can be maintained across millions of years of evolution, allowing the function of the enhancer to be conserved in a particular cell type or tissue. Experimentally measuring the function of orthologous enhancers across dozens of species is currently infeasible, but new machine learning methods make it possible to make reliable sequence-based predictions of enhancer function across species in specific tissues and cell types.

    RESULTS: To overcome the limits of studying individual nucleotides, we developed the Tissue-Aware Conservation Inference Toolkit (TACIT). Rather than measuring the extent to which individual nucleotides are conserved across a region, TACIT uses machine learning to test whether the function of a given part of the genome is likely to be conserved. More specifically, convolutional neural networks learn the tissue- or cell type–specific regulatory code connecting genome sequence to enhancer activity using candidate enhancers identified from only a few species. This approach allows us to accurately associate differences between species in tissue or cell type–specific enhancer activity with genome sequence differences at enhancer orthologs. We then connect these predictions of enhancer function to phenotypes across hundreds of mammals in a way that accounts for species’ phylogenetic relatedness. We applied TACIT to identify candidate enhancers from motor cortex and parvalbumin neuron open chromatin data that are associated with brain size relative to body size, solitary living, and vocal learning across 222 mammals. Our results include the identification of multiple candidate enhancers associated with brain size relative to body size, several of which are located in linear or three-dimensional proximity to genes whose protein-coding mutations have been implicated in microcephaly or macrocephaly in humans. We also identified candidate enhancers associated with the evolution of solitary living near a gene implicated in separation anxiety and other enhancers associated with the evolution of vocal learning ability. We obtained distinct results for bulk motor cortex and parvalbumin neurons, demonstrating the value in applying TACIT to both bulk tissue and specific minority cell type populations. To facilitate future analyses of our results and applications of TACIT, we released predicted enhancer activity of >400,000 candidate enhancers in each of 222 mammals and their associations with the phenotypes we investigated.

    CONCLUSION: TACIT leverages predicted enhancer activity conservation rather than nucleotide-level conservation to connect genetic sequence differences between species to phenotypes across large numbers of mammals. TACIT can be applied to any phenotype with enhancer activity data available from at least a few species in a relevant tissue or cell type and a whole-genome alignment available across dozens of species with substantial phenotypic variation. Although we developed TACIT for transcriptional enhancers, it could also be applied to genomic regions involved in other components of gene regulation, such as promoters and splicing enhancers and silencers. As the number of sequenced genomes grows, machine learning approaches such as TACIT have the potential to help make sense of how conservation of, or changes in, subtle genome patterns can help explain phenotype evolution.

    Tissue-Aware Conservation Inference Toolkit (TACIT) associates genetic differences between species with phenotypes. TACIT works by generating open chromatin data from a few species in a tissue related to a phenotype, using the sequences underlying open and closed chromatin regions to train a machine learning model for predicting tissue-specific open chromatin and associating open chromatin predictions across dozens of mammals with the phenotype. [Species silhouettes are from PhyloPic]
  5. Current standards for safe delivery of electrical stimulation to the central nervous system are based on foundational studies which examined post-mortem tissue for histological signs of damage. This set of observations and the subsequently proposed limits to safe stimulation, termed the “Shannon limits,” allow for a simple calculation (using charge per phase and charge density) to determine the intensity of electrical stimulation that can be delivered safely to brain tissue. In the three decades since the Shannon limits were reported, advances in molecular biology have allowed for more nuanced and detailed approaches to be used to expand current understanding of the physiological effects of stimulation. Here, we demonstrate the use of spatial transcriptomics (ST) in an exploratory investigation to assess the biological response to electrical stimulation in the brain. Electrical stimulation was delivered to the rat visual cortex with either acute or chronic electrode implantation procedures. To explore the influence of device type and stimulation parameters, we used carbon fiber ultramicroelectrode arrays (7 μm diameter) and microwire electrode arrays (50 μm diameter) delivering charge and charge density levels selected above and below reported tissue damage thresholds (range: 2–20 nC, 0.1–1 mC/cm²). Spatial transcriptomics was performed using Visium Spatial Gene Expression Slides (10x Genomics, Pleasanton, CA, United States), which enabled simultaneous immunohistochemistry and ST to directly compare traditional histological metrics to transcriptional profiles within each tissue sample. Our data give a first look at unique spatial patterns of gene expression that are related to cellular processes including inflammation, cell cycle progression, and neuronal plasticity. At the acute timepoint, an increase in inflammatory and plasticity related genes was observed surrounding a stimulating electrode compared to a craniotomy control. At the chronic timepoint, an increase in inflammatory and cell cycle progression related genes was observed both in the stimulating vs. non-stimulating microwire electrode comparison and in the stimulating microwire vs. carbon fiber comparison. Using the spatial aspect of this method as well as the within-sample link to traditional metrics of tissue damage, we demonstrate how these data may be analyzed and used to generate new hypotheses and inform safety standards for stimulation in cortex.
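
The "simple calculation" behind the Shannon limits mentioned in item 5 can be sketched as follows. The abstract does not state the formula, so the form used here is an assumption based on the commonly cited Shannon relation log10(D) = k - log10(Q), with D the charge density in μC/cm² per phase and Q the charge per phase in μC. The boundary value `k_limit=1.5` is one commonly quoted choice, not a value taken from the study, and the function names and example settings are hypothetical.

```python
import math


def shannon_k(charge_per_phase_uC, charge_density_uC_per_cm2):
    """Return k = log10(D) + log10(Q) for one stimulation setting.

    Q is the charge per phase in microcoulombs and D is the charge density
    in microcoulombs per square centimeter per phase (assumed units).
    """
    return math.log10(charge_density_uC_per_cm2) + math.log10(charge_per_phase_uC)


def within_shannon_limit(charge_per_phase_uC, charge_density_uC_per_cm2, k_limit=1.5):
    """True if the setting falls on or below the assumed boundary line k = k_limit."""
    return shannon_k(charge_per_phase_uC, charge_density_uC_per_cm2) <= k_limit


# Hypothetical settings: values picked from the ranges quoted in the abstract
# (2-20 nC, 0.1-1 mC/cm^2) and paired arbitrarily for illustration only.
settings_nC_and_mC_per_cm2 = [(2.0, 0.1), (20.0, 1.0)]

for charge_nC, density_mC in settings_nC_and_mC_per_cm2:
    q_uC = charge_nC / 1000.0      # nC -> uC
    d_uC = density_mC * 1000.0     # mC/cm^2 -> uC/cm^2
    k = shannon_k(q_uC, d_uC)
    print(f"Q = {charge_nC} nC, D = {density_mC} mC/cm^2, k = {k:.3f}, "
          f"within limit: {within_shannon_limit(q_uC, d_uC)}")
```

Because the relation combines charge per phase and charge density on a log scale, a setting can sit below the assumed boundary even at a high charge density if the charge per phase is small. The sketch does not reproduce the damage thresholds used in the study itself.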