
Title: Discovering and interpreting transcriptomic drivers of imaging traits using neural networks
Abstract

Motivation

Cancer heterogeneity is observed at multiple biological levels. To improve our understanding of these differences and their relevance in medicine, approaches to link organ- and tissue-level information from diagnostic images and cellular-level information from genomics are needed. However, these ‘radiogenomic’ studies often use linear or shallow models, depend on feature selection, or consider one gene at a time to map images to genes. Moreover, no study has systematically attempted to understand the molecular basis of imaging traits based on the interpretation of what the neural network has learned. These studies are thus limited in their ability to understand the transcriptomic drivers of imaging traits, which could provide additional context for determining clinical outcomes.

Results

We present a neural network-based approach that takes high-dimensional gene expression data as input and performs non-linear mapping to an imaging trait. To interpret the models, we propose gene masking and gene saliency to extract learned relationships from radiogenomic neural networks. In glioblastoma patients, our models outperformed comparable classifiers (>0.10 AUC) and our interpretation methods were validated using a similar model to identify known relationships between genes and molecular subtypes. We found that tumor imaging traits had specific transcription patterns, e.g. edema and genes related to cellular invasion, and 10 radiogenomic traits were significantly predictive of survival. We demonstrate that neural networks can model transcriptomic heterogeneity to reflect differences in imaging and can be used to derive radiogenomic traits with clinical value.

Availability and implementation

Contact

Supplementary information

Supplementary data are available at Bioinformatics online.
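The abstract names gene saliency and gene masking but does not define them. As a rough illustration only, the sketch below assumes that saliency means the gradient of the predicted trait probability with respect to each input gene, and that masking means keeping a chosen gene set while zeroing all other inputs. The tiny network, its random weights, and all function names (`make_network`, `predict`, `gene_saliency`, `mask_genes`) are hypothetical and are not taken from the paper.

```python
import math
import random


def make_network(n_genes, n_hidden, seed=0):
    """Build a toy one-hidden-layer network with random weights (illustrative only)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.5) for _ in range(n_genes)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.gauss(0, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    return w1, b1, w2, b2


def hidden_activations(net, x):
    """tanh activations of the hidden layer for expression vector x."""
    w1, b1, _, _ = net
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(w1, b1)]


def predict(net, x):
    """Predicted probability that the imaging trait is present."""
    _, _, w2, b2 = net
    hidden = hidden_activations(net, x)
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))


def gene_saliency(net, x):
    """Assumed definition: gradient of the predicted probability w.r.t. each gene."""
    w1, _, w2, _ = net
    hidden = hidden_activations(net, x)
    p = predict(net, x)
    dp_dlogit = p * (1.0 - p)  # derivative of the sigmoid
    grads = []
    for j in range(len(x)):
        # Chain rule through the tanh hidden layer.
        dlogit_dx = sum(w2[k] * (1.0 - hidden[k] ** 2) * w1[k][j]
                        for k in range(len(hidden)))
        grads.append(dp_dlogit * dlogit_dx)
    return grads


def mask_genes(x, keep):
    """Assumed definition: keep the genes indexed in `keep`, zero the rest."""
    keep = set(keep)
    return [xi if j in keep else 0.0 for j, xi in enumerate(x)]


net = make_network(n_genes=6, n_hidden=4)
expression = [0.8, -1.2, 0.3, 0.0, 1.5, -0.4]  # made-up expression values
saliency = gene_saliency(net, expression)
masked_prediction = predict(net, mask_genes(expression, keep=[0, 4]))
```

Under these assumptions, comparing `masked_prediction` with the prediction on the full vector indicates how much of the model's output a gene set carries on its own, while `saliency` ranks individual genes for a single patient.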
Page Range / eLocation ID:
3537 to 3548
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background

    Genotypes are strongly associated with disease phenotypes, particularly in brain disorders. However, the molecular and cellular mechanisms behind this association remain elusive. With emerging multimodal data for these mechanisms, machine learning methods can be applied for phenotype prediction at different scales, but due to the black-box nature of machine learning, integrating these modalities and interpreting biological mechanisms can be challenging. Additionally, the partial availability of these multimodal data presents a challenge in developing these predictive models.


    To address these challenges, we developed DeepGAMI, an interpretable neural network model to improve genotype–phenotype prediction from multimodal data. DeepGAMI leverages functional genomic information, such as eQTLs and gene regulation, to guide neural network connections. Additionally, it includes an auxiliary learning layer for cross-modal imputation, allowing the imputation of latent features of missing modalities and thus the prediction of phenotypes from a single modality. Finally, DeepGAMI uses integrated gradients to prioritize multimodal features for various phenotypes.


    We applied DeepGAMI to several multimodal datasets including genotype and bulk and cell-type gene expression data in brain diseases, and gene expression and electrophysiology data of mouse neuronal cells. Using cross-validation and independent validation, DeepGAMI outperformed existing methods for classifying disease types, and cellular and clinical phenotypes, even using single modalities (e.g., AUC score of 0.79 for Schizophrenia and 0.73 for cognitive impairment in Alzheimer’s disease).


    We demonstrated that DeepGAMI improves phenotype prediction and prioritizes phenotypic features and networks in multiple multimodal datasets in complex brains and brain diseases. Also, it prioritized disease-associated variants, genes, and regulatory networks linked to different phenotypes, providing novel insights into the interpretation of gene regulatory mechanisms. DeepGAMI is open-source and available for general use.

  2. Abstract

    The neurogenomic mechanisms mediating male–male reproductive cooperative behaviours remain unknown. We leveraged extensive transcriptomic and behavioural data on a neotropical bird species (Pipra filicauda) that performs cooperative courtship displays to understand these mechanisms. In this species, the cooperative display is modulated by testosterone, which promotes cooperation in non‐territorial birds, but suppresses cooperation in territory holders. We sought to understand the neurogenomic underpinnings of three related traits: social status, cooperative display behaviour and testosterone phenotype. To do this, we profiled gene expression in 10 brain nuclei spanning the social decision‐making network (SDMN), and two key endocrine tissues that regulate social behaviour. We associated gene expression with each bird's behavioural and endocrine profile derived from 3 years of repeated measures taken from free‐living birds in the Ecuadorian Amazon. We found distinct landscapes of constitutive gene expression were associated with social status, testosterone phenotype and cooperation, reflecting the modular organization and engagement of neuroendocrine tissues. Sex‐steroid and neuropeptide signalling appeared to be important in mediating status‐specific relationships between testosterone and cooperation, suggesting shared regulatory mechanisms with male aggressive and sexual behaviours. We also identified differentially regulated genes involved in cellular activity and synaptic potentiation, suggesting multiple mechanisms underpin these genomic states. Finally, we identified SDMN‐wide gene expression differences between territorial and floater males that could form the basis of ‘status‐specific’ neurophysiological phenotypes, potentially mediated by testosterone and growth hormone. 
    Overall, our findings provide new, systems‐level insights into the mechanisms of cooperative behaviour and suggest that differences in neurogenomic state are the basis for individual differences in social behaviour.

  3. Borenstein, Elhanan (Ed.)
    Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models encodes sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences that are particularly informative for classification. As such, this novel approach can avoid the pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at (a Python package) and (a command line tool).
  4. Abstract Background

    Titinopathies are inherited muscular diseases triggered by genetic mutations in the titin gene. Muscular dystrophy with myositis (mdm) is one such disease, caused by a LINE repeat insertion that leads to exon skipping and an 83-amino acid residue deletion in the N2A-PEVK region of mouse titin. This region has been implicated in a number of titin–titin ligand interactions and is hence important for myocyte signaling and health. Mice with this mdm mutation develop a severe and progressive muscle degeneration. The range of phenotypic differences observed in mdm mice shows that the deletion of this region induces a cascade of transcriptional changes extending to numerous signaling pathways affected by the titin filament. Previous research has focused on correlating phenotypic differences with muscle function in mdm mice. These studies have provided understanding of the downstream physiological effects resulting from the mdm mutation but only provide insights on processes that can be physiologically observed and measured. We used differential gene expression (DGE) to compare the transcriptomes of extensor digitorum longus (EDL), psoas and soleus muscles from wild-type and mdm mice to develop a deeper understanding of these tissue-specific responses.

    Results

    The overall expression pattern observed shows a well-differentiated transcriptional signature in mdm muscles compared to wild type. Muscle-specific clusters observed within the mdm transcriptome highlight the level of variability of each muscle to the deletion. Differential gene expression and weighted gene co-expression network analysis showed a strong directional response in oxidative respiration-associated mitochondrial genes, which aligns with the poor shivering and non-shivering thermogenesis previously observed. Sln, which is a marker associated with shivering and non-shivering thermogenesis, showed the strongest expression change in fast-fibered muscles. No drastic changes in MYH expression levels were reported, which indicated an absence of major fiber-type switching events. Overall expression shifts in MYH isoforms, MARPs, and extracellular matrix-associated genes demonstrated the transcriptional complexity associated with the mdm mutation. The expression alterations in mitochondrial respiration and metabolism-related genes in the mdm muscle dominated over other transcriptomic changes, and likely account for the late-stage cellular responses in the mdm muscles.

    Conclusions

    We were able to demonstrate that the complex nature of the mdm mutation extends beyond a simple rearrangement in the titin gene. EDL, psoas and soleus exemplify unique response modes observed in skeletal muscles with the mdm mutation. Our data also raise the possibility that failure to maintain proper energy homeostasis in mdm muscles may contribute to the pathogenesis of the degenerative phenotype in mdm mice. Understanding the full disease-causing molecular cascade is difficult using bulk RNA sequencing techniques due to the intricate nature of the disease. The development of the mdm phenotype is temporally and spatially regulated, hence future studies should focus on single-fiber-level investigations.
  5. Abstract Motivation

    The prevalence of high-throughput experimental methods has resulted in an abundance of large-scale molecular and functional interaction networks. The connectivity of these networks provides a rich source of information for inferring functional annotations for genes and proteins. An important challenge has been to develop methods for combining these heterogeneous networks to extract useful protein feature representations for function prediction. Most of the existing approaches for network integration use shallow models that encounter difficulty in capturing complex and highly non-linear network structures. Thus, we propose deepNF, a network fusion method based on Multimodal Deep Autoencoders to extract high-level features of proteins from multiple heterogeneous interaction networks.


    We apply this method to combine STRING networks to construct a common low-dimensional representation containing high-level protein features. We use separate layers for different network types in the early stages of the multimodal autoencoder, later connecting all the layers into a single bottleneck layer from which we extract features to predict protein function. We compare the cross-validation and temporal holdout predictive performance of our method with state-of-the-art methods, including the recently proposed method Mashup. Our results show that our method outperforms previous methods for both human and yeast STRING networks. We also show substantial improvement in the performance of our method in predicting gene ontology terms of varying type and specificity.

    Availability and implementation

    deepNF is freely available at:

    Supplementary information

    Supplementary data are available at Bioinformatics online.
