
Title: A Biologically Interpretable Graph Convolutional Network to Link Genetic Risk Pathways and Imaging Phenotypes of Disease
We propose a novel end-to-end framework for whole-brain and whole-genome imaging-genetics. Our genetics network uses hierarchical graph convolution and pooling operations to embed subject-level data onto a low-dimensional latent space. The hierarchical network implicitly tracks the convergence of genetic risk across well-established biological pathways, while an attention mechanism automatically identifies the salient edges of this network at the subject level. In parallel, our imaging network projects multimodal data onto a set of latent embeddings. For interpretability, we implement a Bayesian feature selection strategy to extract the discriminative imaging biomarkers; these feature weights are optimized alongside the other model parameters. We couple the imaging and genetic embeddings with a predictor network to ensure that the learned representations are linked to phenotype. We evaluate our framework on a schizophrenia dataset that includes two functional MRI paradigms and gene scores derived from Single Nucleotide Polymorphism data. Using repeated 10-fold cross-validation, we show that our imaging-genetics fusion achieves better classification performance than state-of-the-art baselines. In an exploratory analysis, we further show that the biomarkers identified by our model are reproducible and closely associated with deficits in schizophrenia.
Journal Name:
International Conference on Learning Representations
Medium: X
Sponsoring Org:
National Science Foundation
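The abstract above reports results from repeated 10-fold cross-validation. A minimal sketch of how such splits could be generated is given here in plain Python. The function name `repeated_stratified_kfold`, the stratification by diagnosis label, the number of repeats, and the toy cohort sizes are all assumptions made for illustration; the abstract does not specify them.

```python
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, n_splits=10, n_repeats=3, seed=0):
    """Yield (repeat, fold, train_idx, test_idx) tuples.

    Stratification (keeping the class ratio similar in every fold) is an
    assumption of this sketch, not something stated in the abstract.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        offset = 0
        for label in sorted(by_class):
            members = by_class[label][:]
            rng.shuffle(members)  # a fresh shuffle for every repeat
            # Deal indices round-robin, continuing where the previous
            # class stopped so that fold sizes stay balanced.
            for j, idx in enumerate(members):
                folds[(offset + j) % n_splits].append(idx)
            offset += len(members)

        for fold, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            yield repeat, fold, train_idx, sorted(test_idx)


# Hypothetical cohort: 60 controls (label 0) and 40 patients (label 1).
labels = [0] * 60 + [1] * 40
for repeat, fold, train_idx, test_idx in repeated_stratified_kfold(labels):
    pass  # fit on train_idx and score on test_idx here
```

Within each repeat, every subject is held out exactly once, and repeating the procedure with different shuffles gives a sense of how much the reported score depends on one particular partition.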
More Like this
  1. We propose a joint dictionary learning framework that couples imaging and genetics data in a low-dimensional subspace, guided by clinical diagnosis. We use a graph regularization penalty to simultaneously capture inter-regional brain interactions and identify the representative set of anatomical basis vectors that span the low-dimensional space. We further employ group sparsity to find the representative set of genetic basis vectors that span the same latent space. Finally, the latent projection is used to classify patients versus controls. We have evaluated our model on two task fMRI paradigms and single nucleotide polymorphism (SNP) data from schizophrenic patients and matched neurotypical controls. We employ ten-fold cross-validation to show the predictive power of our model. We compare our model with canonical correlation analysis of imaging and genetics data and with random forest classification. Our approach shows better prediction accuracy on both task datasets. Moreover, the implicated brain regions and genetic variants underlie the well-documented deficits in schizophrenia.
  2. Abstract

    Functional magnetic resonance imaging (fMRI) studies have shown altered brain dynamic functional connectivity (DFC) in mental disorders. Here, we aim to explore DFC across a spectrum of symptomatically‐related disorders including bipolar disorder with psychosis (BPP), schizoaffective disorder (SAD), and schizophrenia (SZ). We introduce a group information guided independent component analysis procedure to estimate both group‐level and subject‐specific connectivity states from DFC. Using resting‐state fMRI data of 238 healthy controls (HCs), 140 BPP, 132 SAD, and 113 SZ patients, we identified measures differentiating groups from the whole‐brain DFC and traditional static functional connectivity (SFC), separately. Results show that DFC provided more informative measures than SFC. Diagnosis‐related connectivity states were evident using DFC analysis. For the dominant state consistent across groups, we found 22 instances of hypoconnectivity (with decreasing trends from HC to BPP to SAD to SZ) mainly involving post‐central, frontal, and cerebellar cortices as well as 34 examples of hyperconnectivity (with increasing trends HC through SZ) primarily involving thalamus and temporal cortices. Hypoconnectivities/hyperconnectivities also showed negative/positive correlations, respectively, with clinical symptom scores. Specifically, hypoconnectivities linking postcentral and frontal gyri were significantly negatively correlated with the PANSS positive/negative scores. For frontal connectivities, BPP resembled HC while SAD and SZ were more similar. Three connectivities involving the left cerebellar crus differentiated SZ from other groups and one connection linking frontal and fusiform cortices showed a SAD‐unique change. In summary, our method is promising for assessing DFC and may yield imaging biomarkers for quantifying the dimension of psychosis. Hum Brain Mapp 38:2683–2708, 2017. ©2017 Wiley Periodicals, Inc.

  3. Abstract Background

    In Alzheimer’s Disease (AD) research, multimodal imaging analysis can unveil complementary information from multiple imaging modalities and further our understanding of the disease. One application is to discover disease subtypes using unsupervised clustering. However, existing clustering methods are often applied to input features directly, and could suffer from the curse of dimensionality with high-dimensional multimodal data. The purpose of our study is to identify multimodal imaging-driven subtypes in Mild Cognitive Impairment (MCI) participants using a multiview learning framework based on Deep Generalized Canonical Correlation Analysis (DGCCA), which learns a shared low-dimensional latent representation from 3 neuroimaging modalities.

    DGCCA applies non-linear transformations to the input views using neural networks and is able to learn correlated low-dimensional embeddings that capture more variance than its linear counterpart, generalized CCA (GCCA). We designed experiments to compare DGCCA embeddings with single modality features and GCCA embeddings by generating 2 subtypes from each feature set using unsupervised clustering. In our validation studies, we found that amyloid PET imaging has the most discriminative features compared with structural MRI and FDG PET, and that DGCCA, unlike GCCA, learns from these features. DGCCA subtypes show differential measures in 5 cognitive assessments, 6 brain volume measures, and patterns of conversion to AD. In addition, DGCCA MCI subtypes confirmed AD genetic markers with strong signals that the existing late MCI group did not identify.

    Overall, DGCCA is able to learn effective low-dimensional embeddings from multimodal data by learning non-linear projections. MCI subtypes generated from DGCCA embeddings are different from the existing early and late MCI groups and are most similar to those identified by amyloid PET features. In our validation studies, DGCCA subtypes show distinct patterns in cognitive measures and brain volumes, and are able to identify AD genetic markers. These findings indicate the promise of the imaging-driven subtypes and their power in revealing disease structures beyond early and late stage MCI.

  4. Abstract

    There is growing evidence that rather than using a single brain imaging modality to study its association with physiological or symptomatic features, the field is paying more attention to fusion of multimodal information. However, most current multimodal fusion approaches that incorporate functional magnetic resonance imaging (fMRI) are restricted to second‐level 3D features, rather than the original 4D fMRI data. The trade‐off is that the valuable temporal information is not utilized during the fusion step. Here we propose a novel approach called “parallel group ICA+ICA” that incorporates temporal fMRI information from group independent component analysis (GICA) into a parallel independent component analysis (ICA) framework, aiming to enable direct fusion of first‐level fMRI features with other modalities (e.g., structural MRI), and thus to detect linked functional network variability and structural covariations. Simulation results show that the proposed method yields accurate intermodality linkage detection regardless of whether the linkage is strong or weak. When applied to real data, we identified one pair of significantly associated fMRI‐sMRI components that show group difference between schizophrenia and controls in both modalities, and this linkage can be replicated in an independent cohort. Finally, multiple cognitive domain scores can be predicted by the features identified in the linked component pair by our proposed method. We also show these multimodal brain features can predict multiple cognitive scores in an independent cohort. Overall, results demonstrate the ability of parallel GICA+ICA to estimate joint information from 4D and 3D data without discarding much of the available information up front, and the potential for using this approach to identify imaging biomarkers to study brain disorders.

  5. Introduction

    Brain imaging genetics aims to explore the genetic architecture underlying brain structure and function. Recent studies showed that the incorporation of prior knowledge, such as subject diagnosis information and brain regional correlation, can help identify significantly stronger imaging genetic associations. However, such information may sometimes be incomplete or even unavailable.


    In this study, we explore a new form of data-driven prior knowledge that captures subject-level similarity by fusing multi-modal similarity networks. It was incorporated into the sparse canonical correlation analysis (SCCA) model, which aims to identify a small set of brain imaging and genetic markers that explain the similarity matrix supported by both modalities. The model was applied separately to amyloid and tau imaging data of the ADNI cohort.


    The fused similarity matrix across imaging and genetic data was found to improve association performance as well as or better than diagnosis information, and could therefore serve as a substitute prior when diagnosis information is not available (e.g., in studies focused on healthy controls).


    Our results confirmed the value of all types of prior knowledge in improving association identification. In addition, the fused network representing the subject relationship supported by multi-modal data consistently showed the best or tied-for-best performance compared to the diagnosis network and the co-expression network.

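Item 2 in the list above contrasts dynamic functional connectivity (DFC) with static functional connectivity (SFC). One common way to make that distinction concrete is a sliding-window correlation, sketched here in plain Python. This is an assumption for illustration only: the abstract does not say how DFC was computed, the group information guided ICA step is not shown, and the function names and toy signals are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined; treated as 0 in this sketch
    return cov / math.sqrt(var_x * var_y)


def static_fc(series):
    """Region-by-region correlation matrix over the whole time course."""
    n = len(series)
    return [[pearson(series[i], series[j]) for j in range(n)] for i in range(n)]


def dynamic_fc(series, window, step=1):
    """One correlation matrix per sliding window of `window` time points."""
    length = len(series[0])
    return [
        static_fc([s[start:start + window] for s in series])
        for start in range(0, length - window + 1, step)
    ]


# Two hypothetical regional time courses: coupled in the first half of the
# scan and anti-coupled in the second half.
region_a = [0, 1, 2, 3, 3, 2, 1, 0]
region_b = [0, 1, 2, 3, 0, 1, 2, 3]

sfc = static_fc([region_a, region_b])
dfc = dynamic_fc([region_a, region_b], window=4, step=4)
```

In this toy case the full-scan correlation between the two regions is zero, while the two windows give correlations of +1 and -1, which is the kind of time-varying structure a static analysis averages away.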