Title: A review of statistical methods in imaging genetics

With the rapid growth of modern technology, many biomedical studies are being conducted to collect massive datasets with volumes of multi‐modality imaging, genetic, neurocognitive and clinical information from increasingly large cohorts. Simultaneously extracting and integrating rich and diverse heterogeneous information in neuroimaging and/or genomics from these big datasets could transform our understanding of how genetic variants impact brain structure and function, cognitive function and brain‐related disease risk across the lifespan. Such understanding is critical for diagnosis, prevention and treatment of numerous complex brain‐related disorders (e.g., schizophrenia and Alzheimer's disease). However, the development of analytical methods for the joint analysis of both high‐dimensional imaging phenotypes and high‐dimensional genetic data, a big data squared (BD2) problem, presents major computational and theoretical challenges for existing analytical methods. Besides the high‐dimensional nature of BD2, various neuroimaging measures often exhibit strong spatial smoothness and dependence and genetic markers may have a natural dependence structure arising from linkage disequilibrium. We review some recent developments of various statistical techniques for imaging genetics, including massive univariate and voxel‐wise approaches, reduced rank regression, mixture models and group sparse multi‐task regression. By doing so, we hope that this review may encourage others in the statistical community to enter into this new and exciting field of research. The Canadian Journal of Statistics 47: 108–131; 2019 © 2019 Statistical Society of Canada

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Canadian Journal of Statistics
Page Range / eLocation ID:
p. 108-131
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Background Alzheimer’s disease (AD) is a complex neurodegenerative disorder and the most common type of dementia. AD is characterized by a decline of cognitive function and brain atrophy, and is highly heritable, with estimated heritability ranging from 60 to 80%. The most straightforward and widely used strategy to identify the genetic basis of AD is to perform a genome-wide association study (GWAS) of the case-control diagnostic status. These GWAS have identified over 50 AD-related susceptibility loci. Recently, imaging genetics has emerged as a new field where brain imaging measures are studied as quantitative traits to detect genetic factors. Given that many imaging genetics studies did not involve the diagnostic outcome in the analysis, the identified imaging or genetic markers may not be related or specific to the disease outcome. Results We propose a novel method to identify disease-related genetic variants enriched by imaging endophenotypes, which are the imaging traits associated with both genetic factors and disease status. Our analysis consists of three steps: (1) map the effects of a genetic variant (e.g., single nucleotide polymorphism or SNP) onto imaging traits across the brain using a linear regression model, (2) map the effects of a diagnosis phenotype onto imaging traits across the brain using a linear regression model, and (3) detect SNP-diagnosis association by correlating the SNP effects with the diagnostic effects on the brain-wide imaging traits. We demonstrate the promise of our approach by applying it to the Alzheimer’s Disease Neuroimaging Initiative database. Among 54 AD-related susceptibility loci reported in prior large-scale AD GWAS, our approach identifies 41 of those from a much smaller study cohort, while the standard association approaches identify only two of those. Clearly, the proposed imaging endophenotype enriched approach can reveal promising AD genetic variants undetectable using the traditional method.
Conclusion We have proposed a novel method to identify AD genetic variants enriched by brain-wide imaging endophenotypes. This approach can not only boost detection power, but also reveal interesting biological pathways from genetic determinants to intermediate brain traits and to phenotypic AD outcomes. 
  2. Abstract Motivation

    Identifying the genetic basis of brain structure, function and disorder by using imaging quantitative traits (QTs) as endophenotypes is an important task in brain science. Brain QTs often change over time as the disorder progresses, so understanding how genetic factors influence progressive brain QT changes is of great importance. Most existing imaging genetics methods analyze only the baseline neuroimaging data, and thus omit the longitudinal imaging data across multiple time points, which contain important disease progression information.

    Results

    We propose a novel temporal imaging genetic model that performs multi-task sparse canonical correlation analysis (T-MTSCCA). Our model uses longitudinal neuroimaging data to uncover how single nucleotide polymorphisms (SNPs) affect brain QTs over time. Incorporating the relationships among the longitudinal imaging data and among the SNPs, T-MTSCCA can identify a trajectory of progressive imaging genetic patterns over time. We propose an efficient algorithm to solve the problem and show its convergence. We evaluate T-MTSCCA on 408 subjects from the Alzheimer’s Disease Neuroimaging Initiative database with longitudinal magnetic resonance imaging data and genetic data available. The experimental results show that T-MTSCCA performs as well as or better than state-of-the-art methods. In particular, T-MTSCCA identified higher canonical correlation coefficients and captured clearer canonical weight patterns. This suggests that T-MTSCCA identifies time-consistent and time-dependent SNPs and imaging QTs, which further helps in understanding the genetic basis of brain QT changes over time during disease progression.

    Availability and implementation

    The software and simulation data are publicly available at

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  3. Summary

    Motivated by searching for associations between genetic variants and brain imaging phenotypes, the aim of this article is to develop a groupwise envelope model for multivariate linear regression in order to establish the association between both multivariate responses and covariates. The groupwise envelope model allows for both distinct regression coefficients and distinct error structures for different groups. Statistically, the proposed envelope model can dramatically improve efficiency of tests and of estimation. Theoretical properties of the proposed model are established. Numerical experiments as well as the analysis of an imaging genetic data set obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study show the effectiveness of the model in efficient estimation. Data used in preparation of this article were obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) database.

  4. Many existing studies on complex brain disorders, such as Alzheimer's Disease, have employed regression analysis to associate neuroimaging measures with cognitive status. However, whether these measures in multiple modalities have the predictive power to infer the trajectory of cognitive performance over time remains under-explored. In this paper, we propose a high-order multi-modal multi-mask feature learning model to uncover the temporal relationship between longitudinal neuroimaging measures and progressive cognitive output scores. The regularizations through sparsity-induced norms implemented in the proposed learning model enable the selection of only a small number of imaging features over time and capture modality structures for multi-modal imaging markers. The promising experimental results in extensive empirical studies performed on the ADNI cohort have validated the effectiveness of the proposed method.
  5. Brain imaging genomics is an emerging data science field, where integrated analysis of brain imaging and genomics data, often combined with other biomarker, clinical, and environmental data, is performed to gain new insights into the phenotypic, genetic, and molecular characteristics of the brain as well as their impact on normal and disordered brain function and behavior. It has enormous potential to contribute significantly to biomedical discoveries in brain science. Given the increasingly important role of statistical and machine learning in biomedicine and rapidly growing literature in brain imaging genomics, we provide an up-to-date and comprehensive review of statistical and machine learning methods for brain imaging genomics, as well as a practical discussion on method selection for various biomedical applications. 
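
The three-step analysis described in item 1 above (regress each imaging trait on a SNP, regress each imaging trait on diagnosis, then correlate the two brain-wide effect maps) can be sketched roughly as follows. This is a minimal illustration on simulated data, not the authors' implementation: the function names, the simulated effect sizes, the absence of covariates, and the use of a plain Pearson correlation as the final score are all assumptions made for the example.

```python
import numpy as np


def per_trait_slopes(x, Y):
    """OLS slope of each column of Y regressed on x, with an intercept.

    x: (n,) predictor; Y: (n, p) matrix of imaging traits.
    Returns a (p,) vector of slopes, one per trait.
    """
    xc = x - x.mean()
    Yc = Y - Y.mean(axis=0)
    return xc @ Yc / (xc @ xc)


def snp_diagnosis_score(snp, diagnosis, Y):
    """Correlate the SNP effect map with the diagnosis effect map (hypothetical helper)."""
    b_snp = per_trait_slopes(snp, Y)        # step 1: SNP effects on each trait
    b_dx = per_trait_slopes(diagnosis, Y)   # step 2: diagnosis effects on each trait
    return np.corrcoef(b_snp, b_dx)[0, 1]   # step 3: correlate the two maps


# Simulated data (assumed sizes and effects, for illustration only).
rng = np.random.default_rng(0)
n, p = 500, 200
snp = rng.integers(0, 3, size=n).astype(float)   # minor-allele counts 0/1/2
dx = rng.integers(0, 2, size=n).astype(float)    # case/control status
pattern = rng.normal(size=p)                     # shared spatial pattern
Y = np.outer(0.5 * snp + 1.0 * dx, pattern) + rng.normal(size=(n, p))

score = snp_diagnosis_score(snp, dx, Y)
print(f"correlation of SNP and diagnosis effect maps: {score:.3f}")
```

Because the simulated SNP and diagnosis act on the traits through the same spatial pattern, the two effect maps are strongly correlated here. A raw correlation is not a calibrated test statistic on its own; a real analysis would also need covariate adjustment and a significance assessment, which this sketch leaves out.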