
Title: Meta‐analysis of gene‐environment interaction exploiting gene‐environment independence across multiple case‐control studies

Multiple papers have studied the use of gene‐environment (GE) independence to enhance power for testing gene‐environment interaction in case‐control studies. However, studies that evaluate the role of GE independence in a meta‐analysis framework are limited. In this paper, we extend the single‐study empirical Bayes type shrinkage estimators proposed by Mukherjee and Chatterjee (2008) to a meta‐analysis setting that adjusts for uncertainty regarding the assumption of GE independence across studies. We use the retrospective likelihood framework to derive an adaptive combination of estimators obtained under the constrained model (assuming GE independence) and unconstrained model (without assumptions of GE independence) with weights determined by measures of GE association derived from multiple studies. Our simulation studies indicate that this newly proposed estimator has better average performance across different simulation scenarios than the standard alternative of using inverse variance (covariance) weighted estimators that combine study‐specific constrained, unconstrained, or empirical Bayes estimators. The results are illustrated by meta‐analyzing 6 different studies of type 2 diabetes investigating interactions between genetic markers on the obesity‐related FTO gene and environmental factors body mass index and age.
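To make the two ingredients named in the abstract concrete, here is a minimal Python sketch under stated assumptions. It is not the estimator derived in the paper. It shows (a) a standard fixed‐effect inverse variance weighted pooling of scalar study‐specific estimates and (b) a simplified scalar shrinkage‐type combination of an unconstrained and a constrained estimate, in which the weight on the unconstrained estimate grows with the squared GE association. The function names, the scalar setting, and the particular weight formula are illustrative assumptions; the paper works in a retrospective likelihood framework and may use a different, multivariate form.

```python
import numpy as np


def inverse_variance_pool(estimates, variances):
    """Fixed-effect inverse variance weighted average of study-specific estimates.

    Returns the pooled estimate and its variance.
    """
    est = np.asarray(estimates, dtype=float)
    var = np.asarray(variances, dtype=float)
    weights = 1.0 / var                      # more precise studies get more weight
    pooled = np.sum(weights * est) / np.sum(weights)
    pooled_var = 1.0 / np.sum(weights)
    return pooled, pooled_var


def shrinkage_combine(beta_unconstrained, beta_constrained,
                      var_unconstrained, ge_assoc):
    """Hypothetical scalar shrinkage-type combination (an assumption, not the paper's formula).

    ge_assoc is some measure of GE association (for example, a log odds ratio).
    When ge_assoc is 0 the result equals the constrained estimate; as ge_assoc
    grows relative to the unconstrained variance, the result moves toward the
    unconstrained estimate.
    """
    tau2 = ge_assoc ** 2
    weight = tau2 / (tau2 + var_unconstrained)
    return weight * beta_unconstrained + (1.0 - weight) * beta_constrained
```

With no evidence of GE association the sketch falls back on the more efficient constrained estimate, and with strong evidence it falls back on the robust unconstrained one, which is the trade‐off the abstract describes.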

Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Journal Name:
Statistics in Medicine
Medium: X Size: p. 3895-3909
Sponsoring Org:
National Science Foundation
More Like this
  1. Joint effects of genetic and environmental factors have been increasingly recognized in the development of many complex human diseases. Despite the popularity of case‐control and case‐only designs, longitudinal cohort studies that can capture time‐varying outcome and exposure information have long been recommended for gene–environment (G × E) interactions. To date, literature on sampling designs for longitudinal studies of G × E interaction is quite limited. We therefore consider designs that can prioritize a subsample of the existing cohort for retrospective genotyping on the basis of currently available outcome, exposure, and covariate data. In this work, we propose stratified sampling based on summaries of individual exposures and outcome trajectories and develop a full conditional likelihood approach for estimation that adjusts for the biased sample. We compare the performance of our proposed design and analysis with combinations of different sampling designs and estimation approaches via simulation. We observe that the full conditional likelihood provides improved estimates for the G × E interaction and joint exposure effects over uncorrected complete‐case analysis, and the exposure enriched outcome trajectory dependent design outperforms other designs in terms of estimation efficiency and power for detection of the G × E interaction. We also illustrate our design and analysis using data from the Normative Aging Study, an ongoing longitudinal cohort study initiated by the Veterans Administration in 1963. Copyright © 2017 John Wiley & Sons, Ltd.

  2. Summary

    Finding rare variants and gene–environment interactions (GXE) is critical in dissecting complex diseases. We consider the problem of detecting GXE where G is a rare haplotype and E is a nongenetic factor. Such methods typically assume G-E independence, which may not hold in many applications. A pertinent example is lung cancer—there is evidence that variants on Chromosome 15q25.1 interact with smoking to affect the risk. However, these variants are associated with smoking behavior rendering the assumption of G-E independence inappropriate. With the motivation of detecting GXE under G-E dependence, we extend an existing approach, logistic Bayesian LASSO, which assumes G-E independence (LBL-GXE-I) by modeling G-E dependence through a multinomial logistic regression (referred to as LBL-GXE-D). Unlike LBL-GXE-I, LBL-GXE-D controls type I error rates in all situations; however, it has reduced power when G-E independence holds. To control type I error without sacrificing power, we further propose a unified approach, LBL-GXE, to incorporate uncertainty in the G-E independence assumption by employing a reversible jump Markov chain Monte Carlo method. Our simulations show that LBL-GXE has power similar to that of LBL-GXE-I when G-E independence holds, yet has well-controlled type I errors in all situations. To illustrate the utility of LBL-GXE, we analyzed a lung cancer dataset and found several significant interactions in the 15q25.1 region, including one between a specific rare haplotype and smoking.

  3. Abstract

    Ecological research has increasingly highlighted the importance of intraspecific variation in shaping the structure and function of communities and ecosystems. Indeed, the effects of intraspecific variation can match or exceed those of interspecific variation. Previous reviews of intraspecific variation in plant traits across heterogeneous environments have focused primarily on mean phenotypic effects. We propose that a richer and fuller understanding of the ecological causes and consequences of intraspecific variation would be provided by partitioning trait variance into its subcomponents (genetic, environment, genotype by environment interaction).

    We used a meta‐analysis of 352 sets of genetic, environment and genotype by environment (G×E) variation estimates from 72 studies of Salicaceae to compare these sources of variation across plant traits (growth, foliar nitrogen, defence compounds), insect herbivore performance metrics (e.g., survival, growth, fecundity) and environmental conditions (e.g., soil nutrients, water, defoliation).

    Our findings revealed that variation in levels of defence compounds (both condensed tannins and salicinoids) and insect herbivore performance were primarily genetically determined, while variation in plant growth and foliar nitrogen was more environmentally determined.

    Plasticity in plant growth, foliar nitrogen levels and insect herbivore performance varied substantially across different sites (year × location), and nutrient, water and carbon dioxide environments. Plasticity was lowest for chemical defence traits and all traits in contrasting ozone and defoliation environments.

    Our quantitative review also revealed several gaps in the literature, including a need for surveying more mature plants, a wider variety of insect herbivore species (e.g., leaf‐galling insects, specialist insects) and underrepresented environmental treatments (e.g., competition, defoliation, disease, light and water availability).

    Findings from this analysis highlight the importance of, and patterns within, intraspecific variation with respect to shaping the evolvability and plasticity of traits and governing the interactions of plants and insects.

    A plain language summary is available for this article.

  4. The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based G×E tests, to permit G×E VC tests for biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations. We illustrate its utility by conducting genome-wide gene-based G×E analysis on the Taiwan Biobank data to explore the interaction of gene and physical activity status on body mass index.
  5. Abstract Motivation

    Recent advances in biomedical research have made massive amounts of transcriptomic data available in public repositories from different sources. Due to the heterogeneity present in the individual experiments, identifying reproducible biomarkers for a given disease from multiple independent studies has become a major challenge. The widely used meta-analysis approaches, such as Fisher’s method, Stouffer’s method, minP and maxP, have at least two major limitations: (i) they are sensitive to outliers, and (ii) they perform only one statistical test for each individual study, and hence do not fully utilize the potential sample size to gain statistical power.


    Here, we propose a gene-level meta-analysis framework that overcomes these limitations and identifies a gene signature that is reliable and reproducible across multiple independent studies of a given disease. The approach provides a comprehensive global signature that can be used to understand the underlying biological phenomena, and a smaller test signature that can be used to classify future samples of a given disease. We demonstrate the utility of the framework by constructing disease signatures for influenza and Alzheimer’s disease using nine datasets including 1108 individuals. These signatures are then validated on 12 independent datasets including 912 individuals. The results indicate that the proposed approach performs better than the majority of the existing meta-analysis approaches in terms of both sensitivity and specificity. The proposed signatures could be further used in diagnosis, prognosis and identification of therapeutic targets.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

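Item 5 above names Fisher’s and Stouffer’s methods as widely used meta-analysis baselines. As a rough illustration of what those two baselines compute (not the gene-level framework proposed in that article), the following Python sketch combines one p-value per study using only the standard library. The function names are illustrative, the Stouffer version is unweighted, and the p-values are assumed to be one-sided, independent, and strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(log p) follows chi-square with 2k df under the null."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    # Closed-form chi-square survival function for even degrees of freedom (2k).
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )


def stouffer_combined_p(p_values):
    """Unweighted Stouffer's method: average the z-scores, then convert back to a p-value."""
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / math.sqrt(len(z_scores))
    return 1.0 - normal.cdf(z_combined)
```

Both functions reduce each study to a single p-value before combining, which is the property the abstract in item 5 identifies as a limitation of such approaches.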