

Title: High-Dimensional Gene–Environment Interaction Analysis
Beyond the main genetic and environmental effects, gene–environment (G–E) interactions have been demonstrated to significantly contribute to the development and progression of complex diseases. Published analyses of G–E interactions have primarily used a supervised framework to model both low-dimensional environmental factors and high-dimensional genetic factors in relation to disease outcomes. In this article, we aim to provide a selective review of methodological developments in G–E interaction analysis from a statistical perspective. The three main families of techniques are hypothesis testing, variable selection, and dimension reduction, which lead to three general frameworks: testing-based, estimation-based, and prediction-based. Linear- and nonlinear-effects analysis, fixed- and random-effects analysis, marginal and joint analysis, and Bayesian and frequentist analysis are reviewed to facilitate the conduct of interaction analysis in a wide range of situations with various assumptions and objectives. Statistical properties, computations, applications, and future directions are also discussed.
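As an illustration of the testing-based, marginal framework mentioned above, the sketch below screens genetic variants one at a time: for each variant it fits a linear model containing the environmental factor, the variant, and their product, and reports the t-statistic of the product (G–E) term. This is a generic textbook approach written for this page, not the method of any specific paper covered by the review. It assumes a continuous outcome, a single environmental factor, additively coded genotypes (0/1/2), and numpy; the function name `marginal_ge_screen` and the simulated data are hypothetical.

```python
import numpy as np

def marginal_ge_screen(y, G, E):
    """Hypothetical helper: marginal G-E interaction screen by OLS.

    For each column g of G, fit y = b0 + b1*E + b2*g + b3*(g*E) + error
    and return the t-statistic of b3, the interaction coefficient.
    """
    n, p = G.shape
    t_stats = np.empty(p)
    for j in range(p):
        g = G[:, j]
        X = np.column_stack([np.ones(n), E, g, g * E])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])  # residual variance estimate
        cov = sigma2 * np.linalg.inv(X.T @ X)      # model-based OLS covariance
        t_stats[j] = beta[3] / np.sqrt(cov[3, 3])
    return t_stats

# Simulated example: 500 subjects, 50 variants; only variant 0 interacts with E.
rng = np.random.default_rng(0)
n, p = 500, 50
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)
E = rng.normal(size=n)
y = 0.5 * E + 0.3 * G[:, 0] + 0.8 * G[:, 0] * E + rng.normal(size=n)

t_stats = marginal_ge_screen(y, G, E)
print("variant with largest |t|:", int(np.argmax(np.abs(t_stats))))
```

In a real genome-wide screen, these per-variant statistics would still need a multiple-testing correction across all p variants; joint (estimation-based) approaches instead fit all variants and interactions together, typically with penalization.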
Award ID(s):
2209685
PAR ID:
10598510
Author(s) / Creator(s):
; ;
Publisher / Repository:
Annual Reviews
Date Published:
Journal Name:
Annual Review of Statistics and Its Application
ISSN:
2326-8298
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human complex diseases are affected by both genetic and environmental factors. When multiple environmental risk factors are present, the interaction effect between a gene and the environmental mixture can be larger than the sum of the individual interactions, resulting in so-called synergistic gene–environment (GxE) interactions. The existing literature has shown the power of synergistic gene–environment interaction analysis with cross-sectional traits. In this work, we propose a functional varying index coefficient model for longitudinal traits together with multiple longitudinal environmental risk factors, and we assess how the genetic effects on a longitudinal disease trait are nonlinearly modified by a mixture of environmental influences. We derive an estimation procedure for the nonparametric functional varying index coefficients under the quadratic inference function and penalized spline framework. We examine theoretical properties of the estimates, such as estimation consistency and asymptotic normality. We further propose a hypothesis testing procedure to assess the significance of the synergistic GxE effect. The performance of the estimation and testing procedures is evaluated through Monte Carlo simulation studies. Finally, the utility of the method is illustrated with a real dataset from a pain sensitivity study in which SNP effects are nonlinearly modulated by a mixture of drug dosages and other environmental variables to affect patients’ blood pressure and heart rate.
  2. The statistical practice of modeling interaction with two linear main effects and a product term is ubiquitous in the statistical and epidemiological literature. Most data modelers are aware that misspecification of the main effects can potentially cause severe type I error inflation in tests for interaction, leading to spurious detection of interactions. However, modeling practice has not changed. In this article, we focus on the specific situation where the main effects in the model are misspecified as linear terms and characterize the impact on common tests for statistical interaction. We then propose some simple alternatives that fix the potential type I error inflation in testing interaction due to main effect misspecification. We show that when using the sandwich variance estimator for a linear regression model with a quantitative outcome and two independent factors, both the Wald and score tests asymptotically maintain the correct type I error rate. However, if the independence assumption does not hold or the outcome is binary, using the sandwich estimator does not fix the problem. We further demonstrate that flexibly modeling the main effects under a generalized additive model can largely reduce or often remove bias in the estimates and maintain the correct type I error rate for both quantitative and binary outcomes regardless of the independence assumption. We show that, under the independence assumption and for a continuous outcome, overfitting and flexibly modeling the main effects do not lead to asymptotic power loss relative to a correctly specified main effect model. Our simulation study further demonstrates empirically that using flexible models for the main effects does not result in a significant loss of power for testing interaction in general. Our results provide an improved understanding of the strengths and limitations of tests of interaction in the presence of main effect misspecification. Using data from a large biobank study, the Michigan Genomics Initiative, we present two examples of interaction analysis in support of our results.
  3. Multiple types of molecular (genetic, genomic, epigenetic, etc.) measurements, environmental risk factors, and their interactions have been found to contribute to the outcomes and phenotypes of complex diseases. In previous studies, only the interactions between a single type of molecular measurement and environmental risk factors have been analyzed. In recent biomedical studies, multidimensional profiling, in which data from multiple types of molecular measurements are collected from the same subjects, is becoming popular. Many recent studies have shown that collectively analyzing multiple types of molecular measurements is not only biologically sensible but also leads to improved estimation and prediction. In this study, we conduct an M–E interaction analysis, with M standing for multidimensional molecular measurements and E standing for environmental risk factors. This approach can accommodate multiple types of molecular measurements and sufficiently account for both their overlapping and their independent information. Extensive simulations show that it outperforms several closely related alternatives. In the analysis of TCGA (The Cancer Genome Atlas) data on lung adenocarcinoma and cutaneous melanoma, we make some stable biological findings and achieve stable prediction.
  4. Genomic regions containing loci with effect sizes that interact with environmental factors are desirable targets for selection because of increasingly unpredictable growing seasons. Although selecting upon such gene-by-environment (G × E) loci is vital, identifying significantly associated loci is challenging because of the multiple testing correction. Consequently, G × E loci with small to moderate effect sizes may never be identified via traditional genome-wide association studies (GWAS). Variance GWAS (vGWAS) has previously been shown to identify G × E loci. Because vGWAS also inherently reduces the severity of multiple testing, we hypothesized that vGWAS could be used successfully to identify genomic regions likely to contain G × E effects. We used publicly available genotypic and phenotypic data in maize (Zea mays L.) to test the ability of two vGWAS approaches to identify G × E loci controlling two flowering traits. We observed high inflation of from both approaches. This suggests that these two vGWAS approaches are not suitable for the task of identifying G × E loci. We advocate that similar future applications of vGWAS use more sophisticated models that can adequately control the inflation of . Otherwise, the application of vGWAS to search for G × E effects that are critical for combating the effects of climate change will not reach its full potential.
  5. The genotype-to-phenotype problem (G2P) for multicellular development asks how genetic inputs control collective phenotypic outputs. This is a challenging problem because gene redundancy and stochasticity cause mutations to have subtle phenotypic effects and replicates to display significant variation. We approach this problem using the model organism Myxococcus xanthus, a motile, self-organizing bacterium that forms three-dimensional cell aggregates that mature into spore-filled fruiting bodies under starvation stress. We develop a high-throughput imaging method using 3D-printed microscopes to efficiently collect large phenotypic datasets. Our automated methods for analysis and visualization produce a map of phenotypic variation in M. xanthus development. We demonstrate that even subtle effects of mutations on developmental dynamics can be identified, discriminated, characterized, and assessed for statistical significance, with implications for future gene annotation studies and the effect of environmental factors on G2P.
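The second related abstract discusses Wald tests of a product-term interaction computed with a sandwich variance estimator. The sketch below, written for this page rather than taken from that paper, computes the Wald statistic for the product coefficient in y = b0 + b1*x1 + b2*x2 + b3*x1*x2 using either the usual model-based OLS covariance or the HC0 sandwich covariance (X'X)^-1 X' diag(e_i^2) X (X'X)^-1. It assumes a quantitative outcome and numpy; the function name `interaction_wald` is hypothetical. According to that abstract, the sandwich correction is expected to help only with a quantitative outcome and independent factors, not with binary outcomes or dependent factors.

```python
import numpy as np

def interaction_wald(y, x1, x2, robust=True):
    """Hypothetical helper: Wald statistic for b3 in
    y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error, fitted by OLS.

    robust=True  -> HC0 sandwich covariance (X'X)^-1 X' diag(e^2) X (X'X)^-1
    robust=False -> model-based covariance sigma^2 (X'X)^-1
    Compare the result with a chi-square(1) critical value (3.84 at the 5% level).
    """
    n = len(y)
    X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    if robust:
        meat = X.T @ (X * resid[:, None] ** 2)  # sum over i of e_i^2 x_i x_i'
        cov = bread @ meat @ bread
    else:
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * bread
    return beta[3] ** 2 / cov[3, 3]

# Null scenario with a misspecified main effect: the true effect of x1 is
# quadratic, x1 and x2 are independent, and there is no interaction.
rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = x1 ** 2 + x2 + rng.normal(size=n)

print("model-based Wald:", interaction_wald(y, x1, x2, robust=False))
print("sandwich Wald:   ", interaction_wald(y, x1, x2, robust=True))
```

Repeating this simulation many times and counting how often each statistic exceeds 3.84 gives an empirical type I error rate for each variance estimator under the misspecified linear main effect.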
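The fourth related abstract relies on variance GWAS (vGWAS), which scans variants for differences in phenotype variance across genotype classes; an unmodeled G × E effect can make the phenotypic spread depend on genotype. That abstract does not name the two vGWAS approaches it compared, so the sketch below uses one common variance-heterogeneity test, the Brown–Forsythe (median-centered Levene) test, as an assumed stand-in. The function name and simulated data are hypothetical, numpy is assumed, and the sketch does nothing to address the inflation the abstract reports.

```python
import numpy as np

def brown_forsythe_by_genotype(y, g):
    """Hypothetical helper: Brown-Forsythe (median-centered Levene) F statistic
    testing whether phenotype variance differs across genotype groups.

    Returns (F, df1, df2); F is compared with an F(df1, df2) distribution.
    """
    groups = [y[g == level] for level in np.unique(g)]
    k = len(groups)
    N = sum(len(grp) for grp in groups)
    # Absolute deviations from each group's median.
    z = [np.abs(grp - np.median(grp)) for grp in groups]
    z_bar = np.concatenate(z).mean()  # grand mean of all deviations
    between = sum(len(zj) * (zj.mean() - z_bar) ** 2 for zj in z)
    within = sum(((zj - zj.mean()) ** 2).sum() for zj in z)
    F = (N - k) / (k - 1) * between / within
    return F, k - 1, N - k

rng = np.random.default_rng(2)
n = 1500
g = rng.binomial(2, 0.4, size=n)         # additive genotype codes 0/1/2
y_null = rng.normal(size=n)              # same variance in every genotype class
y_var = rng.normal(scale=1.0 + 0.5 * g)  # variance increases with allele count

F_null, df1, df2 = brown_forsythe_by_genotype(y_null, g)
F_var, _, _ = brown_forsythe_by_genotype(y_var, g)
print(f"F(null) = {F_null:.2f}, F(variance effect) = {F_var:.2f}, df = ({df1}, {df2})")
```

A p-value can be taken from the upper tail of F(df1, df2), for example with `scipy.stats.f.sf(F, df1, df2)`; `scipy.stats.levene(..., center='median')` implements the same test. A genome-wide vGWAS would apply this per SNP and then correct for multiple testing.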