Human complex diseases are affected by both genetic and environmental factors. When multiple environmental risk factors are present, the interaction effect between a gene and the environmental mixture can be larger than the sum of the individual interactions, resulting in so-called synergistic gene–environment (G×E) interactions. Existing literature has shown the power of synergistic G×E interaction analysis with cross-sectional traits. In this work, we propose a functional varying index coefficient model for longitudinal traits together with multiple longitudinal environmental risk factors, and assess how the genetic effects on a longitudinal disease trait are nonlinearly modified by a mixture of environmental influences. We derive an estimation procedure for the nonparametric functional varying index coefficients under the quadratic inference function and penalized spline framework. We study theoretical properties such as estimation consistency and asymptotic normality of the estimates. We further propose a hypothesis testing procedure to assess the significance of the synergistic G×E effect. The performance of the estimation and testing procedures is evaluated through Monte Carlo simulation studies. Finally, the utility of the method is illustrated with a real dataset from a pain sensitivity study in which SNP effects are nonlinearly modulated by a mixture of drug dosages and other environmental variables to affect patients’ blood pressure and heart rate.
High-Dimensional Gene–Environment Interaction Analysis
Beyond the main genetic and environmental effects, gene–environment (G–E) interactions have been demonstrated to significantly contribute to the development and progression of complex diseases. Published analyses of G–E interactions have primarily used a supervised framework to model both low-dimensional environmental factors and high-dimensional genetic factors in relation to disease outcomes. In this article, we aim to provide a selective review of methodological developments in G–E interaction analysis from a statistical perspective. The three main families of techniques are hypothesis testing, variable selection, and dimension reduction, which lead to three general frameworks: testing-based, estimation-based, and prediction-based. Linear- and nonlinear-effects analysis, fixed- and random-effects analysis, marginal and joint analysis, and Bayesian and frequentist analysis are reviewed to facilitate the conduct of interaction analysis in a wide range of situations with various assumptions and objectives. Statistical properties, computations, applications, and future directions are also discussed.
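As a rough illustration of the marginal, testing-based style of analysis mentioned above, the following sketch fits one linear model per genetic factor and reports the t-statistic of its G×E term. It is a minimal example under stated assumptions, not a method taken from the review: the function name `marginal_ge_interaction_t`, the simulated data, and the choice of ordinary least squares with a single continuous environmental factor are all hypothetical.

```python
import numpy as np

def marginal_ge_interaction_t(y, E, G):
    """Return the t-statistic of the G_j x E term for each column j of G.

    Assumed model per gene (illustrative only):
        y = b0 + b1*E + b2*G_j + b3*(G_j*E) + noise
    """
    n, p = G.shape
    t_stats = np.empty(p)
    for j in range(p):
        g = G[:, j]
        # Design matrix: intercept, main E effect, main G effect, interaction
        X = np.column_stack([np.ones(n), E, g, g * E])
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        # Unbiased residual variance and OLS covariance of the coefficients
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_stats[j] = beta[3] / np.sqrt(cov[3, 3])
    return t_stats

# Simulated data (hypothetical): 20 SNP-like factors, one true interaction
rng = np.random.default_rng(0)
n, p = 500, 20
G = rng.binomial(2, 0.3, size=(n, p)).astype(float)  # genotype counts 0/1/2
E = rng.normal(size=n)                               # one environmental factor
y = 0.5 * E + 0.3 * G[:, 0] + 0.8 * G[:, 4] * E + rng.normal(size=n)

t_stats = marginal_ge_interaction_t(y, E, G)
top = int(np.argmax(np.abs(t_stats)))  # index of the strongest marginal signal
```

In a genuinely high-dimensional setting, the per-gene statistics would typically be followed by a multiple-testing correction, and joint analysis would instead model all genetic factors and interactions together, usually with regularization.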
- Award ID(s):
- 2209685
- PAR ID:
- 10598510
- Publisher / Repository:
- Annual Reviews
- Date Published:
- Journal Name:
- Annual Review of Statistics and Its Application
- ISSN:
- 2326-8298
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
Multiple types of molecular (genetic, genomic, epigenetic, etc.) measurements, environmental risk factors, and their interactions have been found to contribute to the outcomes and phenotypes of complex diseases. In each of the previous studies, only the interactions between one type of molecular measurement and environmental risk factors have been analyzed. In recent biomedical studies, multidimensional profiling, in which data from multiple types of molecular measurements are collected from the same subjects, is becoming popular. A myriad of recent studies have shown that collectively analyzing multiple types of molecular measurements is not only biologically sensible but also leads to improved estimation and prediction. In this study, we conduct an M-E interaction analysis, with M standing for multidimensional molecular measurements and E standing for environmental risk factors. This can accommodate multiple types of molecular measurements and sufficiently account for their overlapping as well as independent information. Extensive simulation shows that it outperforms several closely related alternatives. In the analysis of TCGA (The Cancer Genome Atlas) data on lung adenocarcinoma and cutaneous melanoma, we make some stable biological findings and achieve stable prediction.
Genotype-by-environment (G×E) interactions can significantly affect crop performance and stability. Investigating G×E requires extensive data sets with diverse cultivars tested over multiple locations and years. The Genomes-to-Fields (G2F) Initiative has tested maize hybrids in more than 130 year-locations in North America since 2014. Here, we curate and expand this data set by generating environmental covariates (using a crop model) for each of the trials. The resulting data set includes DNA genotypes and environmental data linked to more than 70,000 phenotypic records of grain yield and flowering traits for more than 4000 hybrids. We show how this valuable data set can serve as a benchmark in agricultural modeling and prediction, paving the way for countless G×E investigations in maize. We use multivariate analyses to characterize the data set’s genetic and environmental structure, study the association of key environmental factors with traits, and provide benchmarks using genomic prediction models.
The genotype-to-phenotype problem (G2P) for multicellular development asks how genetic inputs control collective phenotypic outputs. However, this is a challenging problem due to gene redundancy and stochasticity, causing mutations to have subtle phenotypic effects and replicates to display significant variation. We approach this problem using the model organism Myxococcus xanthus, a motile self-organizing bacterium that forms three-dimensional cell aggregates that mature into spore-filled fruiting bodies when under starvation stress. We develop a high-throughput imaging method using three-dimensional-printed microscopes to efficiently collect large phenotypic datasets. Our automated methods for analysis and visualization produce a map of phenotypic variation in M. xanthus development. We demonstrate that even subtle effects on developmental dynamics caused by mutation can be identified, discriminated, characterized, and given statistical significance, with implications for future gene annotation studies and the effect of environmental factors on G2P.
The explosion of biobank data offers unprecedented opportunities for gene-environment interaction (G×E) studies of complex diseases because of the large sample sizes and the rich collection of genetic and non-genetic information. However, the extremely large sample size also introduces new computational challenges in G×E assessment, especially for set-based G×E variance component (VC) tests, which are a widely used strategy to boost overall G×E signals and to evaluate the joint G×E effect of multiple variants from a biologically meaningful unit (e.g., gene). In this work, we focus on continuous traits and present SEAGLE, a Scalable Exact AlGorithm for Large-scale set-based G×E tests, to permit G×E VC tests for biobank-scale data. SEAGLE employs modern matrix computations to calculate the test statistic and p-value of the G×E VC test in a computationally efficient fashion, without imposing additional assumptions or relying on approximations. SEAGLE can easily accommodate sample sizes on the order of 10^5, is implementable on standard laptops, and does not require specialized computing equipment. We demonstrate the performance of SEAGLE using extensive simulations. We illustrate its utility by conducting genome-wide gene-based G×E analysis on the Taiwan Biobank data to explore the interaction of gene and physical activity status on body mass index.

