Title: Methods for large‐scale single mediator hypothesis testing: Possible choices and comparisons

Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, $H_0: \alpha\beta = 0$ ($\alpha$: effect of the exposure on the mediator after adjusting for confounders; $\beta$: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large‐scale one‐at‐a‐time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators, assuming there is no exposure‐mediator interaction so that the product $\alpha\beta$ has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) $\alpha = 0, \beta \neq 0$; (2) $\alpha \neq 0, \beta = 0$; and (3) $\alpha = \beta = 0$. The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the three p‐values obtained under each case of the null so that the reference distribution of the composite statistic is approximately $U(0, 1)$. In addition to these existing methods, we developed the Sobel‐comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods, which uses a mixture reference distribution, best maintained the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi‐Ethnic Study of Atherosclerosis (MESA).
We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package, medScan, available on CRAN, for implementing all six methods.
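
To make the composite-null issue concrete, the following is a minimal Python sketch, not the medScan implementation. It assumes the two effect estimates have already been converted to independent z-statistics, and it simulates those z-statistics directly as normal draws with hypothetical means `mu_alpha` and `mu_beta`. It computes Sobel's statistic in its z-score form, $T = z_\alpha z_\beta / \sqrt{z_\alpha^2 + z_\beta^2}$, referred to $N(0, 1)$, and also the joint-significance (MaxP) p-value, a standard test that the abstract does not name and that is included here only for comparison. All function names are illustrative.

```python
import math
import random


def normal_two_sided_p(z):
    """Two-sided p-value for a statistic referred to N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def sobel_p(z_alpha, z_beta):
    """Sobel p-value from the two z-statistics, using an N(0, 1) reference."""
    denom = math.hypot(z_alpha, z_beta)
    if denom == 0.0:
        return 1.0
    t = z_alpha * z_beta / denom  # z-score form of Sobel's statistic
    return normal_two_sided_p(t)


def max_p(z_alpha, z_beta):
    """Joint-significance (MaxP) p-value: the larger of the two p-values."""
    return max(normal_two_sided_p(z_alpha), normal_two_sided_p(z_beta))


def simulate_fpr(mu_alpha, mu_beta, n_tests=20000, level=0.05, seed=1):
    """Empirical rejection rates of both tests over simulated mediators.

    mu_alpha and mu_beta are hypothetical means of the z-statistics;
    setting either to 0 places every simulated mediator under the null.
    """
    rng = random.Random(seed)
    sobel_hits = 0
    maxp_hits = 0
    for _ in range(n_tests):
        za = rng.gauss(mu_alpha, 1.0)
        zb = rng.gauss(mu_beta, 1.0)
        sobel_hits += sobel_p(za, zb) < level
        maxp_hits += max_p(za, zb) < level
    return sobel_hits / n_tests, maxp_hits / n_tests


if __name__ == "__main__":
    cases = {
        "case 1 (alpha = 0, beta != 0)": (0.0, 5.0),
        "case 2 (alpha != 0, beta = 0)": (5.0, 0.0),
        "case 3 (alpha = beta = 0)": (0.0, 0.0),
    }
    for label, (ma, mb) in cases.items():
        sobel_rate, maxp_rate = simulate_fpr(ma, mb)
        print(f"{label}: Sobel FPR = {sobel_rate:.4f}, MaxP FPR = {maxp_rate:.4f}")
```

Under case 3, with independent standard normal z-statistics, Sobel's statistic follows $N(0, 1/4)$ rather than $N(0, 1)$, so both rejection rates in this sketch fall well below the nominal level. This conservativeness is the kind of behavior that a mixture reference distribution is meant to correct.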

Publisher / Repository:
Genetic Epidemiology
Journal Name:
Genetic Epidemiology
Page Range / eLocation ID:
167 to 184
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Low socioeconomic status (SES) and living in a disadvantaged neighborhood are associated with poor cardiovascular health. Multiple lines of evidence have linked DNA methylation to both cardiovascular risk factors and social disadvantage indicators. However, limited research has investigated the role of DNA methylation in mediating the associations of individual- and neighborhood-level disadvantage with multiple cardiovascular risk factors in large, multi-ethnic, population-based cohorts. We examined whether disadvantage at the individual level (childhood and adult SES) and neighborhood level (summary neighborhood SES as assessed by Census data and social environment as assessed by perceptions of aesthetic quality, safety, and social cohesion) were associated with 11 cardiovascular risk factors including measures of obesity, diabetes, lipids, and hypertension in 1,154 participants from the Multi-Ethnic Study of Atherosclerosis (MESA). For significant associations, we conducted epigenome-wide mediation analysis to identify methylation sites mediating the relationship between individual/neighborhood disadvantage and cardiovascular risk factors using the JT-Comp method that assesses sparse mediation effects under a composite null hypothesis. In models adjusting for age, sex, race/ethnicity, smoking, medication use, and genetic principal components of ancestry, epigenetic mediation was detected for the associations of adult SES with body mass index (BMI), insulin, and high-density lipoprotein cholesterol (HDL-C), as well as for the association between neighborhood socioeconomic disadvantage and HDL-C at FDR q < 0.05. The 410 CpG mediators identified for the SES-BMI association were enriched for CpGs associated with gene expression (expression quantitative trait methylation loci, or eQTMs), and corresponding genes were enriched in antigen processing and presentation pathways.
For cardiovascular risk factors other than BMI, most of the epigenetic mediators lost significance after controlling for BMI. However, 43 methylation sites showed evidence of mediating the neighborhood socioeconomic disadvantage and HDL-C association after BMI adjustment. The identified mediators were enriched for eQTMs, and corresponding genes were enriched in inflammatory and apoptotic pathways. Our findings support the hypothesis that DNA methylation acts as a mediator between individual- and neighborhood-level disadvantage and cardiovascular risk factors, and shed light on the potential underlying epigenetic pathways. Future studies are needed to fully elucidate the biological mechanisms that link social disadvantage to poor cardiovascular health.

  2. Causal mediation analysis aims to examine the role of a mediator or a group of mediators that lie in the pathway between an exposure and an outcome. Recent biomedical studies often involve a large number of potential mediators based on high‐throughput technologies. Most of the current analytic methods focus on settings with one or a moderate number of potential mediators. With the expanding growth of ‐omics data, joint analysis of molecular‐level genomics data with epidemiological data through mediation analysis is becoming more common. However, such joint analysis requires methods that can simultaneously accommodate high‐dimensional mediators, and such methods are currently lacking. To address this problem, we develop a Bayesian inference method using continuous shrinkage priors to extend previous causal mediation analysis techniques to a high‐dimensional setting. Simulations demonstrate that our method improves the power of global mediation analysis compared to simpler alternatives and performs reasonably well in identifying true nonnull contributions to the mediation effects of the pathway. The Bayesian method also helps us to understand the structure of the composite null cases for inactive mediators in the pathway. We applied our method to the Multi‐Ethnic Study of Atherosclerosis and identified DNA methylation regions that may actively mediate the effect of socioeconomic status on cardiometabolic outcomes.

  3. Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly targeting penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modelling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. Resultant inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods to an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.

  4. We propose a general method for constructing confidence sets and hypothesis tests that have finite-sample guarantees without regularity conditions. We refer to such procedures as “universal.” The method is very simple and is based on a modified version of the usual likelihood-ratio statistic that we call “the split likelihood-ratio test” (split LRT) statistic. The (limiting) null distribution of the classical likelihood-ratio statistic is often intractable when used to test composite null hypotheses in irregular statistical models. Our method is especially appealing for statistical inference in these complex setups. The method we suggest works for any parametric model and also for some nonparametric models, as long as computing a maximum-likelihood estimator (MLE) is feasible under the null. Canonical examples arise in mixture modeling and shape-constrained inference, for which constructing tests and confidence sets has been notoriously difficult. We also develop various extensions of our basic methods. We show that in settings when computing the MLE is hard, for the purpose of constructing valid tests and intervals, it is sufficient to upper bound the maximum likelihood. We investigate some conditions under which our methods yield valid inferences under model misspecification. Further, the split LRT can be used with profile likelihoods to deal with nuisance parameters, and it can also be run sequentially to yield anytime-valid P values and confidence sequences. Finally, when combined with the method of sieves, it can be used to perform model selection with nested model classes.

  5. We consider Bayesian high‐dimensional mediation analysis to identify, among a large set of correlated potential mediators, the active ones that mediate the effect of an exposure variable on an outcome of interest. Correlations among mediators are commonly observed in modern data analysis; examples include the activated voxels within connected regions in brain image data, regulatory signals driven by gene networks in genome data, and correlated exposure data from the same source. When correlations are present among active mediators, mediation analysis that fails to account for such correlation can be suboptimal and may lead to a loss of power in identifying active mediators. Building upon a recent high‐dimensional mediation analysis framework, we propose two Bayesian hierarchical models, one with a Gaussian mixture prior that enables correlated mediator selection and the other with a Potts mixture prior that accounts for the correlation among active mediators in mediation analysis. We develop efficient sampling algorithms for both methods. Various simulations demonstrate that our methods enable effective identification of correlated active mediators, which could be missed by using existing methods that assume prior independence among active mediators. We applied the proposed methods to the LIFECODES birth cohort and the Multi‐Ethnic Study of Atherosclerosis (MESA) and identified new active mediators with important biological implications.

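
The split likelihood-ratio test described in item 4 can be illustrated with a small Python sketch. It makes several assumptions that are not in the abstract: a toy model with unit-variance Gaussian data, the composite null $H_0: \mu \le 0$, an even random split, and the hypothetical function names `log_lik` and `split_lrt_rejects`. The data are split into two parts $D_0$ and $D_1$; an estimate fitted on $D_1$ is compared with the null-restricted MLE on $D_0$ through the likelihood of $D_0$, and the null is rejected when the ratio exceeds $1/\text{level}$.

```python
import math
import random


def log_lik(data, mu):
    """Unit-variance Gaussian log-likelihood, up to an additive constant."""
    return -0.5 * sum((x - mu) ** 2 for x in data)


def split_lrt_rejects(data, level=0.05, seed=0):
    """Split LRT for the toy composite null H0: mu <= 0 (illustrative only)."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    d0, d1 = shuffled[:half], shuffled[half:]

    # Unrestricted estimate fitted on D1 only.
    mu_hat_1 = sum(d1) / len(d1)
    # MLE on D0 restricted to the null set mu <= 0.
    mu_hat_0 = min(sum(d0) / len(d0), 0.0)

    # Log of the split likelihood ratio, evaluated on D0;
    # the additive constant in log_lik cancels in the difference.
    log_u = log_lik(d0, mu_hat_1) - log_lik(d0, mu_hat_0)
    return log_u > math.log(1.0 / level)


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.gauss(1.0, 1.0) for _ in range(200)]
    print("reject H0:", split_lrt_rejects(sample))
```

Because the threshold $1/\text{level}$ comes from Markov's inequality rather than from an asymptotic distribution, this construction does not need the regularity conditions that make classical likelihood-ratio tests difficult under composite nulls, at the cost of some power from splitting the data.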