skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 5:00 PM ET until 11:00 PM ET on Friday, June 21 due to maintenance. We apologize for the inconvenience.

Title: Bayesian Sparse Mediation Analysis with Targeted Penalization of Natural Indirect Effects

Causal mediation analysis aims to characterize an exposure's effect on an outcome and quantify the indirect effect that acts through a given mediator or a group of mediators of interest. With the increasing availability of measurements on a large number of potential mediators, like the epigenome or the microbiome, new statistical methods are needed to simultaneously accommodate high-dimensional mediators while directly target penalization of the natural indirect effect (NIE) for active mediator identification. Here, we develop two novel prior models for identification of active mediators in high-dimensional mediation analysis through penalizing NIEs in a Bayesian paradigm. Both methods specify a joint prior distribution on the exposure-mediator effect and mediator-outcome effect with either (a) a four-component Gaussian mixture prior or (b) a product threshold Gaussian prior. By jointly modelling the two parameters that contribute to the NIE, the proposed methods enable penalization on their product in a targeted way. Resultant inference can take into account the four-component composite structure underlying the NIE. We show through simulations that the proposed methods improve both selection and estimation accuracy compared to other competing methods. We applied our methods for an in-depth analysis of two ongoing epidemiologic studies: the Multi-Ethnic Study of Atherosclerosis (MESA) and the LIFECODES birth cohort. The identified active mediators in both studies reveal important biological pathways for understanding disease mechanisms.

more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series C: Applied Statistics
Page Range / eLocation ID:
p. 1391-1412
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider Bayesian high‐dimensional mediation analysis to identify among a large set of correlated potential mediators the active ones that mediate the effect from an exposure variable to an outcome of interest. Correlations among mediators are commonly observed in modern data analysis; examples include the activated voxels within connected regions in brain image data, regulatory signals driven by gene networks in genome data, and correlated exposure data from the same source. When correlations are present among active mediators, mediation analysis that fails to account for such correlation can be suboptimal and may lead to a loss of power in identifying active mediators. Building upon a recent high‐dimensional mediation analysis framework, we propose two Bayesian hierarchical models, one with a Gaussian mixture prior that enables correlated mediator selection and the other with a Potts mixture prior that accounts for the correlation among active mediators in mediation analysis. We develop efficient sampling algorithms for both methods. Various simulations demonstrate that our methods enable effective identification of correlated active mediators, which could be missed by using existing methods that assume prior independence among active mediators. The proposed methods are applied to the LIFECODES birth cohort and the Multi‐Ethnic Study of Atherosclerosis (MESA) and identified new active mediators with important biological implications.

    more » « less
  2. Abstract Background Traditional mediation analysis typically examines the relations among an intervention, a time-invariant mediator, and a time-invariant outcome variable. Although there may be a total effect of the intervention on the outcome, there is a need to understand the process by which the intervention affects the outcome (i.e., the indirect effect through the mediator). This indirect effect is frequently assumed to be time-invariant. With improvements in data collection technology, it is possible to obtain repeated assessments over time resulting in intensive longitudinal data. This calls for an extension of traditional mediation analysis to incorporate time-varying variables as well as time-varying effects. Methods We focus on estimation and inference for the time-varying mediation model, which allows mediation effects to vary as a function of time. We propose a two-step approach to estimate the time-varying mediation effect. Moreover, we use a simulation-based approach to derive the corresponding point-wise confidence band for the time-varying mediation effect. Results Simulation studies show that the proposed procedures perform well when comparing the confidence band and the true underlying model. We further apply the proposed model and the statistical inference procedure to data collected from a smoking cessation study. Conclusions We present a model for estimating time-varying mediation effects that allows both time-varying outcomes and mediators. Simulation-based inference is also proposed and implemented in a user-friendly R package. 
    more » « less
  3. Kutalik, Zoltán (Ed.)

    Epigenetic researchers often evaluate DNA methylation as a potential mediator of the effect of social/environmental exposures on a health outcome. Modern statistical methods for jointly evaluating many mediators have not been widely adopted. We compare seven methods for high-dimensional mediation analysis with continuous outcomes through both diverse simulations and analysis of DNAm data from a large multi-ethnic cohort in the United States, while providing an R package for their seamless implementation and adoption. Among the considered choices, the best-performing methods for detecting active mediators in simulations are the Bayesian sparse linear mixed model (BSLMM) and high-dimensional mediation analysis (HDMA); while the preferred methods for estimating the global mediation effect are high-dimensional linear mediation analysis (HILMA) and principal component mediation analysis (PCMA). We provide guidelines for epigenetic researchers on choosing the best method in practice and offer suggestions for future methodological development.

    more » « less
  4. Mediation analysis provides an attractive causal inference framework to decompose the total effect of an exposure on an outcome into natural direct effects and natural indirect effects acting through a mediator. For binary outcomes, mediation analysis methods have been developed using logistic regression when the binary outcome is rare. These methods will not hold in practice when a disease is common. In this paper, we develop mediation analysis methods that relax the rare disease assumption when using logistic regression. We calculate the natural direct and indirect effects for common diseases by exploiting the relationship between logit and probit models. Specifically, we derive closed‐form expressions for the natural direct and indirect effects on the odds ratio scale. Mediation models for both continuous and binary mediators are considered. We demonstrate through simulation that the proposed method performs well for common binary outcomes. We apply the proposed methods to analyze the Normative Aging Study to identify DNA methylation sites that are mediators of smoking behavior on the outcome of obstructed airway function.

    more » « less
  5. Abstract

    Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, (: effect of the exposure on the mediator after adjusting for confounders; : effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large‐scale one at a time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure‐mediator interaction so that the product has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) ; (2) ; and (3) . The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the threepvalues obtained under each case of the null so that the reference distribution of the composite statistic is approximately . In addition to these existing methods, we developed the Sobel‐comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi‐Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R packagemedScanavailable on the CRAN for implementing all the six methods.

    more » « less