skip to main content

Title: Adaptive bootstrap tests for composite null hypotheses in the mediation pathway analysis

Mediation analysis aims to assess if, and how, a certain exposure influences an outcome of interest through intermediate variables. This problem has recently gained a surge of attention due to the tremendous need for such analyses in scientific fields. Testing for the mediation effect (ME) is greatly challenged by the fact that the underlying null hypothesis (i.e. the absence of MEs) is composite. Most existing mediation tests are overly conservative and thus underpowered. To overcome this significant methodological hurdle, we develop an adaptive bootstrap testing framework that can accommodate different types of composite null hypotheses in the mediation pathway analysis. Applied to the product of coefficients test and the joint significance test, our adaptive testing procedures provide type I error control under the composite null, resulting in much improved statistical power compared to existing tests. Both theoretical properties and numerical examples of the proposed methodology are discussed.

more » « less
Award ID(s):
2150601 1846747
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Medium: X Size: p. 411-434
["p. 411-434"]
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, (: effect of the exposure on the mediator after adjusting for confounders; : effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we reviewed three classes of methods for large‐scale one at a time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators assuming there is no exposure‐mediator interaction so that the product has a causal interpretation as the indirect effect. The first class of methods ignores the impact of different structures under the composite null hypothesis, namely, (1) ; (2) ; and (3) . The second class of methods weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic using the threepvalues obtained under each case of the null so that the reference distribution of the composite statistic is approximately . In addition to these existing methods, we developed the Sobel‐comp method belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies to compare all six methods belonging to these three classes in terms of the false positive rates (FPRs) under the null hypothesis and the true positive rates under the alternative hypothesis. We found that the second class of methods which uses a mixture reference distribution could best maintain the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level using data from the Multi‐Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R packagemedScanavailable on the CRAN for implementing all the six methods.

    more » « less
  2. In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates under high-dimensional nuisance parameter situations. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, called truncated Lasso penalty (TLP), which can maintain correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Via simulations, its superior finite-sample performance is demonstrated over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer’s Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer’s disease. We also put R package “aispu” implementing the proposed test on GitHub. 
    more » « less
  3. Summary

    We introduce an L2-type test for testing mutual independence and banded dependence structure for high dimensional data. The test is constructed on the basis of the pairwise distance covariance and it accounts for the non-linear and non-monotone dependences among the data, which cannot be fully captured by the existing tests based on either Pearson correlation or rank correlation. Our test can be conveniently implemented in practice as the limiting null distribution of the test statistic is shown to be standard normal. It exhibits excellent finite sample performance in our simulation studies even when the sample size is small albeit the dimension is high and is shown to identify non-linear dependence in empirical data analysis successfully. On the theory side, asymptotic normality of our test statistic is shown under quite mild moment assumptions and with little restriction on the growth rate of the dimension as a function of sample size. As a demonstration of good power properties for our distance-covariance-based test, we further show that an infeasible version of our test statistic has the rate optimality in the class of Gaussian distributions with equal correlation.

    more » « less
  4. Abstract

    In metagenomic studies, testing the association between microbiome composition and clinical outcomes translates to testing the nullity of variance components. Motivated by a lung human immunodeficiency virus (HIV) microbiome project, we study longitudinal microbiome data by using variance component models with more than two variance components. Current testing strategies only apply to models with exactly two variance components and when sample sizes are large. Therefore, they are not applicable to longitudinal microbiome studies. In this paper, we propose exact tests (score test, likelihood ratio test, and restricted likelihood ratio test) to (a) test the association of the overall microbiome composition in a longitudinal design and (b) detect the association of one specific microbiome cluster while adjusting for the effects from related clusters. Our approach combines the exact tests for null hypothesis with a single variance component with a strategy of reducing multiple variance components to a single one. Simulation studies demonstrate that our method has a correct type I error rate and superior power compared to existing methods at small sample sizes and weak signals. Finally, we apply our method to a longitudinal pulmonary microbiome study of HIV‐infected patients and reveal two interesting generaPrevotellaandVeillonellaassociated with forced vital capacity. Our findings shed light on the impact of the lung microbiome on HIV complexities. The method is implemented in the open‐source, high‐performance computing languageJuliaand is freely available at

    more » « less
  5. Abstract

    Causal mediation analysis aims to examine the role of a mediator or a group of mediators that lie in the pathway between an exposure and an outcome. Recent biomedical studies often involve a large number of potential mediators based on high‐throughput technologies. Most of the current analytic methods focus on settings with one or a moderate number of potential mediators. With the expanding growth of ‐omics data, joint analysis of molecular‐level genomics data with epidemiological data through mediation analysis is becoming more common. However, such joint analysis requires methods that can simultaneously accommodate high‐dimensional mediators and that are currently lacking. To address this problem, we develop a Bayesian inference method using continuous shrinkage priors to extend previous causal mediation analysis techniques to a high‐dimensional setting. Simulations demonstrate that our method improves the power of global mediation analysis compared to simpler alternatives and has decent performance to identify true nonnull contributions to the mediation effects of the pathway. The Bayesian method also helps us to understand the structure of the composite null cases for inactive mediators in the pathway. We applied our method to Multi‐Ethnic Study of Atherosclerosis and identified DNA methylation regions that may actively mediate the effect of socioeconomic status on cardiometabolic outcomes.

    more » « less