Title: Adaptive bootstrap tests for composite null hypotheses in the mediation pathway analysis
Abstract: Mediation analysis aims to assess if, and how, a certain exposure influences an outcome of interest through intermediate variables. This problem has recently gained a surge of attention due to the tremendous need for such analyses in scientific fields. Testing for the mediation effect (ME) is greatly challenged by the fact that the underlying null hypothesis (i.e. the absence of MEs) is composite. Most existing mediation tests are overly conservative and thus underpowered. To overcome this significant methodological hurdle, we develop an adaptive bootstrap testing framework that can accommodate different types of composite null hypotheses in the mediation pathway analysis. Applied to the product of coefficients test and the joint significance test, our adaptive testing procedures provide type I error control under the composite null, resulting in much improved statistical power compared to existing tests. Both theoretical properties and numerical examples of the proposed methodology are discussed.
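For orientation, below is a minimal sketch of the classical (non-adaptive) percentile bootstrap for the mediation effect αβ in the linear model M = αX + ε₁, Y = βM + τX + ε₂. This is not the paper's adaptive procedure, which modifies the resampling scheme to retain type I error control under the composite null; function and variable names here are illustrative.

```python
import numpy as np

def indirect_effect(x, m, y):
    """OLS estimates of alpha (x -> m) and beta (m -> y given x); returns alpha*beta."""
    ones = np.ones_like(x)
    alpha = np.linalg.lstsq(np.column_stack([ones, x]), m, rcond=None)[0][1]
    beta = np.linalg.lstsq(np.column_stack([ones, x, m]), y, rcond=None)[0][2]
    return alpha * beta

def percentile_bootstrap_ci(x, m, y, n_boot=2000, level=0.95, seed=0):
    """Classical case-resampling bootstrap CI for alpha*beta.
    Reject H0: alpha*beta = 0 if 0 falls outside the returned interval."""
    rng = np.random.default_rng(seed)
    n = len(x)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        draws[b] = indirect_effect(x[idx], m[idx], y[idx])
    tail = (1.0 - level) / 2.0
    return np.quantile(draws, [tail, 1.0 - tail])
```

This classical version exhibits exactly the problem the abstract describes: when α = β = 0, its rejection rate falls well below the nominal level, which is the conservativeness the adaptive bootstrap is designed to correct.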
Award ID(s): 2150601, 1846747
PAR ID: 10473988
Author(s) / Creator(s): ; ;
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume: 86
Issue: 2
ISSN: 1369-7412
Format(s): Medium: X; Size: p. 411-434
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: Mediation hypothesis testing for a large number of mediators is challenging due to the composite structure of the null hypothesis, H0: αβ = 0 (α: effect of the exposure on the mediator after adjusting for confounders; β: effect of the mediator on the outcome after adjusting for exposure and confounders). In this paper, we review three classes of methods for large-scale, one-at-a-time mediation hypothesis testing. These methods are commonly used for continuous outcomes and continuous mediators, assuming there is no exposure-mediator interaction so that the product αβ has a causal interpretation as the indirect effect. The first class of methods ignores the impact of the different structures under the composite null hypothesis, namely (1) α = 0, β ≠ 0; (2) α ≠ 0, β = 0; and (3) α = β = 0. The second class weights the reference distribution under each case of the null to form a mixture reference distribution. The third class constructs a composite test statistic from the three p-values obtained under each case of the null, so that the reference distribution of the composite statistic is approximately correct under the composite null. In addition to these existing methods, we developed the Sobel-comp method, belonging to the second class, which uses a corrected mixture reference distribution for Sobel's test statistic. We performed extensive simulation studies comparing all six methods across these three classes in terms of false positive rates (FPRs) under the null hypothesis and true positive rates under the alternative hypothesis. We found that the second class of methods, which uses a mixture reference distribution, best maintained the FPRs at the nominal level under the null hypothesis and had the greatest true positive rates under the alternative hypothesis. We applied all methods to study the mediation mechanism of DNA methylation sites in the pathway from adult socioeconomic status to glycated hemoglobin level, using data from the Multi-Ethnic Study of Atherosclerosis (MESA). We provide guidelines for choosing the optimal mediation hypothesis testing method in practice and develop an R package, medScan, available on CRAN, implementing all six methods.
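As a concrete reference point for the first class of methods, here is a hedged sketch of the two standard single-mediator tests, Sobel's product-of-coefficients test and the joint significance (MaxP) test, vectorized over many mediators. Both use the same reference distribution for every null case and are therefore conservative when α = β = 0; function and argument names are illustrative, not from the medScan package.

```python
import numpy as np
from scipy.stats import norm

def sobel_p(alpha_hat, se_alpha, beta_hat, se_beta):
    """Sobel test: normal approximation to the product alpha_hat * beta_hat."""
    z = (alpha_hat * beta_hat) / np.sqrt(
        beta_hat**2 * se_alpha**2 + alpha_hat**2 * se_beta**2
    )
    return 2.0 * norm.sf(np.abs(z))

def maxp_p(alpha_hat, se_alpha, beta_hat, se_beta):
    """Joint significance (MaxP) test: p = max(p_alpha, p_beta)."""
    p_alpha = 2.0 * norm.sf(np.abs(alpha_hat / se_alpha))
    p_beta = 2.0 * norm.sf(np.abs(beta_hat / se_beta))
    return np.maximum(p_alpha, p_beta)
```

Under the case α = β = 0, the true null distributions of both statistics are lighter-tailed than these references, which is the conservativeness that the mixture-reference second class, including Sobel-comp, corrects.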
  2. Abstract: Causal mediation analysis aims to examine the role of a mediator or a group of mediators that lie in the pathway between an exposure and an outcome. Recent biomedical studies often involve a large number of potential mediators based on high-throughput technologies. Most current analytic methods focus on settings with one or a moderate number of potential mediators. With the expanding growth of omics data, joint analysis of molecular-level genomics data with epidemiological data through mediation analysis is becoming more common. However, such joint analysis requires methods that can simultaneously accommodate high-dimensional mediators, and such methods are currently lacking. To address this problem, we develop a Bayesian inference method using continuous shrinkage priors to extend previous causal mediation analysis techniques to the high-dimensional setting. Simulations demonstrate that our method improves the power of global mediation analysis compared to simpler alternatives and performs well at identifying true non-null contributions to the mediation effects of the pathway. The Bayesian method also helps us understand the structure of the composite null cases for inactive mediators in the pathway. We applied our method to the Multi-Ethnic Study of Atherosclerosis (MESA) and identified DNA methylation regions that may actively mediate the effect of socioeconomic status on cardiometabolic outcomes.
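To make "continuous shrinkage priors" concrete, here is a minimal sketch of one way to encode them in a high-dimensional mediation model, using the horseshoe prior in NumPyro. The paper's specific priors and parameterization may well differ; this only illustrates the modeling idea, and all names are illustrative.

```python
import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def mediation_model(x, M, y):
    """x: (n,) exposure; M: (n, p) mediators; y: (n,) outcome."""
    n, p = M.shape
    # horseshoe (continuous shrinkage) priors on both path coefficient vectors
    tau_a = numpyro.sample("tau_a", dist.HalfCauchy(1.0))
    lam_a = numpyro.sample("lam_a", dist.HalfCauchy(jnp.ones(p)).to_event(1))
    alpha = numpyro.sample("alpha", dist.Normal(jnp.zeros(p), tau_a * lam_a).to_event(1))
    tau_b = numpyro.sample("tau_b", dist.HalfCauchy(1.0))
    lam_b = numpyro.sample("lam_b", dist.HalfCauchy(jnp.ones(p)).to_event(1))
    beta = numpyro.sample("beta", dist.Normal(jnp.zeros(p), tau_b * lam_b).to_event(1))
    gamma = numpyro.sample("gamma", dist.Normal(0.0, 10.0))  # direct effect
    sig_m = numpyro.sample("sig_m", dist.HalfCauchy(1.0))
    sig_y = numpyro.sample("sig_y", dist.HalfCauchy(1.0))
    # mediator and outcome models
    numpyro.sample("M_obs", dist.Normal(jnp.outer(x, alpha), sig_m).to_event(2), obs=M)
    numpyro.sample("y_obs", dist.Normal(M @ beta + gamma * x, sig_y), obs=y)

# Posterior of the global mediation effect sum_j alpha_j * beta_j:
# mcmc = MCMC(NUTS(mediation_model), num_warmup=500, num_samples=1000)
# mcmc.run(random.PRNGKey(0), x, M, y)
# s = mcmc.get_samples(); gme = (s["alpha"] * s["beta"]).sum(axis=1)
```

The heavy-tailed HalfCauchy local scales let a few coefficients escape shrinkage while the global scales pull the rest toward zero, which is what allows the model to handle high-dimensional mediators.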
  3. Motivated by a multimodal neuroimaging study of Alzheimer's disease, in this article we study the inference problem, that is, hypothesis testing, of sequential mediation analysis. Existing sequential mediation solutions mostly focus on sparse estimation, whereas hypothesis testing is a fundamentally different and more challenging problem. Meanwhile, the few existing mediation testing solutions often ignore potential dependency among the mediators or cannot be applied to the sequential problem directly. We propose a statistical inference procedure to test mediation pathways when there are sequentially ordered multiple data modalities and each modality involves multiple mediators. We allow the mediators to be conditionally dependent and the number of mediators within each modality to diverge with the sample size. We provide explicit significance quantification and establish theoretical guarantees in terms of asymptotic size, power, and false discovery control. We demonstrate the efficacy of the method through both simulations and an application to a multimodal neuroimaging pathway analysis of Alzheimer's disease.
  4. Despite its importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect type I error rates in the presence of high-dimensional nuisance parameters. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, the truncated Lasso penalty (TLP), which maintains correct type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Simulations demonstrate its superior finite-sample performance over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer's disease. An R package, aispu, implementing the proposed test is available on GitHub.
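To make the "sum of powered score" idea concrete, here is a hedged sketch of the adaptive SPU construction that aiSPU builds on: SPU(γ) = Σ_j U_j^γ for a score vector U, with the choice of γ made adaptive by taking the minimum p-value over a grid of γ values and recalibrating it by Monte Carlo. The published aiSPU test additionally handles high-dimensional nuisance parameters via TLP-penalized regression, which this sketch omits; all names are illustrative.

```python
import numpy as np

GAMMAS = (1, 2, 3, 4, 5, 6)

def spu_stats(U, gammas=GAMMAS):
    """SPU(gamma) = sum_j U_j**gamma; gamma -> infinity corresponds to max_j |U_j|."""
    return np.array([np.sum(U**g) for g in gammas] + [np.max(np.abs(U))])

def aspu_p(U, null_draws, gammas=GAMMAS):
    """Adaptive SPU p-value; `null_draws` is a (B, p) array of score vectors
    simulated (or permuted) under H0."""
    B = null_draws.shape[0]
    t_obs = np.abs(spu_stats(U, gammas))
    t_null = np.abs(np.vstack([spu_stats(u, gammas) for u in null_draws]))
    # per-gamma p-values for the observed statistics
    p_obs = (1 + (t_null >= t_obs).sum(axis=0)) / (B + 1)
    # approximate per-gamma p-values for each null draw, via ranks
    ranks = t_null.argsort(axis=0).argsort(axis=0)  # 0 = smallest |stat|
    p_null = 1.0 - ranks / B
    # calibrate the minimum p-value across gammas against its null distribution
    return (1 + (p_null.min(axis=1) <= p_obs.min()).sum()) / (B + 1)
```

Small γ values aggregate many weak signals while large γ (and the max component) target a few strong ones, which is how the adaptive combination stays powerful across a wide range of alternatives.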
  5. We consider the problem of sequential multiple hypothesis testing with nontrivial data collection costs. This problem appears, for example, when conducting biological experiments to identify differentially expressed genes of a disease process. This work builds on the generalized α-investing framework, which enables control of the marginal false discovery rate in a sequential testing setting. We analyze the long-term asymptotic behavior of α-wealth, which motivates a consideration of sample size in the α-investing decision rule. Posing the testing process as a game with nature, we construct a decision rule that optimizes the expected α-wealth reward (ERO) and provides an optimal sample size for each test. Empirical results show that a cost-aware ERO decision rule correctly rejects more false null hypotheses than other methods for n = 1, where n is the sample size per test. When the sample size is not fixed, cost-aware ERO uses a prior on the null hypothesis to adaptively allocate the sample budget to each test. We extend cost-aware ERO investing to finite-horizon testing, which enables the decision rule to allocate samples in a non-myopic manner. Finally, empirical tests on real data sets from biological experiments show that cost-aware ERO balances the allocation of samples to an individual test against the allocation of samples across multiple tests.
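For readers new to the framework, here is a minimal sketch of plain α-investing (the Foster-Stine rule that generalized α-investing extends): each test spends part of the current α-wealth, and each rejection earns a payout. The paper's cost-aware ERO rule additionally chooses a sample size per test to maximize the expected α-wealth reward, which this sketch does not attempt; parameter values are illustrative.

```python
def alpha_investing(pvals, wealth0=0.05, payout=0.04, frac=0.1):
    """Plain alpha-investing: spend a fraction of the current alpha-wealth on
    each test; a rejection earns `payout`, an acceptance costs the spend."""
    wealth = wealth0
    decisions = []
    for p in pvals:
        spend = frac * wealth            # charge alpha_j / (1 - alpha_j) on acceptance
        alpha_j = spend / (1.0 + spend)  # so the level tested is alpha_j
        reject = p <= alpha_j
        wealth += payout if reject else -spend
        decisions.append(reject)
    return decisions

# e.g. alpha_investing([0.001, 0.2, 0.004, 0.6]) -> [True, False, True, False]
```

Because the acceptance charge is a fraction of current wealth, the procedure can never go bankrupt, while rejections replenish the budget; this is the α-wealth dynamic whose long-term behavior the paper analyzes.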