Title: Simulation‐based estimators of analytically intractable causal effects

In causal inference problems, one is often tasked with estimating causal effects which are analytically intractable functionals of the data‐generating mechanism. Relevant settings include estimating intention‐to‐treat effects in longitudinal problems with missing data or computing direct and indirect effects in mediation analysis. One approach to computing these effects is to use the g‐formula implemented via Monte Carlo integration; when simulation‐based methods such as the nonparametric bootstrap or Markov chain Monte Carlo are used for inference, Monte Carlo integration must be nested within an already computationally intensive algorithm. We develop a widely‐applicable approach to accelerating this Monte Carlo integration step which greatly reduces the computational burden of existing g‐computation algorithms. We refer to our method as accelerated g‐computation (AGC). The algorithms we present are similar in spirit to multiple imputation, but require removing within‐imputation variance from the standard error rather than adding it. We illustrate the use of AGC on a mediation analysis problem using a beta regression model and in a longitudinal clinical trial subject to nonignorable missingness using a Bayesian additive regression trees model.
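
To make the Monte Carlo integration step concrete, the following is a minimal sketch of plain (non-accelerated) g‐computation for mediation effects. It is not the AGC method and not the beta regression model from the paper: it assumes a normal linear mediator model and a logistic outcome model, and the function name `g_formula_mc`, the coefficient values, and the simulated covariate are all hypothetical choices made for illustration.

```python
import numpy as np

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-z))

def g_formula_mc(a_y, a_m, x, theta_m, sigma_m, theta_y, n_mc, rng):
    """Monte Carlo estimate of E[Y(a_y, M(a_m))].

    Assumed working models (illustrative only):
      M | A, X ~ Normal(theta_m[0] + theta_m[1]*A + theta_m[2]*X, sigma_m^2)
      P(Y=1 | A, M, X) = expit(theta_y[0] + theta_y[1]*A + theta_y[2]*M + theta_y[3]*X)
    """
    # Draw baseline covariates from their empirical distribution.
    xs = rng.choice(x, size=n_mc, replace=True)
    # Simulate the mediator as if treatment were set to a_m.
    m = (theta_m[0] + theta_m[1] * a_m + theta_m[2] * xs
         + sigma_m * rng.standard_normal(n_mc))
    # Average the outcome model with treatment set to a_y.
    p = expit(theta_y[0] + theta_y[1] * a_y + theta_y[2] * m + theta_y[3] * xs)
    return float(p.mean())

rng = np.random.default_rng(0)
x = rng.standard_normal(500)          # hypothetical observed covariate
theta_m = (0.0, 1.0, 0.5)             # hypothetical "fitted" mediator coefficients
sigma_m = 1.0
theta_y = (-0.5, 0.4, 0.8, 0.3)       # hypothetical "fitted" outcome coefficients
n_mc = 200_000

y11 = g_formula_mc(1, 1, x, theta_m, sigma_m, theta_y, n_mc, rng)
y10 = g_formula_mc(1, 0, x, theta_m, sigma_m, theta_y, n_mc, rng)
y00 = g_formula_mc(0, 0, x, theta_m, sigma_m, theta_y, n_mc, rng)

nie = y11 - y10   # natural indirect effect
nde = y10 - y00   # natural direct effect
print(f"NIE = {nie:.3f}, NDE = {nde:.3f}, total = {y11 - y00:.3f}")
```

In a bootstrap or MCMC analysis, each of these three integrals would have to be recomputed for every resample or posterior draw of the coefficients, which is the nested cost that the abstract describes.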

Publisher / Repository:
Oxford University Press
Medium: X Size: p. 1001-1017
Sponsoring Org:
National Science Foundation
More Like this
  1. Missing data is inevitable in longitudinal clinical trials. Conventionally, missingness is handled under the missing at random assumption, which is empirically unverifiable. Thus, sensitivity analyses are critically important to assess the robustness of the study conclusions against untestable assumptions. Toward this end, regulatory agencies and the pharmaceutical industry use sensitivity models such as return-to-baseline, control-based, and washout imputation, following the ICH E9(R1) guidance. Multiple imputation is popular in sensitivity analyses; however, it may be inefficient and yield unsatisfactory interval estimates under Rubin’s combining rule. We propose distributional imputation in sensitivity analysis, which imputes each missing value with samples from its target imputation model given the observed data. Drawing on the idea of Monte Carlo integration, the distributional imputation estimator solves the mean estimating equations of the imputed dataset. It is fully efficient with theoretical guarantees. Moreover, we propose a weighted bootstrap to obtain a consistent variance estimator that accounts for the variability due to both model parameter estimation and target parameter estimation. The superiority of the distributional imputation framework is validated in a simulation study and an antidepressant longitudinal clinical trial.

  2. Growth curve models have been widely used to analyse longitudinal data in social and behavioural sciences. Although growth curve models with normality assumptions are relatively easy to estimate, practical data are rarely normal. Failing to account for non‐normal data may lead to unreliable model estimation and misleading statistical inference. In this work, we propose a robust approach for growth curve modelling using conditional medians that are less sensitive to outlying observations. Bayesian methods are applied for model estimation and inference. Based on the existing work on Bayesian quantile regression using asymmetric Laplace distributions, we use asymmetric Laplace distributions to convert the problem of estimating a median growth curve model into a problem of obtaining the maximum likelihood estimator for a transformed model. Monte Carlo simulation studies have been conducted to evaluate the numerical performance of the proposed approach with data containing outliers or leverage observations. The results show that the proposed approach yields more accurate and efficient parameter estimates than traditional growth curve modelling. We illustrate the application of our robust approach using conditional medians based on a real data set from the Virginia Cognitive Aging Project.

  3. Background: Traditional mediation analysis typically examines the relations among an intervention, a time-invariant mediator, and a time-invariant outcome variable. Although there may be a total effect of the intervention on the outcome, there is a need to understand the process by which the intervention affects the outcome (i.e., the indirect effect through the mediator). This indirect effect is frequently assumed to be time-invariant. With improvements in data collection technology, it is possible to obtain repeated assessments over time resulting in intensive longitudinal data. This calls for an extension of traditional mediation analysis to incorporate time-varying variables as well as time-varying effects. Methods: We focus on estimation and inference for the time-varying mediation model, which allows mediation effects to vary as a function of time. We propose a two-step approach to estimate the time-varying mediation effect. Moreover, we use a simulation-based approach to derive the corresponding point-wise confidence band for the time-varying mediation effect. Results: Simulation studies show that the proposed procedures perform well when comparing the confidence band and the true underlying model. We further apply the proposed model and the statistical inference procedure to data collected from a smoking cessation study. Conclusions: We present a model for estimating time-varying mediation effects that allows both time-varying outcomes and mediators. Simulation-based inference is also proposed and implemented in a user-friendly R package.
  4. This paper presents Granger mediation analysis, a new framework for causal mediation analysis of multiple time series. This framework is motivated by a functional magnetic resonance imaging (fMRI) experiment where we are interested in estimating the mediation effects between a randomized stimulus time series and brain activity time series from two brain regions. The independent observation assumption is thus unrealistic for this type of time-series data. To address this challenge, our framework integrates two types of models: causal mediation analysis across the mediation variables, and vector autoregressive (VAR) models across the temporal observations. We use “Granger” to refer to VAR correlations modeled in this paper. We further extend this framework to handle multilevel data, in order to model individual variability and correlated errors between the mediator and the outcome variables. Using Rubin's potential outcome framework, we show that the causal mediation effects are identifiable under our time-series model. We further develop computationally efficient algorithms to maximize our likelihood-based estimation criteria. Simulation studies show that our method reduces the estimation bias and improves statistical power, compared with existing approaches. On a real fMRI data set, our approach quantifies the causal effects through a brain pathway, while capturing the dynamic dependence between two brain regions.

  5. There is substantial interest in assessing how exposure to environmental mixtures, such as chemical mixtures, affects child health. Researchers are also interested in identifying critical time windows of susceptibility to these complex mixtures. A recently developed method, called lagged kernel machine regression (LKMR), simultaneously accounts for these research questions by estimating the effects of time‐varying mixture exposures and by identifying their critical exposure windows. However, LKMR inference using Markov chain Monte Carlo (MCMC) methods (MCMC‐LKMR) is computationally burdensome and time intensive for large data sets, limiting its applicability. Therefore, we develop a mean field variational approximation method for Bayesian inference (MFVB) procedure for LKMR (MFVB‐LKMR). The procedure achieves computational efficiency and reasonable accuracy as compared with the corresponding MCMC estimation method. Updating parameters using MFVB may only take minutes, whereas the equivalent MCMC method may take many hours or several days. We apply MFVB‐LKMR to Programming Research in Obesity, Growth, Environment and Social Stressors (PROGRESS), a prospective cohort study in Mexico City. Results from a subset of PROGRESS using MFVB‐LKMR provide evidence of significant and positive association between second trimester cobalt levels and z‐scored birth weight. This positive association is heightened by cesium exposure. MFVB‐LKMR is a promising approach for computationally efficient analysis of environmental health data sets, to identify critical windows of exposure to complex mixtures.
