Title: Identification, Semiparametric Efficiency, and Quadruply Robust Estimation in Mediation Analysis with Treatment-Induced Confounding
Natural mediation effects are often of interest when the goal is to understand a causal mechanism. However, most existing methods and their identification assumptions preclude treatment-induced confounders, which are often present in practice. To address this fundamental limitation, we provide a set of assumptions that identify the natural direct effect in the presence of treatment-induced confounders. Even when some of those assumptions are violated, the estimand still has an interventional direct effect interpretation. We derive the semiparametric efficiency bound for the estimand, which, unlike the usual expressions, contains conditional densities that are variationally dependent. We consider a reparameterization and propose a quadruply robust estimator that remains consistent under four types of possible misspecification and is also locally semiparametric efficient. We use simulation studies to demonstrate the proposed method and study an application to the 2017 Natality data to investigate the effect of prenatal care on preterm birth mediated by preeclampsia, with smoking status during pregnancy being a potential treatment-induced confounder. Supplementary materials for the article are available online.
Award ID(s):
1711952
PAR ID:
10300663
Author(s) / Creator(s):
;
Date Published:
Journal Name:
Journal of the American Statistical Association
ISSN:
0162-1459
Page Range / eLocation ID:
1 to 28
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In many medical and scientific settings, the choice of treatment or intervention may be determined by a covariate threshold. For example, elderly men may receive more thorough diagnosis if their prostate-specific antigen (PSA) level is high. In these cases, the causal treatment effect is often of great interest, especially when there is a lack of evidence from randomized clinical trials. From the social science literature, a class of methods known as regression discontinuity (RD) designs can be used to estimate the treatment effect in this situation. Under certain assumptions, such an estimand enjoys a causal interpretation. We show how to estimate causal effects under the regression discontinuity design for censored data. The proposed estimation procedure employs a class of censoring unbiased transformations that includes inverse probability censored weighting and doubly robust transformation schemes. Simulation studies are used to evaluate the finite-sample properties of the proposed estimator. We also illustrate the proposed method by evaluating the causal effect of PSA-dependent screening strategies
  2. Platform trials are multi‐arm designs that simultaneously evaluate multiple treatments for a single disease within the same overall trial structure. Unlike traditional randomized controlled trials, they allow treatment arms to enter and exit the trial at distinct times while maintaining a control arm throughout. This control arm comprises both concurrent controls, where participants are randomized concurrently to either the treatment or control arm, and non‐concurrent controls, who enter the trial when the treatment arm under study is unavailable. While flexible, platform trials introduce the challenge of using non‐concurrent controls, raising questions about estimating treatment effects. Specifically, which estimands should be targeted? Under what assumptions can these estimands be identified and estimated? Are there any efficiency gains? In this article, we discuss issues related to the identification and estimation assumptions of common choices of estimand. We conclude that the most robust strategy to increase efficiency without imposing unwarranted assumptions is to target the concurrent average treatment effect (cATE), the ATE among only concurrent units, using a covariate‐adjusted doubly robust estimator. Our studies suggest that, for the purpose of obtaining efficiency gains, collecting important prognostic variables is more important than relying on non‐concurrent controls. We also discuss the perils of targeting ATE due to an untestable extrapolation assumption that will often be invalid. We provide simulations illustrating our points and an application to the ACTT platform trial, resulting in a 20% improvement in precision compared to the naive estimator that ignores non‐concurrent controls and prognostic variables.
  3. Propensity score weighting is a tool for causal inference to adjust for measured confounders in observational studies. In practice, data often present complex structures, such as clustering, which make propensity score modeling and estimation challenging. In addition, for clustered data, there may be unmeasured cluster-level covariates that are related to both the treatment assignment and outcome. When such unmeasured cluster-specific confounders exist and are omitted in the propensity score model, the subsequent propensity score adjustment may be biased. In this article, we propose a calibration technique for propensity score estimation under the latent ignorable treatment assignment mechanism, i.e., the treatment-outcome relationship is unconfounded given the observed covariates and the latent cluster-specific confounders. We impose novel balance constraints which imply exact balance of the observed confounders and the unobserved cluster-level confounders between the treatment groups. We show that the proposed calibrated propensity score weighting estimator is doubly robust in that it is consistent for the average treatment effect if either the propensity score model is correctly specified or the outcome follows a linear mixed effects model. Moreover, the proposed weighting method can be combined with sampling weights for an integrated solution to handle confounding and sampling designs for causal inference with clustered survey data. In simulation studies, we show that the proposed estimator is superior to other competitors. We estimate the effect of School Body Mass Index Screening on prevalence of overweight and obesity for elementary schools in Pennsylvania.
  4. One dominant approach to evaluate the causal effect of a treatment is through panel data analysis, whereby the behaviors of multiple units are observed over time. The information across time and units motivates two general approaches: (i) horizontal regression (i.e., unconfoundedness), which exploits time series patterns, and (ii) vertical regression (e.g., synthetic controls), which exploits cross‐sectional patterns. Conventional wisdom often considers the two approaches to be different. We establish this position to be partly false for estimation but generally true for inference. In the absence of any assumptions, we show that both approaches yield algebraically equivalent point estimates for several standard estimators. However, the source of randomness assumed by each approach leads to a distinct estimand and quantification of uncertainty even for the same point estimate. This emphasizes that researchers should carefully consider where the randomness stems from in their data, as it has direct implications for the accuracy of inference. 
  5. Valid causal inference in observational studies often requires controlling for confounders. However, in practice measurements of confounders may be noisy, and can lead to biased estimates of causal effects. We show that we can reduce bias induced by measurement noise using a large number of noisy measurements of the underlying confounders. We propose the use of matrix factorization to infer the confounders from noisy covariates. This flexible and principled framework adapts to missing values, accommodates a wide variety of data types, and can enhance a wide variety of causal inference methods. We bound the error for the induced average treatment effect estimator and show it is consistent in a linear regression setting, using Exponential Family Matrix Completion preprocessing. We demonstrate the effectiveness of the proposed procedure in numerical experiments with both synthetic data and real clinical data. 
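Several of the abstracts above rely on double robustness: the estimator stays consistent if either the propensity score model or the outcome model is correct. The following is a minimal sketch of that property using the standard augmented inverse probability weighting (AIPW) estimator of the average treatment effect. It is not the estimator from any of the listed papers. The data-generating process, the true effect of 2, and the names `aipw_ate` and `demo` are all assumptions made for illustration, and the nuisance functions are plugged in directly rather than fitted.

```python
import numpy as np


def aipw_ate(y, a, e_hat, mu1_hat, mu0_hat):
    """AIPW estimate of the average treatment effect.

    y: outcomes; a: binary treatment (0/1); e_hat: propensity scores;
    mu1_hat, mu0_hat: outcome-model predictions under treatment / control.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    e_hat = np.asarray(e_hat, dtype=float)
    mu1_hat = np.asarray(mu1_hat, dtype=float)
    mu0_hat = np.asarray(mu0_hat, dtype=float)
    # Outcome-model prediction plus an inverse-probability-weighted residual.
    psi1 = mu1_hat + a * (y - mu1_hat) / e_hat
    psi0 = mu0_hat + (1.0 - a) * (y - mu0_hat) / (1.0 - e_hat)
    return float(np.mean(psi1 - psi0))


def demo(seed=0, n=200_000):
    """Hypothetical simulation (assumed setup): one confounder x, true ATE = 2."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    e_true = 1.0 / (1.0 + np.exp(-x))        # true propensity score
    a = rng.binomial(1, e_true)               # treatment depends on x
    y = 2.0 * a + x + rng.normal(size=n)      # outcome depends on a and x

    mu1_true, mu0_true = 2.0 + x, x           # true outcome regressions
    zeros = np.zeros(n)                       # deliberately wrong outcome model
    half = np.full(n, 0.5)                    # deliberately wrong propensity

    return {
        "ps_correct_only": aipw_ate(y, a, e_true, zeros, zeros),
        "outcome_correct_only": aipw_ate(y, a, half, mu1_true, mu0_true),
        "both_wrong": aipw_ate(y, a, half, zeros, zeros),
    }


if __name__ == "__main__":
    for name, est in demo().items():
        print(f"{name}: {est:.3f}")
```

In this assumed setup, the first two estimates should land close to the true effect of 2 even though one nuisance model is wrong in each case, while the estimate with both models wrong should be visibly biased. The quadruply robust estimator described in the main abstract extends the same idea to more nuisance models, but its form is not given on this page and is not reproduced here.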