Summary A common concern when trying to draw causal inferences from observational data is that the measured covariates are insufficiently rich to account for all sources of confounding. In practice, many of the covariates may only be proxies of the latent confounding mechanism. Recent work has shown that in certain settings where the standard no-unmeasured-confounding assumption fails, proxy variables can be leveraged to identify causal effects. Results currently exist for the total causal effect of an intervention, but little consideration has been given to learning about the direct or indirect pathways of the effect through a mediator variable. In this work, we describe three separate proximal identification results for natural direct and indirect effects in the presence of unmeasured confounding. We then develop a semiparametric framework for inference on natural direct and indirect effects, which leads us to locally efficient, multiply robust estimators.
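For readers unfamiliar with the estimands named in this summary, the standard counterfactual definitions can be written as follows. This is a sketch of the usual textbook decomposition, not a restatement of the paper's proximal identification results; the notation $Y(a, m)$ (outcome under treatment $a$ and mediator value $m$) and $M(a)$ (mediator under treatment $a$) is assumed here and does not appear in the summary.

```latex
\begin{align*}
\text{Total effect} &= E[Y(1, M(1))] - E[Y(0, M(0))] \\
\mathrm{NDE} &= E[Y(1, M(0))] - E[Y(0, M(0))] \\
\mathrm{NIE} &= E[Y(1, M(1))] - E[Y(1, M(0))] \\
\text{Total effect} &= \mathrm{NDE} + \mathrm{NIE}
\end{align*}
```

The last line follows by adding and subtracting $E[Y(1, M(0))]$, which is why the natural direct and indirect effects partition the total effect.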
A surrogate endpoint-based provisional approval causal roadmap, illustrated by vaccine development
For many rare diseases with no approved preventive interventions, promising interventions exist. However, it has proven difficult to conduct a pivotal phase 3 trial that could provide direct evidence demonstrating a beneficial effect of the intervention on the target disease outcome. When a promising putative surrogate endpoint(s) for the target outcome is available, surrogate-based provisional approval of an intervention may be pursued. Following the general Causal Roadmap rubric, we describe a surrogate endpoint-based provisional approval causal roadmap. Based on an observational study data set and a phase 3 randomized trial data set, this roadmap defines an approach to analyze the combined data set to draw a conservative inference about the treatment effect (TE) on the target outcome in the phase 3 study population. The observational study enrolls untreated individuals and collects baseline covariates, surrogate endpoints, and the target outcome, and is used to estimate the surrogate index—the regression of the target outcome on the surrogate endpoints and baseline covariates. The phase 3 trial randomizes participants to treated vs. untreated and collects the same data but is much smaller and hence very underpowered to directly assess TE, such that inference on TE is based on the surrogate index. This inference is made conservative by specifying 2 bias functions: one that expresses an imperfection of the surrogate index as a surrogate endpoint in the phase 3 study, and the other that expresses imperfect transport of the surrogate index in the untreated from the observational to the phase 3 study. Plug-in and nonparametric efficient one-step estimators of TE, with inferential procedures, are developed. The finite-sample performance of the estimators is evaluated in simulation studies. The causal roadmap is motivated by and illustrated with contemporary Group B Streptococcus vaccine development.
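The surrogate-index idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' estimator: it assumes a linear working model for the surrogate index, sets both bias functions to zero, and takes the treatment effect to be a difference in mean surrogate index between trial arms. The function names (`fit_surrogate_index`, `predict_surrogate_index`, `plug_in_te`) are hypothetical.

```python
import numpy as np


def fit_surrogate_index(X_obs, S_obs, Y_obs):
    """Fit the surrogate index on untreated observational data.

    Assumption: a linear working model E[Y | S, X] = b0 + b_S' S + b_X' X,
    estimated by ordinary least squares.
    """
    design = np.column_stack([np.ones(len(Y_obs)), S_obs, X_obs])
    coef, *_ = np.linalg.lstsq(design, Y_obs, rcond=None)
    return coef


def predict_surrogate_index(coef, X, S):
    """Evaluate the fitted surrogate index at the given surrogates and covariates."""
    design = np.column_stack([np.ones(len(S)), S, X])
    return design @ coef


def plug_in_te(coef, X_trial, S_trial, A_trial):
    """Plug-in contrast: mean predicted outcome in treated minus untreated.

    Assumption: both bias functions are zero, so the index is treated as a
    perfect surrogate that transports without error to the trial population.
    """
    index = predict_surrogate_index(coef, X_trial, S_trial)
    return index[A_trial == 1].mean() - index[A_trial == 0].mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    # Simulated observational study of untreated individuals (illustrative only).
    X_obs = rng.normal(size=(5000, 2))
    S_obs = X_obs @ np.array([0.5, -0.3]) + rng.normal(size=5000)
    Y_obs = 1.0 + 2.0 * S_obs + X_obs @ np.array([1.0, 0.5]) + rng.normal(size=5000)
    coef = fit_surrogate_index(X_obs, S_obs, Y_obs)

    # Small simulated trial in which treatment shifts the surrogate.
    X_trial = rng.normal(size=(200, 2))
    A_trial = rng.integers(0, 2, size=200)
    S_trial = 0.8 * A_trial + X_trial @ np.array([0.5, -0.3]) + rng.normal(size=200)
    print("plug-in TE estimate:", plug_in_te(coef, X_trial, S_trial, A_trial))
```

In the roadmap described above, the two bias functions would be used to make this inference conservative, and the nonparametric efficient one-step estimator would replace the simple plug-in contrast; neither is attempted in this sketch.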
- Award ID(s): 2149492
- PAR ID: 10650350
- Publisher / Repository: Biostatistics
- Date Published:
- Journal Name: Biostatistics
- Volume: 26
- Issue: 1
- ISSN: 1468-4357
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
We consider causal inference for observational studies with data spread over two files. One file includes the treatment, outcome, and some covariates measured on a set of individuals, and the other file includes additional causally-relevant covariates measured on a partially overlapping set of individuals. By linking records in the two databases, the analyst can control for more covariates, thereby reducing the risk of bias compared to using only one file alone. When analysts do not have access to a unique identifier that enables perfect, error-free linkages, they typically rely on probabilistic record linkage to construct a single linked data set, and estimate causal effects using these linked data. This typical practice does not propagate uncertainty from imperfect linkages to the causal inferences. Further, it does not take advantage of relationships among the variables to improve the linkage quality. We address these shortcomings by fusing regression-assisted, Bayesian probabilistic record linkage with causal inference. The Markov chain Monte Carlo sampler generates multiple plausible linked data files as byproducts that analysts can use for multiple imputation inferences. Here, we show results for two causal estimators based on propensity score overlap weights. Using simulations and data from the Italy Survey on Household Income and Wealth, we show that our approach can improve the accuracy of estimated treatment effects.
-
In the absence of data from a randomized trial, researchers may aim to use observational data to draw causal inference about the effect of a treatment on a time-to-event outcome. In this context, interest often focuses on the treatment-specific survival curves, that is, the survival curves were the population under study to be assigned to receive the treatment or not. Under certain conditions, including that all confounders of the treatment-outcome relationship are observed, the treatment-specific survival curve can be identified with a covariate-adjusted survival curve. In this article, we propose a novel cross-fitted doubly-robust estimator that incorporates data-adaptive (e.g. machine learning) estimators of the conditional survival functions. We establish conditions on the nuisance estimators under which our estimator is consistent and asymptotically linear, both pointwise and uniformly in time. We also propose a novel ensemble learner for combining multiple candidate estimators of the conditional survival functions. Notably, our methods and results accommodate events occurring in discrete or continuous time, or an arbitrary mix of the two. We investigate the practical performance of our methods using numerical studies and an application to the effect of a surgical treatment to prevent metastases of parotid carcinoma on mortality.
-
Abstract Post-treatment variables often complicate causal inference. They appear in many scientific problems, including non-compliance, truncation by death, mediation, and surrogate endpoint evaluation. Principal stratification is a strategy to address these challenges by adjusting for the potential values of the post-treatment variables, defined as the principal strata. It allows for characterizing treatment effect heterogeneity across principal strata and unveiling the mechanism of the treatment’s impact on the outcome related to post-treatment variables. However, the existing literature has primarily focused on binary post-treatment variables, leaving the case with continuous post-treatment variables largely unexplored. This gap persists due to the complexity of infinitely many principal strata, which present challenges to both the identification and estimation of causal effects. We fill this gap by providing nonparametric identification and semiparametric estimation theory for principal stratification with continuous post-treatment variables. We propose to use working models to approximate the underlying causal effect surfaces and derive the efficient influence functions of the corresponding model parameters. Based on the theory, we construct doubly robust estimators and implement them in the R package continuousPCE.
-
Abstract Propensity score weighting is a tool for causal inference to adjust for measured confounders in observational studies. In practice, data often present complex structures, such as clustering, which make propensity score modeling and estimation challenging. In addition, for clustered data, there may be unmeasured cluster-level covariates that are related to both the treatment assignment and outcome. When such unmeasured cluster-specific confounders exist and are omitted in the propensity score model, the subsequent propensity score adjustment may be biased. In this article, we propose a calibration technique for propensity score estimation under the latent ignorable treatment assignment mechanism, i.e., the treatment-outcome relationship is unconfounded given the observed covariates and the latent cluster-specific confounders. We impose novel balance constraints which imply exact balance of the observed confounders and the unobserved cluster-level confounders between the treatment groups. We show that the proposed calibrated propensity score weighting estimator is doubly robust in that it is consistent for the average treatment effect if either the propensity score model is correctly specified or the outcome follows a linear mixed effects model. Moreover, the proposed weighting method can be combined with sampling weights for an integrated solution to handle confounding and sampling designs for causal inference with clustered survey data. In simulation studies, we show that the proposed estimator is superior to other competitors. We estimate the effect of School Body Mass Index Screening on prevalence of overweight and obesity for elementary schools in Pennsylvania.

