The paper considers estimating a parameter β that defines an estimating function U(y, x, β) for an outcome variable y and its covariate x when the outcome is missing in some of the observations. We assume that, in addition to the outcome and the covariate, a surrogate outcome is available in every observation. The efficiency of existing estimators for β depends critically on correctly specifying the conditional expectation of U given the surrogate and the covariate. When the conditional expectation is not correctly specified, which is the most likely scenario in practice, the efficiency of estimation can be severely compromised even if the propensity function (of missingness) is correctly specified. We propose an estimator that is robust against the choice of the conditional expectation via an empirical likelihood. We demonstrate that the estimator proposed achieves a gain in efficiency whether the conditional score is correctly specified or not. When the conditional score is correctly specified, the estimator reaches the semiparametric variance bound within the class of estimating functions that are generated by U. The practical performance of the estimator is evaluated by using simulation and a data set that is based on the 1996 US presidential election.
This content will become publicly available on September 9, 2025
We propose a communication‐efficient algorithm to estimate the average treatment effect (ATE), when the data are distributed across multiple sites and the number of covariates is possibly much larger than the sample size in each site. Our main idea is to calibrate the estimates of the propensity score and outcome models using some proper surrogate loss functions to approximately attain the desired covariate balancing property. We show that under possible model misspecification, our distributed covariate balancing propensity score estimator (disthdCBPS) can approximate the global estimator, obtained by pooling together the data from multiple sites, at a fast rate. Thus, our estimator remains consistent and asymptotically normal. In addition, when both the propensity score and the outcome models are correctly specified, the proposed estimator attains the semi‐parametric efficiency bound. We illustrate the empirical performance of the proposed method in both simulation and empirical studies.
more » « less- PAR ID:
- 10542357
- Publisher / Repository:
- Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name:
- Stat
- Volume:
- 13
- Issue:
- 3
- ISSN:
- 2049-1573
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Summary -
Abstract It is common to conduct causal inference in matched observational studies by proceeding as though treatment assignments within matched sets are assigned uniformly at random and using this distribution as the basis for inference. This approach ignores observed discrepancies in matched sets that may be consequential for the distribution of treatment, which are succinctly captured by within-set differences in the propensity score. We address this problem via covariate-adaptive randomization inference, which modifies the permutation probabilities to vary with estimated propensity score discrepancies and avoids requirements to exclude matched pairs or model an outcome variable. We show that the test achieves type I error control arbitrarily close to the nominal level when large samples are available for propensity score estimation. We characterize the large-sample behaviour of the new randomization test for a difference-in-means estimator of a constant additive effect. We also show that existing methods of sensitivity analysis generalize effectively to covariate-adaptive randomization inference. Finally, we evaluate the empirical value of combining matching and covariate-adaptive randomization procedures using simulations and analyses of genetic damage among welders and right-heart catheterization in surgical patients.
-
This paper concerns estimation of subgroup treatment effects with observational data. Existing propensity score methods are mostly developed for estimating overall treatment effect. Although the true propensity scores balance covariates in any subpopulations, the estimated propensity scores may result in severe imbalance in subgroup samples. Indeed, subgroup analysis amplifies a bias-variance tradeoff, whereby increasing complexity of the propensity score model may help to achieve covariate balance within subgroups, but it also increases variance. We propose a new method, the subgroup balancing propensity score, to ensure good subgroup balance as well as to control the variance inflation. For each subgroup, the subgroup balancing propensity score chooses to use either the overall sample or the subgroup (sub)sample to estimate the propensity scores for the units within that subgroup, in order to optimize a criterion accounting for a set of covariate-balancing moment conditions for both the overall sample and the subgroup samples. We develop two versions of subgroup balancing propensity score corresponding to matching and weighting, respectively. We devise a stochastic search algorithm to estimate the subgroup balancing propensity score when the number of subgroups is large. We demonstrate through simulations that the subgroup balancing propensity score improves the performance of propensity score methods in estimating subgroup treatment effects. We apply the subgroup balancing propensity score method to the Italy Survey of Household Income and Wealth (SHIW) to estimate the causal effects of having debit card on household consumption for different income groups.
-
Summary Methodological advancements, including propensity score methods, have resulted in improved unbiased estimation of treatment effects from observational data. Traditionally, a “throw in the kitchen sink” approach has been used to select covariates for inclusion into the propensity score, but recent work shows including unnecessary covariates can impact both the bias and statistical efficiency of propensity score estimators. In particular, the inclusion of covariates that impact exposure but not the outcome, can inflate standard errors without improving bias, while the inclusion of covariates associated with the outcome but unrelated to exposure can improve precision. We propose the outcome-adaptive lasso for selecting appropriate covariates for inclusion in propensity score models to account for confounding bias and maintaining statistical efficiency. This proposed approach can perform variable selection in the presence of a large number of spurious covariates, that is, covariates unrelated to outcome or exposure. We present theoretical and simulation results indicating that the outcome-adaptive lasso selects the propensity score model that includes all true confounders and predictors of outcome, while excluding other covariates. We illustrate covariate selection using the outcome-adaptive lasso, including comparison to alternative approaches, using simulated data and in a survey of patients using opioid therapy to manage chronic pain.
-
A common goal in observational research is to estimate marginal causal effects in the presence of confounding variables. One solution to this problem is to use the covariate distribution to weight the outcomes such that the data appear randomized. The propensity score is a natural quantity that arises in this setting. Propensity score weights have desirable asymptotic properties, but they often fail to adequately balance covariate data in finite samples. Empirical covariate balancing methods pose as an appealing alternative by exactly balancing the sample moments of the covariate distribution. With this objective in mind, we propose a framework for estimating balancing weights by solving a constrained convex program, where the criterion function to be optimized is a Bregman distance. We then show that the different distances in this class render identical weights to those of other covariate balancing methods. A series of numerical studies are presented to demonstrate these similarities.more » « less