Abstract Complementary features of randomized controlled trials (RCTs) and observational studies (OSs) can be used jointly to estimate the average treatment effect of a target population. We propose a calibration weighting estimator that enforces the covariate balance between the RCT and OS, therefore improving the trial-based estimator's generalizability. Exploiting semiparametric efficiency theory, we propose a doubly robust augmented calibration weighting estimator that achieves the efficiency bound derived under the identification assumptions. A nonparametric sieve method is provided as an alternative to the parametric approach, which enables the robust approximation of the nuisance functions and data-adaptive selection of outcome predictors for calibration. We establish asymptotic results and confirm the finite sample performances of the proposed estimators by simulation experiments and an application on the estimation of the treatment effect of adjuvant chemotherapy for early-stage non-small-cell lung patients after surgery.
more »
« less
This content will become publicly available on April 24, 2026
Augmented balancing weights as linear regression
Abstract We provide a novel characterization of augmented balancing weights, also known as automatic debiased machine learning. These popular doubly robust estimators combine outcome modelling with balancing weights—weights that achieve covariate balance directly instead of estimating and inverting the propensity score. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model with coefficients that combine those of the original outcome model with those from unpenalized ordinary least-squares (OLS). Under certain choices of regularization parameters, the augmented estimator in fact collapses to the OLS estimator alone. We then extend these results to specific outcome and weighting models. We first show that the augmented estimator that uses (kernel) ridge regression for both outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression—implying a novel analysis of undersmoothing. When the weighting model is instead lasso-penalized, we demonstrate a familiar ‘double selection’ property. Our framework opens the black box on this increasingly popular class of estimators, bridges the gap between existing results on the semiparametric efficiency of undersmoothed and doubly robust estimators, and provides new insights into the performance of augmented balancing weights.
more »
« less
- Award ID(s):
- 2243822
- PAR ID:
- 10646736
- Publisher / Repository:
- Royal Statistical Society
- Date Published:
- Journal Name:
- Journal of the Royal Statistical Society Series B: Statistical Methodology
- ISSN:
- 1369-7412
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
In observational studies, balancing covariates in different treatment groups is essential to estimate treatment effects. One of the most commonly used methods for such purposes is weighting. The performance of this class of methods usually depends on strong regularity conditions for the underlying model, which might not hold in practice. In this paper, we investigate weighting methods from a functional estimation perspective and argue that the weights needed for covariate balancing could differ from those needed for treatment effects estimation under low regularity conditions. Motivated by this observation, we introduce a new framework of weighting that directly targets the treatment effects estimation. Unlike existing methods, the resulting estimator for a treatment effect under this new framework is a simple kernel-based U-statistic after applying a data-driven transformation to the observed covariates. We characterize the theoretical properties of the new estimators of treatment effects under a nonparametric setting and show that they are able to work robustly under low regularity conditions. The new framework is also applied to several numerical examples to demonstrate its practical merits.more » « less
-
Summary The problem of estimating the average treatment effects is important when evaluating the effectiveness of medical treatments or social intervention policies. Most of the existing methods for estimating the average treatment effect rely on some parametric assumptions about the propensity score model or the outcome regression model one way or the other. In reality, both models are prone to misspecification, which can have undue influence on the estimated average treatment effect. We propose an alternative robust approach to estimating the average treatment effect based on observational data in the challenging situation when neither a plausible parametric outcome model nor a reliable parametric propensity score model is available. Our estimator can be considered as a robust extension of the popular class of propensity score weighted estimators. This approach has the advantage of being robust, flexible, data adaptive, and it can handle many covariates simultaneously. Adopting a dimension reduction approach, we estimate the propensity score weights semiparametrically by using a non-parametric link function to relate the treatment assignment indicator to a low-dimensional structure of the covariates which are formed typically by several linear combinations of the covariates. We develop a class of consistent estimators for the average treatment effect and study their theoretical properties. We demonstrate the robust performance of the estimators on simulated data and a real data example of investigating the effect of maternal smoking on babies’ birth weight.more » « less
-
Abstract Propensity score weighting is a tool for causal inference to adjust for measured confounders in observational studies. In practice, data often present complex structures, such as clustering, which make propensity score modeling and estimation challenging. In addition, for clustered data, there may be unmeasured cluster-level covariates that are related to both the treatment assignment and outcome. When such unmeasured cluster-specific confounders exist and are omitted in the propensity score model, the subsequent propensity score adjustment may be biased. In this article, we propose a calibration technique for propensity score estimation under the latent ignorable treatment assignment mechanism, i. e., the treatment-outcome relationship is unconfounded given the observed covariates and the latent cluster-specific confounders. We impose novel balance constraints which imply exact balance of the observed confounders and the unobserved cluster-level confounders between the treatment groups. We show that the proposed calibrated propensity score weighting estimator is doubly robust in that it is consistent for the average treatment effect if either the propensity score model is correctly specified or the outcome follows a linear mixed effects model. Moreover, the proposed weighting method can be combined with sampling weights for an integrated solution to handle confounding and sampling designs for causal inference with clustered survey data. In simulation studies, we show that the proposed estimator is superior to other competitors. We estimate the effect of School Body Mass Index Screening on prevalence of overweight and obesity for elementary schools in Pennsylvania.more » « less
-
In different fields of applications including, but not limited to, behavioral, environmental, medical sciences, and econometrics, the use of panel data regression models has become increasingly popular as a general framework for making meaningful statistical inferences. However, when the ordinary least squares (OLS) method is used to estimate the model parameters, presence of outliers may significantly alter the adequacy of such models by producing biased and inefficient estimates. In this work, we propose a new, weighted likelihood based robust estimation procedure for linear panel data models with fixed and random effects. The finite sample performances of the proposed estimators have been illustrated through an extensive simulation study as well as with an application to blood pressure dataset. Our thorough study demonstrates that the proposed estimators show significantly better performances over the traditional methods in the presence of outliers and produce competitive results to the OLS based estimates when no outliers are present in the dataset.more » « less
An official website of the United States government
