Title: Model-Assisted Analyses of Cluster-Randomized Experiments
Abstract

Cluster-randomized experiments are widely used due to their logistical convenience and policy relevance. To analyse them properly, we must address the fact that the treatment is assigned at the cluster level instead of the individual level. Standard analytic strategies are regressions based on individual data, cluster averages and cluster totals, which differ when the cluster sizes vary. These methods are often motivated by models with strong and unverifiable assumptions, and the choice among them can be subjective. Without any outcome modelling assumption, we evaluate these regression estimators and the associated robust standard errors from the design-based perspective where only the treatment assignment itself is random and controlled by the experimenter. We demonstrate that regression based on cluster averages targets a weighted average treatment effect, regression based on individual data is suboptimal in terms of efficiency and regression based on cluster totals is consistent and more efficient with a large number of clusters. We highlight the critical role of covariates in improving estimation efficiency and illustrate the efficiency gain via both simulation studies and data analysis. The asymptotic analysis also reveals the efficiency-robustness trade-off by comparing the properties of various estimators using data at different levels with and without covariate adjustment. Moreover, we show that the robust standard errors are convenient approximations to the true asymptotic standard errors under the design-based perspective. Our theory holds even when the outcome models are misspecified, so it is model-assisted rather than model-based. We also extend the theory to a wider class of weighted average treatment effects.
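For concreteness, here is a minimal sketch (our illustration, not code from the paper) of the three regression estimators on simulated data with varying cluster sizes, using statsmodels for the robust standard errors. The scaling of the cluster totals by M/n follows our reading of the estimator described above.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
M = 100                                  # number of clusters
sizes = rng.integers(5, 50, size=M)      # varying cluster sizes
z = rng.binomial(1, 0.5, size=M)         # treatment assigned at the cluster level
df = pd.DataFrame({
    "cluster": np.repeat(np.arange(M), sizes),
    "z": np.repeat(z, sizes),
})
df["y"] = 1.0 * df["z"] + rng.normal(size=len(df))

# 1. Individual-level regression with cluster-robust standard errors.
fit_ind = smf.ols("y ~ z", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["cluster"]})

# 2. Regression based on cluster averages (targets a weighted ATE).
avg = df.groupby("cluster").agg(y=("y", "mean"), z=("z", "first")).reset_index()
fit_avg = smf.ols("y ~ z", data=avg).fit(cov_type="HC2")

# 3. Regression based on scaled cluster totals (consistent for the ATE).
tot = df.groupby("cluster").agg(t=("y", "sum"), z=("z", "first")).reset_index()
tot["y_scaled"] = tot["t"] * M / len(df)   # scale totals by M / (total sample size)
fit_tot = smf.ols("y_scaled ~ z", data=tot).fit(cov_type="HC2")

for name, fit in [("individual", fit_ind), ("average", fit_avg), ("total", fit_tot)]:
    print(f"{name:10s} tau_hat = {fit.params['z']:+.3f}  SE = {fit.bse['z']:.3f}")
```

Covariate adjustment would add cluster-level covariates to each regression; the point of the sketch is only to show how the three estimands and their robust standard errors arise from different data levels.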

 
Award ID(s):
1945136
NSF-PAR ID:
10398637
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
83
Issue:
5
ISSN:
1369-7412
Format(s):
Medium: X Size: p. 994-1015
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Complete randomization balances covariates on average, but covariate imbalance often exists in finite samples. Rerandomization can ensure covariate balance in the realized experiment by discarding the undesired treatment assignments. Many field experiments in public health and social sciences assign the treatment at the cluster level due to logistical constraints or policy considerations. Moreover, they are frequently combined with rerandomization in the design stage. We define cluster rerandomization as a cluster-randomized experiment compounded with rerandomization to balance covariates at the individual or cluster level. Existing asymptotic theory can only deal with rerandomization with treatments assigned at the individual level, leaving the theory for cluster rerandomization an open problem. To fill the gap, we provide a design-based theory for cluster rerandomization. Moreover, we compare two cluster rerandomization schemes that use prior information on the importance of the covariates: one based on the weighted Euclidean distance and the other based on the Mahalanobis distance with tiers of covariates. We demonstrate that the former dominates the latter with optimal weights and orthogonalized covariates. Last but not least, we discuss the role of covariate adjustment in the analysis stage and recommend covariate-adjusted procedures that can be conveniently implemented by least squares with the associated robust standard errors.
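As an illustration of the design described above, the following sketch (ours, with made-up cluster-level covariates and an arbitrary acceptance threshold) redraws a cluster-level assignment until the Mahalanobis distance between the group covariate means falls below the threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
M, k = 40, 3                       # clusters, covariates
x = rng.normal(size=(M, k))        # cluster-level covariates
threshold = 2.0                    # acceptance threshold (illustrative)

def mahalanobis_distance(x, z):
    """Mahalanobis distance between treated and control covariate means."""
    diff = x[z == 1].mean(axis=0) - x[z == 0].mean(axis=0)
    n1, n0 = z.sum(), (1 - z).sum()
    cov = np.cov(x, rowvar=False) * (1 / n1 + 1 / n0)
    return diff @ np.linalg.solve(cov, diff)

while True:                        # redraw assignments until balanced
    z = np.zeros(M, dtype=int)
    z[rng.choice(M, M // 2, replace=False)] = 1
    if mahalanobis_distance(x, z) <= threshold:
        break
print("accepted assignment:", z)
```

The weighted Euclidean distance scheme discussed above would replace the balance criterion with a weighted sum of squared mean differences, with weights encoding prior covariate importance.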

     
  2. Abstract

    Randomized experiments are the gold standard for causal inference and enable unbiased estimation of treatment effects. Regression adjustment provides a convenient way to incorporate covariate information for additional efficiency. This article provides a unified account of its utility for improving estimation efficiency in multiarmed experiments. We start with the commonly used additive and fully interacted models for regression adjustment in estimating average treatment effects (ATE), and clarify the trade-offs between the resulting ordinary least squares (OLS) estimators in terms of finite sample performance and asymptotic efficiency. We then move on to regression adjustment based on restricted least squares (RLS), and establish for the first time its properties for inferring the ATE from the design-based perspective. The resulting inference has multiple guarantees. First, it is asymptotically efficient when the restriction is correctly specified. Second, it remains consistent as long as the restriction on the coefficients of the treatment indicators, if any, is correctly specified and separate from that on the coefficients of the treatment-covariate interactions. Third, it can have better finite sample performance than the unrestricted counterpart even when the restriction is moderately misspecified. It is thus our recommendation when the OLS fit of the fully interacted regression risks large finite sample variability in the case of many covariates and many treatments yet a moderate sample size. In addition, the newly established theory of RLS also provides a unified way of studying OLS-based inference from general regression specifications. As an illustration, we demonstrate its value for studying OLS-based regression adjustment in factorial experiments. Importantly, although we analyse inferential procedures that are motivated by OLS, we do not invoke any assumptions required by the underlying linear models.
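The sketch below (ours; it restricts attention to two arms, while the article covers multiarmed experiments) contrasts the additive and fully interacted OLS adjustments with HC2 robust standard errors. Centring the covariate makes the coefficient on the treatment indicator the ATE estimate in both fits.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
z = rng.binomial(1, 0.5, size=n)
y = 1.0 * z + 0.8 * x + 0.5 * z * x + rng.normal(size=n)
df = pd.DataFrame({"y": y, "z": z, "xc": x - x.mean()})  # centred covariate

# Additive adjustment: y ~ z + x, a common effect of x in both arms.
fit_add = smf.ols("y ~ z + xc", data=df).fit(cov_type="HC2")

# Fully interacted (Lin-type) adjustment: y ~ z * x allows the covariate
# coefficients to differ by arm; with centred x, the z coefficient is the ATE.
fit_int = smf.ols("y ~ z * xc", data=df).fit(cov_type="HC2")

print("additive  :", fit_add.params["z"], fit_add.bse["z"])
print("interacted:", fit_int.params["z"], fit_int.bse["z"])
```

An RLS fit, in this framing, would constrain some of these coefficients (for example, forcing equal interaction coefficients across arms), trading a little robustness for finite sample stability when covariates or treatments are numerous.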

     
  3. Many causal and structural effects depend on regressions. Examples include policy effects, average derivatives, regression decompositions, average treatment effects, causal mediation, and parameters of economic structural models. The regressions may be high-dimensional, making machine learning useful. Plugging machine learners into identifying equations can lead to poor inference due to bias from regularization and/or model selection. This paper gives automatic debiasing for linear and nonlinear functions of regressions. The debiasing is automatic in using Lasso and the function of interest without the full form of the bias correction. The debiasing can be applied to any regression learner, including neural nets, random forests, Lasso, boosting, and other high-dimensional methods. In addition to providing the bias correction, we give standard errors that are robust to misspecification, convergence rates for the bias correction, and primitive conditions for asymptotic inference for a variety of estimators of structural and causal effects. The automatic debiased machine learning is used to estimate the average treatment effect on the treated for the NSW job training data and to estimate demand elasticities from Nielsen scanner data while allowing preferences to be correlated with prices and income.
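The following sketch illustrates the debiasing idea with a generic doubly robust (AIPW) score and cross-fitting, using Lasso for the outcome regressions. It is not the paper's automatic procedure, which learns the bias correction (the Riesz representer) directly from the function of interest rather than requiring its explicit form.

```python
import numpy as np
from sklearn.linear_model import LassoCV, LogisticRegressionCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
n, p = 1000, 20
x = rng.normal(size=(n, p))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))   # treatment depends on x
y = d * 1.0 + x[:, 0] + rng.normal(size=n)

scores = np.zeros(n)
for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(x):
    # Nuisance functions fit on the training fold only (cross-fitting).
    mu1 = LassoCV(cv=3).fit(x[train][d[train] == 1], y[train][d[train] == 1])
    mu0 = LassoCV(cv=3).fit(x[train][d[train] == 0], y[train][d[train] == 0])
    ps = LogisticRegressionCV(cv=3, max_iter=1000).fit(x[train], d[train])
    e = np.clip(ps.predict_proba(x[test])[:, 1], 0.01, 0.99)
    m1, m0 = mu1.predict(x[test]), mu0.predict(x[test])
    # Doubly robust score: regression term plus weighted residual correction.
    scores[test] = (m1 - m0
                    + d[test] * (y[test] - m1) / e
                    - (1 - d[test]) * (y[test] - m0) / (1 - e))

ate = scores.mean()
se = scores.std(ddof=1) / np.sqrt(n)
print(f"ATE = {ate:.3f} (SE {se:.3f})")
```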
  4. Summary

    The problem of estimating the average treatment effect is important when evaluating the effectiveness of medical treatments or social intervention policies. Most of the existing methods for estimating the average treatment effect rely, in one way or another, on parametric assumptions about the propensity score model or the outcome regression model. In reality, both models are prone to misspecification, which can have undue influence on the estimated average treatment effect. We propose an alternative robust approach to estimating the average treatment effect based on observational data in the challenging situation when neither a plausible parametric outcome model nor a reliable parametric propensity score model is available. Our estimator can be considered as a robust extension of the popular class of propensity score weighted estimators. This approach has the advantage of being robust, flexible and data adaptive, and it can handle many covariates simultaneously. Adopting a dimension reduction approach, we estimate the propensity score weights semiparametrically by using a non-parametric link function to relate the treatment assignment indicator to a low-dimensional structure of the covariates, typically formed by several linear combinations of the covariates. We develop a class of consistent estimators for the average treatment effect and study their theoretical properties. We demonstrate the robust performance of the estimators on simulated data and a real data example of investigating the effect of maternal smoking on babies' birth weight.
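A rough sketch of the dimension reduction idea follows (ours, with a deliberately crude first step): project the covariates onto a single index, fit the propensity score with a nonparametric link on that index, and weight. The paper estimates the index semiparametrically; here a plain logistic regression stands in as a placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
n, p = 2000, 10
x = rng.normal(size=(n, p))
index_true = x[:, 0] - 0.5 * x[:, 1]              # low-dimensional structure
d = rng.binomial(1, 1 / (1 + np.exp(-index_true)))
y = d * 2.0 + index_true + rng.normal(size=n)

# Step 1: a low-dimensional index beta'x (placeholder estimator).
beta = LogisticRegression(max_iter=1000).fit(x, d).coef_.ravel()
index = x @ beta / np.linalg.norm(beta)

# Step 2: nonparametric link from the index to the treatment probability
# (a local average of the treatment indicator along the index).
link = KNeighborsRegressor(n_neighbors=50).fit(index.reshape(-1, 1), d)
e = np.clip(link.predict(index.reshape(-1, 1)), 0.01, 0.99)

# Step 3: propensity score weighted (Hajek-style) ATE estimate.
w1, w0 = d / e, (1 - d) / (1 - e)
ate = (w1 @ y) / w1.sum() - (w0 @ y) / w0.sum()
print(f"IPW ATE = {ate:.3f}")
```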

     
  5. Abstract

    Complementary features of randomized controlled trials (RCTs) and observational studies (OSs) can be used jointly to estimate the average treatment effect of a target population. We propose a calibration weighting estimator that enforces the covariate balance between the RCT and OS, therefore improving the trial-based estimator's generalizability. Exploiting semiparametric efficiency theory, we propose a doubly robust augmented calibration weighting estimator that achieves the efficiency bound derived under the identification assumptions. A nonparametric sieve method is provided as an alternative to the parametric approach, which enables the robust approximation of the nuisance functions and data-adaptive selection of outcome predictors for calibration. We establish asymptotic results and confirm the finite sample performances of the proposed estimators by simulation experiments and an application to the estimation of the treatment effect of adjuvant chemotherapy for early-stage non-small-cell lung cancer patients after surgery.
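The sketch below (ours, simplified) computes calibration weights by exponential tilting so that the weighted trial covariate means match the observational sample means; the estimator described above additionally augments such weights with outcome modelling for double robustness.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_rct, n_os, p = 300, 2000, 3
x_rct = rng.normal(0.0, 1.0, size=(n_rct, p))      # trial covariates
x_os = rng.normal(0.5, 1.0, size=(n_os, p))        # target population covariates
target = x_os.mean(axis=0)

def dual(lam):
    """Convex dual of the entropy-balancing calibration problem."""
    return np.log(np.exp(x_rct @ lam).sum()) - lam @ target

# At the optimum, the weighted trial means equal the target means.
lam = minimize(dual, np.zeros(p), method="BFGS").x
w = np.exp(x_rct @ lam)
w /= w.sum()

print("weighted trial means :", w @ x_rct)
print("target (OS) means    :", target)
```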

     