skip to main content


Title: Covariate adjustment in multiarmed, possibly factorial experiments
Abstract

Randomized experiments are the gold standard for causal inference and enable unbiased estimation of treatment effects. Regression adjustment provides a convenient way to incorporate covariate information for additional efficiency. This article provides a unified account of its utility for improving estimation efficiency in multiarmed experiments. We start with the commonly used additive and fully interacted models for regression adjustment in estimating average treatment effects (ATE), and clarify the trade-offs between the resulting ordinary least squares (OLS) estimators in terms of finite sample performance and asymptotic efficiency. We then move on to regression adjustment based on restricted least squares (RLS), and establish for the first time its properties for inferring ATE from the design-based perspective. The resulting inference has multiple guarantees. First, it is asymptotically efficient when the restriction is correctly specified. Second, it remains consistent as long as the restriction on the coefficients of the treatment indicators, if any, is correctly specified and separate from that on the coefficients of the treatment-covariate interactions. Third, it can have better finite sample performance than the unrestricted counterpart even when the restriction is moderately misspecified. It is thus our recommendation when the OLS fit of the fully interacted regression risks large finite sample variability in case of many covariates, many treatments, yet a moderate sample size. In addition, the newly established theory of RLS also provides a unified way of studying OLS-based inference from general regression specifications. As an illustration, we demonstrate its value for studying OLS-based regression adjustment in factorial experiments. Importantly, although we analyse inferential procedures that are motivated by OLS, we do not invoke any assumptions required by the underlying linear models.

 
more » « less
Award ID(s):
1945136
NSF-PAR ID:
10393773
Author(s) / Creator(s):
;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
85
Issue:
1
ISSN:
1369-7412
Page Range / eLocation ID:
p. 1-23
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper presents finite‐sample efficiency bounds for the core econometric problem of estimation of linear regression coefficients. We show that the classical Gauss–Markov theorem can be restated omitting the unnatural restriction to linear estimators, without adding any extra conditions. Our results are lower bounds on the variances of unbiased estimators. These lower bounds correspond to the variances of the the least squares estimator and the generalized least squares estimator, depending on the assumption on the error covariances. These results show that we can drop the label “linear estimator” from the pedagogy of the Gauss–Markov theorem. Instead of referring to these estimators as BLUE, they can legitimately be called BUE (best unbiased estimators). 
    more » « less
  2. Summary

    This article investigates a generalized semiparametric varying-coefficient model for longitudinal data that can flexibly model three types of covariate effects: time-constant effects, time-varying effects, and covariate-varying effects. Different link functions can be selected to provide a rich family of models for longitudinal data. The model assumes that the time-varying effects are unspecified functions of time and the covariate-varying effects are parametric functions of an exposure variable specified up to a finite number of unknown parameters. The estimation procedure is developed using local linear smoothing and profile weighted least squares estimation techniques. Hypothesis testing procedures are developed to test the parametric functions of the covariate-varying effects. The asymptotic distributions of the proposed estimators are established. A working formula for bandwidth selection is discussed and examined through simulations. Our simulation study shows that the proposed methods have satisfactory finite sample performance. The proposed methods are applied to the ACTG 244 clinical trial of HIV infected patients being treated with Zidovudine to examine the effects of antiretroviral treatment switching before and after HIV develops the T215Y/F drug resistance mutation. Our analysis shows benefits of treatment switching to the combination therapies as compared to continuing with ZDV monotherapy before and after developing the 215-mutation.

     
    more » « less
  3. Summary

    Comparative effectiveness research often involves evaluating the differences in the risks of an event of interest between two or more treatments using observational data. Often, the post‐treatment outcome of interest is whether the event happens within a pre‐specified time window, which leads to a binary outcome. One source of bias for estimating the causal treatment effect is the presence of confounders, which are usually controlled using propensity score‐based methods. An additional source of bias is right‐censoring, which occurs when the information on the outcome of interest is not completely available due to dropout, study termination, or treatment switch before the event of interest. We propose an inverse probability weighted regression‐based estimator that can simultaneously handle both confounding and right‐censoring, calling the method CIPWR, with the letter C highlighting the censoring component. CIPWR estimates the average treatment effects by averaging the predicted outcomes obtained from a logistic regression model that is fitted using a weighted score function. The CIPWR estimator has a double robustness property such that estimation consistency can be achieved when either the model for the outcome or the models for both treatment and censoring are correctly specified. We establish the asymptotic properties of the CIPWR estimator for conducting inference, and compare its finite sample performance with that of several alternatives through simulation studies. The methods under comparison are applied to a cohort of prostate cancer patients from an insurance claims database for comparing the adverse effects of four candidate drugs for advanced stage prostate cancer.

     
    more » « less
  4. Summary

    Complete randomization balances covariates on average, but covariate imbalance often exists in finite samples. Rerandomization can ensure covariate balance in the realized experiment by discarding the undesired treatment assignments. Many field experiments in public health and social sciences assign the treatment at the cluster level due to logistical constraints or policy considerations. Moreover, they are frequently combined with re-randomization in the design stage. We define cluster rerandomization as a cluster-randomized experiment compounded with rerandomization to balance covariates at the individual or cluster level. Existing asymptotic theory can only deal with rerandomization with treatments assigned at the individual level, leaving that for cluster rerandomization an open problem. To fill the gap, we provide a design-based theory for cluster rerandomization. Moreover, we compare two cluster rerandomization schemes that use prior information on the importance of the covariates: one based on the weighted Euclidean distance and the other based on the Mahalanobis distance with tiers of covariates. We demonstrate that the former dominates the latter with optimal weights and orthogonalized covariates. Last but not least, we discuss the role of covariate adjustment in the analysis stage, and recommend covariate-adjusted procedures that can be conveniently implemented by least squares with the associated robust standard errors.

     
    more » « less
  5. In different fields of applications including, but not limited to, behavioral, environmental, medical sciences, and econometrics, the use of panel data regression models has become increasingly popular as a general framework for making meaningful statistical inferences. However, when the ordinary least squares (OLS) method is used to estimate the model parameters, presence of outliers may significantly alter the adequacy of such models by producing biased and inefficient estimates. In this work, we propose a new, weighted likelihood based robust estimation procedure for linear panel data models with fixed and random effects. The finite sample performances of the proposed estimators have been illustrated through an extensive simulation study as well as with an application to blood pressure dataset. Our thorough study demonstrates that the proposed estimators show significantly better performances over the traditional methods in the presence of outliers and produce competitive results to the OLS based estimates when no outliers are present in the dataset.

     
    more » « less