This content will become publicly available on January 20, 2026

Title: Randomization-based Z-estimation for evaluating average and individual treatment effects
Summary: Randomized experiments have been the gold standard for drawing causal inference. The conventional model-based approach has been one of the most popular methods of analysing treatment effects from randomized experiments, which is often carried out through inference for certain model parameters. In this paper, we provide a systematic investigation of model-based analyses for treatment effects under the randomization-based inference framework. This framework does not impose any distributional assumptions on the outcomes, covariates and their dependence, and utilizes only randomization as the reasoned basis. We first derive the asymptotic theory for $Z$-estimation in completely randomized experiments, and propose sandwich-type conservative covariance estimation. We then apply the developed theory to analyse both average and individual treatment effects in randomized experiments. For the average treatment effect, we consider model-based, model-imputed and model-assisted estimation strategies, where the first two strategies can be sensitive to model misspecification or require specific methods for parameter estimation. The model-assisted approach is robust to arbitrary model misspecification and always provides consistent average treatment effect estimation. We propose optimal ways to conduct model-assisted estimation using generally nonlinear least squares for parameter estimation. For the individual treatment effects, we propose directly modelling the relationship between individual effects and covariates, and discuss the model's identifiability, inference and interpretation allowing for model misspecification.
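To make the model-assisted strategy concrete, the following minimal sketch (not the paper's code) simulates a completely randomized experiment and computes a regression-adjusted estimate of the average treatment effect together with a sandwich-type robust covariance of the kind the abstract describes as conservative. The simulated data, the Lin-style interacted least-squares adjustment, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated completely randomized experiment (illustrative only).
n, p = 500, 3
X = rng.normal(size=(n, p))                    # covariates
tau = 1.0 + 0.5 * X[:, 0]                      # heterogeneous individual effects
y0 = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
y1 = y0 + tau
z = np.zeros(n, dtype=int)
z[rng.choice(n, size=n // 2, replace=False)] = 1   # complete randomization
y = np.where(z == 1, y1, y0)

# Model-assisted (regression-adjusted) ATE: OLS of y on treatment,
# centered covariates, and their interactions (Lin-style adjustment).
Xc = X - X.mean(axis=0)
D = np.column_stack([np.ones(n), z, Xc, z[:, None] * Xc])
beta, *_ = np.linalg.lstsq(D, y, rcond=None)
ate_hat = beta[1]

# Sandwich-type (HC0-style) covariance estimate for the coefficients.
resid = y - D @ beta
bread = np.linalg.inv(D.T @ D)
meat = D.T @ (D * resid[:, None] ** 2)
V = bread @ meat @ bread
se_ate = np.sqrt(V[1, 1])

print(f"ATE estimate: {ate_hat:.3f}  (robust SE: {se_ate:.3f}, true ATE: {tau.mean():.3f})")
```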
Award ID(s): 2400961
PAR ID: 10628566
Author(s) / Creator(s): ; ;
Publisher / Repository: Oxford University Press
Date Published:
Journal Name: Biometrika
Volume: 112
Issue: 2
ISSN: 1464-3510
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract: Cluster-randomized experiments are widely used due to their logistical convenience and policy relevance. To analyse them properly, we must address the fact that the treatment is assigned at the cluster level instead of the individual level. Standard analytic strategies are regressions based on individual data, cluster averages and cluster totals, which differ when the cluster sizes vary. These methods are often motivated by models with strong and unverifiable assumptions, and the choice among them can be subjective. Without any outcome modelling assumption, we evaluate these regression estimators and the associated robust standard errors from the design-based perspective where only the treatment assignment itself is random and controlled by the experimenter. We demonstrate that regression based on cluster averages targets a weighted average treatment effect, that regression based on individual data is suboptimal in terms of efficiency, and that regression based on cluster totals is consistent and more efficient with a large number of clusters. We highlight the critical role of covariates in improving estimation efficiency and illustrate the efficiency gain via both simulation studies and data analysis. The asymptotic analysis also reveals the efficiency-robustness trade-off by comparing the properties of various estimators using data at different levels with and without covariate adjustment. Moreover, we show that the robust standard errors are convenient approximations to the true asymptotic standard errors under the design-based perspective. Our theory holds even when the outcome models are misspecified, so it is model-assisted rather than model-based. We also extend the theory to a wider class of weighted average treatment effects.
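The design-based comparison described in this abstract can be illustrated with a small simulation. The sketch below (an illustrative assumption, not the authors' code) contrasts the cluster-average and cluster-total estimators when cluster sizes vary and the treatment effect depends on cluster size.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated cluster-randomized experiment with varying cluster sizes (illustrative).
M = 200                                         # number of clusters
sizes = rng.integers(5, 50, size=M)             # cluster sizes vary
n = sizes.sum()
cluster_effect = rng.normal(0, 1, size=M)
# Individual potential outcomes; the effect grows with cluster size, which makes
# the individual-level and cluster-level estimands differ.
y0 = [cluster_effect[g] + rng.normal(size=m) for g, m in enumerate(sizes)]
y1 = [y0[g] + 1.0 + 0.02 * sizes[g] for g in range(M)]

z = np.zeros(M, dtype=int)
z[rng.choice(M, size=M // 2, replace=False)] = 1    # treatment assigned at cluster level
y = [y1[g] if z[g] else y0[g] for g in range(M)]

totals = np.array([yg.sum() for yg in y])
averages = np.array([yg.mean() for yg in y])

# Cluster-average regression: difference in means of cluster averages.
# With varying cluster sizes this targets a weighted average treatment effect.
tau_avg = averages[z == 1].mean() - averages[z == 0].mean()

# Cluster-total regression: difference in means of cluster totals, rescaled by M/n,
# which design-based theory shows is consistent for the individual-level ATE.
tau_tot = (M / n) * (totals[z == 1].mean() - totals[z == 0].mean())

true_ate = np.concatenate([y1[g] - y0[g] for g in range(M)]).mean()
print(f"cluster-average estimator: {tau_avg:.3f}")
print(f"cluster-total estimator:   {tau_tot:.3f}   (true individual-level ATE: {true_ate:.3f})")
```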
  2. Abstract: Randomization inference is a powerful tool in early phase vaccine trials when estimating the causal effect of a regimen against a placebo or another regimen. Randomization-based inference often focuses on testing either Fisher's sharp null hypothesis of no treatment effect for any participant or Neyman's weak null hypothesis of no sample average treatment effect. Many recent efforts have explored conducting exact randomization-based inference for other summaries of the treatment effect profile, for instance, quantiles of the treatment effect distribution function. In this article, we systematically review methods that conduct exact, randomization-based inference for quantiles of individual treatment effects (ITEs) and extend some results to a special case where naïve participants are expected not to exhibit responses to highly specific endpoints. These methods are suitable for completely randomized trials, stratified completely randomized trials, and a matched study comparing two non-randomized arms from possibly different trials. We evaluate the usefulness of these methods using synthetic data in simulation studies. Finally, we apply these methods to HIV Vaccine Trials Network Study 086 (HVTN 086) and HVTN 205 and showcase a wide range of application scenarios of the methods. R code that replicates all analyses in this article can be found on the first author's GitHub page at https://github.com/Zhe-Chen-1999/ITE-Inference.
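The quantile-of-ITE methods reviewed in that abstract build on randomization test machinery. As a minimal building block only (not the reviewed procedures themselves), the sketch below runs a Fisher randomization test of the sharp null in a simulated completely randomized trial; the data and function names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def fisher_randomization_test(y, z, n_draws=5000, rng=rng):
    """Two-sided Fisher randomization test of the sharp null of no effect for any
    unit in a completely randomized two-arm trial.  Under the sharp null the
    observed outcomes are fixed, so the difference-in-means statistic can be
    recomputed for re-drawn assignments."""
    n, n1 = len(y), int(z.sum())
    obs = y[z == 1].mean() - y[z == 0].mean()
    stats = np.empty(n_draws)
    for b in range(n_draws):
        zb = np.zeros(n, dtype=int)
        zb[rng.choice(n, size=n1, replace=False)] = 1
        stats[b] = y[zb == 1].mean() - y[zb == 0].mean()
    return (np.abs(stats) >= abs(obs)).mean()

# Toy data: a modest positive effect for treated units.
n = 100
z = np.zeros(n, dtype=int)
z[rng.choice(n, size=n // 2, replace=False)] = 1
y = rng.normal(size=n) + 0.5 * z
print(f"randomization p-value: {fisher_randomization_test(y, z):.4f}")
```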
  3. Summary: Complete randomization balances covariates on average, but covariate imbalance often exists in finite samples. Rerandomization can ensure covariate balance in the realized experiment by discarding the undesired treatment assignments. Many field experiments in public health and social sciences assign the treatment at the cluster level due to logistical constraints or policy considerations. Moreover, they are frequently combined with rerandomization in the design stage. We define cluster rerandomization as a cluster-randomized experiment compounded with rerandomization to balance covariates at the individual or cluster level. Existing asymptotic theory can only deal with rerandomization with treatments assigned at the individual level, leaving the theory for cluster rerandomization an open problem. To fill the gap, we provide a design-based theory for cluster rerandomization. Moreover, we compare two cluster rerandomization schemes that use prior information on the importance of the covariates: one based on the weighted Euclidean distance and the other based on the Mahalanobis distance with tiers of covariates. We demonstrate that the former dominates the latter with optimal weights and orthogonalized covariates. Last but not least, we discuss the role of covariate adjustment in the analysis stage, and recommend covariate-adjusted procedures that can be conveniently implemented by least squares with the associated robust standard errors.
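As a minimal sketch of rerandomization at the cluster level with a Mahalanobis acceptance criterion (simulated cluster-level covariates; the weighted Euclidean and tiered variants compared in the abstract are not implemented here, and all names are illustrative), consider the following.

```python
import numpy as np

rng = np.random.default_rng(3)

def cluster_rerandomize(X_cluster, n_treated, threshold, rng, max_tries=100_000):
    """Redraw cluster-level assignments until the Mahalanobis imbalance between
    treated and control cluster-level covariate means falls below `threshold`."""
    M, p = X_cluster.shape
    n_control = M - n_treated
    cov_inv = np.linalg.inv(np.cov(X_cluster, rowvar=False))
    scale = n_treated * n_control / M           # standard scaling of the criterion
    for _ in range(max_tries):
        z = np.zeros(M, dtype=int)
        z[rng.choice(M, size=n_treated, replace=False)] = 1
        diff = X_cluster[z == 1].mean(axis=0) - X_cluster[z == 0].mean(axis=0)
        if scale * diff @ cov_inv @ diff < threshold:
            return z
    raise RuntimeError("no acceptable assignment found; loosen the threshold")

# Toy example: 60 clusters, 3 cluster-level covariates.  With threshold 1.0 the rule
# keeps roughly the best-balanced fifth of assignments (chi-squared(3) reference).
X_cluster = rng.normal(size=(60, 3))
z = cluster_rerandomize(X_cluster, n_treated=30, threshold=1.0, rng=rng)
print("treated clusters:", int(z.sum()))
```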
  4. Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data-generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real data sets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals.
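A simplified version of the idea, assuming a completely randomized experiment with a constant propensity score, is sketched below: split-conformal intervals for the missing potential outcome of control units are converted into individual treatment effect intervals. This is an illustration under stated assumptions, not the authors' full doubly robust procedure, and the data and model are simulated.

```python
import numpy as np

rng = np.random.default_rng(4)

# Simulated completely randomized experiment (illustrative only).
n, alpha = 2000, 0.1
X = rng.normal(size=(n, 2))
y0 = X[:, 0] + rng.normal(size=n)
y1 = y0 + 1.0 + 0.5 * X[:, 1] + rng.normal(size=n)
z = np.zeros(n, dtype=bool)
z[rng.choice(n, size=n // 2, replace=False)] = True
y = np.where(z, y1, y0)

# Split the treated units: fit a linear model for Y(1) | X on one half,
# compute the conformal residual quantile on the other (calibration) half.
treated = np.flatnonzero(z)
rng.shuffle(treated)
train, calib = treated[: len(treated) // 2], treated[len(treated) // 2 :]
A = np.column_stack([np.ones(len(train)), X[train]])
beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)

pred_calib = np.column_stack([np.ones(len(calib)), X[calib]]) @ beta
resid = np.abs(y[calib] - pred_calib)
k = int(np.ceil((1 - alpha) * (len(calib) + 1)))
q = np.sort(resid)[min(k, len(calib)) - 1]      # conformal quantile of residuals

# Intervals for Y(1) of control units, and implied ITE intervals (Y(1) - observed Y(0)).
controls = ~z
pred1 = np.column_stack([np.ones(controls.sum()), X[controls]]) @ beta
ite_lo, ite_hi = pred1 - q - y[controls], pred1 + q - y[controls]
true_ite = y1[controls] - y0[controls]
coverage = np.mean((true_ite >= ite_lo) & (true_ite <= ite_hi))
print(f"empirical ITE interval coverage on controls: {coverage:.3f} (target {1 - alpha})")
```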
  5. A powerful tool for the analysis of nonrandomized observational studies has been the potential outcomes model. Utilization of this framework allows analysts to estimate average treatment effects. This article considers the situation in which high-dimensional covariates are present and revisits the standard assumptions made in causal inference. We show that by employing a flexible Gaussian process framework, the assumption of strict overlap leads to very restrictive assumptions about the distribution of the covariates, which can be characterized using classical results on Gaussian random measures as well as reproducing kernel Hilbert space theory. In addition, we propose a strategy for data-adaptive causal effect estimation that does not rely on the strict overlap assumption. These findings reveal, within a focused framework, the stringency that accompanies the use of the treatment positivity assumption in high-dimensional settings.