Building on Yu and Kumbier's predictability, computability and stability (PCS) framework, we introduce a novel methodology for randomised experiments: Stable Discovery of Interpretable Subgroups via Calibration (StaDISC), which identifies subgroups with large heterogeneous treatment effects. StaDISC was developed during our re-analysis of the 1999–2000 VIGOR study, an 8076-patient randomised controlled trial that compared the risk of adverse events from a then newly approved drug, rofecoxib (Vioxx), with that from an older drug, naproxen. On average, and in comparison with naproxen, Vioxx was found to reduce the risk of gastrointestinal events but increase the risk of thrombotic cardiovascular events. Applying StaDISC, we fit 18 popular conditional average treatment effect (CATE) estimators for both outcomes and use calibration to demonstrate their poor global performance. However, the estimators are locally well-calibrated and stable, enabling the identification of patient groups with larger than (estimated) average treatment effects. StaDISC discovers three clinically interpretable subgroups each for the gastrointestinal outcome (totalling 29.4% of the study population) and the thrombotic cardiovascular outcome (totalling 11.0%). Complementary analyses of the discovered subgroups using the 2001–2004 APPROVe study, a separate, independently conducted randomised controlled trial with 2587 patients, provide further supporting evidence for the promise of StaDISC.
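The calibration idea at the heart of the abstract — binning units by predicted CATE and comparing the predictions with within-bin difference-in-means estimates — can be sketched as follows. This is a minimal illustration on simulated data, not the paper's implementation; the data-generating process, variable names, and the noisy `tau_hat` stand-in for a fitted CATE model are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated randomised trial (all hypothetical): binary treatment t,
# binary outcome y, one covariate x, and a noisy CATE estimate tau_hat
# standing in for the output of a fitted CATE model.
n = 4000
t = rng.integers(0, 2, n)
x = rng.normal(size=n)
true_cate = 0.1 + 0.1 * (x > 1.0)            # larger effect in a subgroup
y = rng.binomial(1, 0.2 + true_cate * t)
tau_hat = true_cate + rng.normal(0.0, 0.02, n)

def calibration_bins(tau_hat, y, t, n_bins=4):
    """For each quantile bin of predicted CATE, return the pair
    (mean predicted CATE, within-bin difference-in-means estimate)."""
    order = np.argsort(tau_hat)
    out = []
    for chunk in np.array_split(order, n_bins):
        pred = tau_hat[chunk].mean()
        obs = y[chunk][t[chunk] == 1].mean() - y[chunk][t[chunk] == 0].mean()
        out.append((pred, obs))
    return out

bins = calibration_bins(tau_hat, y, t)
```

A well-calibrated estimator would show `pred` and `obs` tracking each other across bins; large gaps in the extreme bins are the kind of local miscalibration a calibration check is designed to expose.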
In heterogeneous treatment effect models with endogeneity, identification of the local average treatment effect (LATE) typically relies on the availability of an exogenous instrument monotonically related to treatment participation. First, we demonstrate that a strictly weaker local monotonicity condition—invoked for specific potential outcome values rather than globally—identifies the LATEs on compliers and defiers. Second, we show that our identification results apply to subsets of compliers and defiers when imposing an even weaker local compliers-defiers assumption that allows for both types at any potential outcome value. We propose estimators that are potentially more efficient than two-stage least squares (2SLS) in finite samples, even in cases where 2SLS is consistent. Finally, we provide an empirical application to estimating returns to education using the quarter of birth instrument.
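For intuition about the baseline the paper compares against, the just-identified 2SLS estimator reduces to the Wald ratio cov(z, y)/cov(z, d). Below is a toy simulation loosely patterned on the returns-to-education setting; every number and the data-generating process are hypothetical, and this is the textbook IV estimator, not the authors' proposed estimators.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy returns-to-education setting (all numbers hypothetical): z is a
# binary instrument (think quarter of birth), u is unobserved ability
# that confounds schooling d and log-wage y.
n = 20_000
z = rng.integers(0, 2, n)
u = rng.normal(size=n)
d = 12.0 + z + u + rng.normal(size=n)          # instrument shifts schooling
y = 0.08 * d + 0.5 * u + rng.normal(size=n)    # true return: 0.08 per year

# Just-identified 2SLS is the Wald ratio cov(z, y) / cov(z, d).
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

# OLS for comparison: biased upward because d is positively correlated
# with the unobserved ability term u.
beta_ols = np.cov(d, y)[0, 1] / np.var(d, ddof=1)
```

The IV estimate recovers the true return up to sampling noise, while OLS absorbs the ability confounding; the paper's point is that weaker local monotonicity-type assumptions can still identify LATE-type parameters, with estimators that may beat this 2SLS benchmark in finite samples.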
- Publisher / Repository: Oxford University Press
- Journal Name: The Econometrics Journal
- Sponsoring Org: National Science Foundation
More Like this
Evaluating treatment effect heterogeneity is central to treatment decision-making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling, since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data-generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real data sets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals.
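As a rough illustration of the interval-construction idea (not the paper's doubly robust procedure), here is a split conformal sketch for the treated potential outcome in a completely randomised experiment. The linear predictor, split sizes, and data are all hypothetical; any regressor could stand in for the fitted model.

```python
import numpy as np

rng = np.random.default_rng(2)

def split_conformal_interval(x_train, y_train, x_cal, y_cal, x_new, alpha=0.1):
    """Split conformal prediction with a simple linear predictor.

    Fit a model on the training split, compute absolute residuals on the
    calibration split, and widen the point prediction by the finite-sample
    (1 - alpha) quantile of those residuals. For exchangeable data the
    resulting interval has >= 1 - alpha marginal coverage.
    """
    X = np.column_stack([np.ones_like(x_train), x_train])
    beta, *_ = np.linalg.lstsq(X, y_train, rcond=None)
    predict = lambda x: beta[0] + beta[1] * x
    scores = np.abs(y_cal - predict(x_cal))
    n = len(scores)
    q = np.quantile(scores, min(1.0, np.ceil((n + 1) * (1 - alpha)) / n))
    mu = predict(x_new)
    return mu - q, mu + q

# Hypothetical use: intervals for the treated potential outcome Y(1),
# built from the treated arm of a completely randomised experiment.
x = rng.normal(size=3000)
y1 = 1.0 + 2.0 * x + rng.normal(size=3000)
lo, hi = split_conformal_interval(x[:1000], y1[:1000],
                                  x[1000:2000], y1[1000:2000],
                                  x[2000:], alpha=0.1)
coverage = np.mean((y1[2000:] >= lo) & (y1[2000:] <= hi))
```

The empirical coverage should land near the 90% target regardless of how crude the predictor is; what the predictor's quality buys is shorter intervals, which is the trade-off the abstract highlights.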
Cluster-randomized experiments are widely used due to their logistical convenience and policy relevance. To analyse them properly, we must address the fact that the treatment is assigned at the cluster level instead of the individual level. Standard analytic strategies are regressions based on individual data, cluster averages and cluster totals, which differ when the cluster sizes vary. These methods are often motivated by models with strong and unverifiable assumptions, and the choice among them can be subjective. Without any outcome modelling assumption, we evaluate these regression estimators and the associated robust standard errors from the design-based perspective where only the treatment assignment itself is random and controlled by the experimenter. We demonstrate that regression based on cluster averages targets a weighted average treatment effect, regression based on individual data is suboptimal in terms of efficiency and regression based on cluster totals is consistent and more efficient with a large number of clusters. We highlight the critical role of covariates in improving estimation efficiency and illustrate the efficiency gain via both simulation studies and data analysis. The asymptotic analysis also reveals the efficiency-robustness trade-off by comparing the properties of various estimators using data at different levels with and without covariate adjustment. Moreover, we show that the robust standard errors are convenient approximations to the true asymptotic standard errors under the design-based perspective. Our theory holds even when the outcome models are misspecified, so it is model-assisted rather than model-based. We also extend the theory to a wider class of weighted average treatment effects.
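The contrast between cluster-average and cluster-total estimators can be sketched on simulated data. This is a minimal illustration under hypothetical assumptions (constant unit-level effect, uniformly varying cluster sizes), not the paper's full design-based analysis or its covariate-adjusted estimators.

```python
import numpy as np

rng = np.random.default_rng(3)

# Simulated cluster-randomised experiment with varying cluster sizes
# (all hypothetical): treatment z assigned at the cluster level.
n_clusters = 50
sizes = rng.integers(10, 100, n_clusters)
z = np.repeat([0, 1], n_clusters // 2)
rng.shuffle(z)

cluster_means, cluster_totals = [], []
for g in range(n_clusters):
    yg = rng.normal(loc=1.0 + 0.5 * z[g], size=sizes[g])  # true effect: 0.5
    cluster_means.append(yg.mean())
    cluster_totals.append(yg.sum())
cluster_means = np.array(cluster_means)
cluster_totals = np.array(cluster_totals)

# Cluster-average regression: difference in mean cluster means, which
# targets an effect weighting every cluster equally regardless of size.
tau_avg = cluster_means[z == 1].mean() - cluster_means[z == 0].mean()

# Cluster-total regression, scaled by the average cluster size: a
# Horvitz-Thompson-style estimator of the individual-weighted effect.
nbar = sizes.mean()
tau_tot = (cluster_totals[z == 1].mean() - cluster_totals[z == 0].mean()) / nbar
```

With a constant effect both estimators target 0.5, but the totals-based estimate is far noisier when sizes vary, since cluster totals inherit the size variation; adjusting for cluster size as a covariate recovers much of that efficiency, in line with the abstract's point.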
Studies of social networks provide unique opportunities to assess the causal effects of interventions that may impact more of the population than just those intervened on directly. Such effects are sometimes called peer or spillover effects, and may exist in the presence of interference, that is, when one individual's treatment affects another individual's outcome. Randomization‐based inference (RI) methods provide a theoretical basis for causal inference in randomized studies, even in the presence of interference. In this article, we consider RI of the intervention effect in the eX‐FLU trial, a randomized study designed to assess the effect of a social distancing intervention on influenza‐like‐illness transmission in a connected network of college students. The approach considered enables inference about the effect of the social distancing intervention on the per‐contact probability of influenza‐like‐illness transmission in the observed network. The methods allow for interference between connected individuals and for heterogeneous treatment effects. The proposed methods are evaluated empirically via simulation studies, and then applied to data from the eX‐FLU trial.
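The randomisation-based inference logic can be sketched with a generic permutation test of a sharp null of no effect. This is a simplified illustration on hypothetical data, not the network-aware, per-contact transmission test used in the eX-FLU analysis.

```python
import numpy as np

rng = np.random.default_rng(4)

def randomization_test(y, z, n_perm=2000, rng=rng):
    """Randomisation-based p-value for the sharp null of no treatment
    effect: re-randomise the treatment labels many times and compare the
    observed difference in means with its permutation distribution."""
    obs = y[z == 1].mean() - y[z == 0].mean()
    count = 0
    for _ in range(n_perm):
        zp = rng.permutation(z)
        stat = y[zp == 1].mean() - y[zp == 0].mean()
        if abs(stat) >= abs(obs):
            count += 1
    return (count + 1) / (n_perm + 1)   # add-one correction keeps p > 0

# Hypothetical data with a genuine effect of 0.6 standard deviations.
z = np.repeat([0, 1], 100)
y = rng.normal(loc=0.6 * z, size=200)
p = randomization_test(y, z)
```

Because the reference distribution comes from the actual assignment mechanism rather than an asymptotic approximation, the test remains valid in small samples; handling interference, as in the trial above, requires re-randomising in a way that respects the network structure.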
Serum hepatitis B core-related antigen (HBcrAg) level moderately correlates with cccDNA. We examined whether HBcrAg can add value in monitoring the effect of peginterferon (PEG-IFN) therapy for HBeAg-negative chronic hepatitis B (CHB) infection. Thus, serum HBcrAg level was measured in 133 HBeAg-negative, mainly Caucasian CHB patients treated with 48 weeks of PEG-IFN alfa-2a. We assessed its association with response (ALT normalization and HBV DNA < 2000 IU/mL) at week 72. HBcrAg level strongly correlated with HBV DNA level (r = 0.8, P < 0.001) and weakly with qHBsAg and ALT (both r = 0.2, P = 0.01). At week 48, mean HBcrAg decline was −3.3 log U/mL. Baseline levels were comparable for patients with and without response at week 72 (5.0 vs 4.9 log U/mL, P = 0.59). HBcrAg decline at week 72 differed between patients with and without response (−2.4 vs −1.0 log U/mL, P = 0.001), but no cut-off could be determined. The pattern of decline in responders resembled that of HBV DNA, but HBcrAg decline was weaker (HBcrAg: −2.5 log U/mL; HBV DNA: −4.0 log IU/mL, P < 0.001). For early identification of non-response, the diagnostic accuracy of HBV DNA and qHBsAg decline at week 12 (AUC 0.742, 95% CI [0.629–0.855], P < 0.001) did not improve by adding HBcrAg decline (AUC 0.747, 95% CI [0.629–0.855], P < 0.001), nor by replacing HBV DNA decline with HBcrAg decline (AUC 0.754, 95% CI [0.641–0.867], P < 0.001). In conclusion, in Caucasian patients with HBeAg-negative CHB, decline of HBcrAg during PEG-IFN treatment was stronger in patients with treatment response. However, HBcrAg was not superior to HBV DNA and qHBsAg in predicting response during PEG-IFN treatment.