Title: Adaptive experiments toward learning treatment effect heterogeneity
Abstract: Understanding treatment effect heterogeneity has become an increasingly popular task in various fields, as it helps design personalized advertisements in e-commerce or targeted treatment in biomedical studies. However, most of the existing work in this research area focused on either analysing observational data based on strong causal assumptions or conducting post hoc analyses of randomized controlled trial data, and there has been limited effort dedicated to the design of randomized experiments specifically for uncovering treatment effect heterogeneity. In the manuscript, we develop a framework for designing and analysing response adaptive experiments toward better learning treatment effect heterogeneity. Concretely, we provide response adaptive experimental design frameworks that sequentially revise the data collection mechanism according to the accrued evidence during the experiment. Such design strategies allow for the identification of subgroups with the largest treatment effects with enhanced statistical efficiency. The proposed frameworks not only unify adaptive enrichment designs and response-adaptive randomization designs but also complement A/B test designs in e-commerce and randomized trial designs in clinical settings. We demonstrate the merit of our design with theoretical justifications and in simulation studies with synthetic e-commerce and clinical trial data.
Award ID(s):
2220537 2239047
PAR ID:
10573424
Author(s) / Creator(s):
; ;
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Journal of the Royal Statistical Society Series B: Statistical Methodology
Volume:
87
Issue:
4
ISSN:
1369-7412
Format(s):
Medium: X
Size(s):
p. 1055-1084
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Education research has experienced a methodological renaissance over the past two decades, with a new focus on large-scale randomized experiments. This wave of experiments has made education research an even more exciting area for statisticians, unearthing many lessons and challenges in experimental design, causal inference, and statistics more broadly. Importantly, educational research and practice almost always occur in a multilevel setting, which makes the statistics relevant to other fields with this structure, including social policy, health services research, and clinical trials in medicine. In this article we first briefly review the history that led to this new era in education research and describe the design features that dominate the modern large-scale educational experiments. We then highlight some of the key statistical challenges in this area, including endogeneity of design, heterogeneity of treatment effects, noncompliance with treatment assignment, mediation, generalizability, and spillover. Though a secondary focus, we also touch on promising trial designs that answer more nuanced questions, such as the SMART design for studying dynamic treatment regimes and factorial designs for optimizing the components of an existing treatment. 
  2. Summary: Power analyses are an important aspect of experimental design, because they help determine how experiments are implemented in practice. It is common to specify a desired level of power and compute the sample size necessary to obtain that power. Such calculations are well known for completely randomized experiments, but there can be many benefits to using other experimental designs. For example, it has recently been established that rerandomization, where subjects are randomized until covariate balance is obtained, increases the precision of causal effect estimators. This work establishes the power of rerandomized treatment-control experiments, thereby allowing for sample size calculators. We find the surprising result that, while power is often greater under rerandomization than complete randomization, the opposite can occur for very small treatment effects. The reason is that inference under rerandomization can be relatively more conservative, in the sense that it can have a lower Type-I error at the same nominal significance level, and this additional conservativeness adversely affects power. This surprising result is due to treatment effect heterogeneity, a quantity often ignored in power analyses. We find that heterogeneity increases power for large effect sizes, but decreases power for small effect sizes.
  3. Kretzschmar, Mirjam E. (Ed.)
    Background: Development of an effective antiviral drug for Coronavirus Disease 2019 (COVID-19) is a global health priority. Although several candidate drugs have been identified through in vitro and in vivo models, consistent and compelling evidence from clinical studies is limited. The lack of evidence from clinical trials may stem in part from the imperfect design of the trials. We investigated how clinical trials for antivirals need to be designed, especially focusing on the sample size in randomized controlled trials. Methods and findings: A modeling study was conducted to help understand the reasons behind inconsistent clinical trial findings and to design better clinical trials. We first analyzed longitudinal viral load data for Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) without antiviral treatment by use of a within-host virus dynamics model. The fitted viral load was categorized into 3 different groups by a clustering approach. Comparison of the estimated parameters showed that the 3 distinct groups were characterized by different virus decay rates (p-value < 0.001). The mean decay rates were 1.17 d⁻¹ (95% CI: 1.06 to 1.27 d⁻¹), 0.777 d⁻¹ (0.716 to 0.838 d⁻¹), and 0.450 d⁻¹ (0.378 to 0.522 d⁻¹) for the 3 groups, respectively. Such heterogeneity in virus dynamics could be a confounding variable if it is associated with treatment allocation in compassionate use programs (i.e., observational studies). Subsequently, we mimicked randomized controlled trials of antivirals by simulation. An antiviral effect causing a 95% to 99% reduction in viral replication was added to the model. To be realistic, we assumed that randomization and treatment are initiated with some time lag after symptom onset.
Using the duration of virus shedding as an outcome, the sample size to detect a statistically significant mean difference between the treatment and placebo groups (1:1 allocation) was 13,603 and 11,670 (when the antiviral effect was 95% and 99%, respectively) per group if all patients are enrolled regardless of timing of randomization. The sample size was reduced to 584 and 458 (when the antiviral effect was 95% and 99%, respectively) if only patients who are treated within 1 day of symptom onset are enrolled. We confirmed the sample size was similarly reduced when using cumulative viral load in log scale as an outcome. We used a conventional virus dynamics model, which may not fully reflect the detailed mechanisms of viral dynamics of SARS-CoV-2. The model needs to be calibrated in terms of both parameter settings and model structure, which would yield more reliable sample size calculation. Conclusions: In this study, we found that the estimated association in observational studies can be biased due to large heterogeneity in viral dynamics among infected individuals, and a statistically significant effect in randomized controlled trials may be difficult to detect due to small sample size. The sample size can be dramatically reduced by recruiting patients immediately after developing symptoms. We believe this is the first study to investigate the study design of clinical trials for antiviral treatment using the viral dynamics model.
  4. Importance: Neurodevelopmental outcomes for children with congenital heart defects (CHD) have improved minimally over the past 20 years. Objectives: To assess the feasibility and tolerability of maternal progesterone therapy as well as the magnitude of the effect on neurodevelopment for fetuses with CHD. Design, Setting, and Participants: This double-blinded individually randomized parallel-group clinical trial of vaginal natural progesterone therapy vs placebo in participants carrying fetuses with CHD was conducted between July 2014 and November 2021 at a quaternary care children’s hospital. Participants included maternal-fetal dyads where the fetus had CHD identified before 28 weeks’ gestational age and was likely to need surgery with cardiopulmonary bypass in the neonatal period. Exclusion criteria included a major genetic or extracardiac anomaly other than 22q11 deletion syndrome and known contraindication to progesterone. Statistical analysis was performed June 2022 to April 2024. Intervention: Participants were 1:1 block-randomized to vaginal progesterone or placebo by diagnosis: hypoplastic left heart syndrome (HLHS), transposition of the great arteries (TGA), and other CHD diagnoses. Treatment was administered twice daily between 28 and up to 39 weeks’ gestational age. Main Outcomes and Measures: The primary outcome was the motor score of the Bayley Scales of Infant and Toddler Development-III; secondary outcomes included language and cognitive scales. Exploratory prespecified subgroups included cardiac diagnosis, fetal sex, genetic profile, and maternal fetal environment. Results: The 102 enrolled fetuses primarily had HLHS (n = 52 [50.9%]) and TGA (n = 38 [37.3%]), were more frequently male (n = 67 [65.7%]), and without genetic anomalies (n = 61 [59.8%]). The mean motor score differed by 2.5 units (90% CI, −1.9 to 6.9 units; P = .34) for progesterone compared with placebo, a value not statistically different from 0.
Exploratory subgroup analyses suggested treatment heterogeneity for the motor score for cardiac diagnosis (P for interaction = .03) and fetal sex (P for interaction = .04), but not genetic profile (P for interaction = .16) or maternal-fetal environment (P for interaction = .70). Conclusions and Relevance: In this randomized clinical trial of maternal progesterone therapy, the overall effect was not statistically different from 0. Subgroup analyses suggest heterogeneity of the response to progesterone among CHD diagnosis and fetal sex. Trial Registration: ClinicalTrials.gov Identifier: NCT02133573
  5. Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real data sets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals.