Abstract Understanding treatment effect heterogeneity has become an increasingly popular task in various fields, as it helps design personalized advertisements in e-commerce or targeted treatment in biomedical studies. However, most of the existing work in this research area focused on either analysing observational data based on strong causal assumptions or conducting post hoc analyses of randomized controlled trial data, and there has been limited effort dedicated to the design of randomized experiments specifically for uncovering treatment effect heterogeneity. In the manuscript, we develop a framework for designing and analysing response adaptive experiments toward better learning treatment effect heterogeneity. Concretely, we provide response adaptive experimental design frameworks that sequentially revise the data collection mechanism according to the accrued evidence during the experiment. Such design strategies allow for the identification of subgroups with the largest treatment effects with enhanced statistical efficiency. The proposed frameworks not only unify adaptive enrichment designs and response-adaptive randomization designs but also complement A/B test designs in e-commerce and randomized trial designs in clinical settings. We demonstrate the merit of our design with theoretical justifications and in simulation studies with synthetic e-commerce and clinical trial data.
more »
« less
This content will become publicly available on November 20, 2025
Some theoretical foundations for the design and analysis of randomized experiments
Abstract Neyman’s seminal work in 1923 has been a milestone in statistics over the century, which has motivated many fundamental statistical concepts and methodology. In this review, we delve into Neyman’s groundbreaking contribution and offer technical insights into the design and analysis of randomized experiments. We shall review the basic setup of completely randomized experiments and the classical approaches for inferring the average treatment effects. We shall, in particular, review more efficient design and analysis of randomized experiments by utilizing pretreatment covariates, which move beyond Neyman’s original work without involving any covariate. We then summarize several technical ingredients regarding randomizations and permutations that have been developed over the century, such as permutational central limit theorems and Berry–Esseen bounds, and we elaborate on how these technical results facilitate the understanding of randomized experiments. The discussion is also extended to other randomized experiments including rerandomization, stratified randomized experiments, matched pair experiments, and cluster randomized experiments.
more »
« less
- Award ID(s):
- 2400961
- PAR ID:
- 10521504
- Publisher / Repository:
- De Gruyter
- Date Published:
- Journal Name:
- Journal of Causal Inference
- Volume:
- 12
- Issue:
- 1
- ISSN:
- 2193-3685
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
In paired experiments, participants are grouped into pairs with similar characteristics, and one observation from each pair is randomly assigned to treatment. The resulting treatment and control groups should be well-balanced; however, there may still be small chance imbalances. Building on work for completely randomized experiments, we propose a design-based method to adjust for covariate imbalances in paired experiments. We leave out each pair and impute its potential outcomes using any prediction algorithm such as lasso or random forests. This method addresses a unique trade-off that exists for paired experiments. By addressing this trade-off, the method has the potential to improve precision over existing methods.more » « less
-
Session types guarantee that message-passing processes adhere to predefined communication protocols. Prior work on session types has focused on deterministic languages but many message-passing systems, such as Markov chains and randomized distributed algorithms, are probabilistic. To implement and analyze such systems, this article develops the meta theory of probabilistic session types with an application focus on automatic expected resource analysis. Probabilistic session types describe probability distributions over messages and are a conservative extension of intuitionistic (binary) session types. To send on a probabilistic channel, processes have to utilize internal randomness from a probabilistic branching or external randomness from receiving on a probabilistic channel. The analysis for expected resource bounds is smoothly integrated with the type system and is a variant of automatic amortized resource analysis. Type inference relies on linear constraint solving to automatically derive symbolic bounds for various cost metrics. The technical contributions include the meta theory that is based on a novel nested multiverse semantics and a type-reconstruction algorithm that allows flexible mixing of different sources of randomness without burdening the programmer with complex type annotations. The type system has been implemented in the language NomosPro with linear-time type checking. Experiments demonstrate that NomosPro is applicable in different domains such as cost analysis of randomized distributed algorithms, analysis of Markov chains, probabilistic analysis of amortized data structures and digital contracts. NomosPro is also shown to be scalable by (i) implementing two broadcast and a bounded retransmission protocol where messages are dropped with a fixed probability, and (ii) verifying the limiting distribution of a Markov chain with 64 states and 420 transitions.more » « less
-
null (Ed.)Education research has experienced a methodological renaissance over the past two decades, with a new focus on large-scale randomized experiments. This wave of experiments has made education research an even more exciting area for statisticians, unearthing many lessons and challenges in experimental design, causal inference, and statistics more broadly. Importantly, educational research and practice almost always occur in a multilevel setting, which makes the statistics relevant to other fields with this structure, including social policy, health services research, and clinical trials in medicine. In this article we first briefly review the history that led to this new era in education research and describe the design features that dominate the modern large-scale educational experiments. We then highlight some of the key statistical challenges in this area, including endogeneity of design, heterogeneity of treatment effects, noncompliance with treatment assignment, mediation, generalizability, and spillover. Though a secondary focus, we also touch on promising trial designs that answer more nuanced questions, such as the SMART design for studying dynamic treatment regimes and factorial designs for optimizing the components of an existing treatment.more » « less
-
We discuss the question of learning distribution over permutations of a given set of choices, options or items based on partial observations. This is central to capturing the so called ``choice'' in a variety of contexts: understanding preferences of consumers over a collection of products based on purchasing and browsing data in the setting of retail and e-commerce, learning public opinion amongst a collection of socio-economic issues based on sparse polling data, deciding a ranking of teams or players based on outcomes of games, electing leaders based on votes, and more generally collaborative decision making based on collective judgement such as accepting paper(s) in a competitive academic conference. The question of learning distribution over permutations arises beyond capturing ``choice'' as well. For example, tracking a collection of objects using noisy cameras, or aggregating ranking of web-pages using outcomes of multiple search engines. It is only natural that such a topic has been extensively studied in Economics, Political Science and Psychology for more than a century, and more so recently in Computer Science, Electrical Engineering, Statistics and Operations Research. Here we shall focus on the task of learning distribution over permutations from its marginal distributions of two types: first-order marginals and pair-wise comparisons. There has been a lot of progress made on this topic in the last decade. The ideal goal is to provide a comprehensive overview of the state-of-art on this topic. We shall provide detailed overview of selective aspects, biased by author's perspective of the topic. And provide sufficient pointers to aspects not covered here. We shall emphasize on ability to identify the entire distribution over permutation as well as ``best ranking''.more » « less
An official website of the United States government
