Gentle, James; Scott, David
(Ed.)
Recent years have seen an explosion in methodological work on combining causal effects estimated from observational and experimental datasets. Observational data have the advantage of being inexpensive and increasingly available from sources such as electronic health records, insurance claims databases, and online learning platforms. These data are representative of target populations, but because treatment assignments are not randomized, they suffer from unmeasured confounding bias. By contrast, as a consequence of randomization, experimental data yield unbiased causal effects. Yet experiments are costly, often involve relatively few units, and may incorporate stringent inclusion criteria that make the studied populations somewhat artificial. A challenge for researchers is how to integrate these two types of data to leverage their respective virtues. Over roughly the past 5 years, many novel approaches have been proposed. In this review, we restrict our focus to techniques for integrating individual‐level experimental and observational data, without assuming all confounding variables are studied in the observational data. We first “locate” the problem by detailing important considerations from the causal inference and transportability literature. We next discuss three important research traditions that predate modern methodological work: meta‐analysis, Empirical Bayes shrinkage, and historical borrowing. In organizing the growing literature on data‐combination methods, we use a categorization involving five distinct approaches: auxiliary methods, control‐arm augmentation, debiasing, test‐then‐merge, and weighting. Within each category, we summarize recently proposed methodologies, highlighting the strengths and weaknesses of each. We conclude with a discussion of how practitioners might choose between competing approaches when conducting applied work.