The regression discontinuity (RD) design is one of the most widely used nonexperimental methods for causal inference and program evaluation. Over the last two decades, statistical and econometric methods for RD analysis have expanded and matured, and there is now a large number of methodological results for RD identification, estimation, inference, and validation. We offer a curated review of this methodological literature organized around the two most popular frameworks for the analysis and interpretation of RD designs: the continuity framework and the local randomization framework. For each framework, we discuss three main topics: (a) designs and parameters, focusing on different types of RD settings and treatment effects of interest; (b) estimation and inference, presenting the most popular methods based on local polynomial regression and methods for the analysis of experiments, as well as refinements, extensions, and alternatives; and (c) validation and falsification, summarizing an array of mostly empirical approaches to support the validity of RD designs in practice.
A guide to regression discontinuity designs in medical applications
We present a practical guide for the analysis of regression discontinuity (RD) designs in biomedical contexts. We begin by introducing key concepts, assumptions, and estimands within both the continuity‐based framework and the local randomization framework. We then discuss modern estimation and inference methods within both frameworks, including approaches for bandwidth or local neighborhood selection, optimal treatment effect point estimation, and robust bias‐corrected inference methods for uncertainty quantification. We also overview empirical falsification tests that can be used to support key assumptions. Our discussion focuses on two particular features that are relevant in biomedical research: (i) fuzzy RD designs, which often arise when therapeutic treatments are based on clinical guidelines, but patients with scores near the cutoff are treated contrary to the assignment rule; and (ii) RD designs with discrete scores, which are ubiquitous in biomedical applications. We illustrate our discussion with three empirical applications: the effect of CD4 guidelines for anti‐retroviral therapy on retention of HIV patients in South Africa, the effect of genetic guidelines for chemotherapy on breast cancer recurrence in the United States, and the effects of age‐based patient cost‐sharing on healthcare utilization in Taiwan. Complete replication materials employing publicly available data and statistical software in Python, R, and Stata are provided, offering researchers all necessary tools to conduct an RD analysis.
- PAR ID: 10505072
- Publisher / Repository: Wiley
- Date Published:
- Journal Name: Statistics in Medicine
- Volume: 42
- Issue: 24
- ISSN: 0277-6715
- Page Range / eLocation ID: 4484 to 4513
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Summary Modern empirical work in regression discontinuity (RD) designs often employs local polynomial estimation and inference with a mean square error (MSE) optimal bandwidth choice. This bandwidth yields an MSE-optimal RD treatment effect estimator, but is by construction invalid for inference. Robust bias-corrected (RBC) inference methods are valid when using the MSE-optimal bandwidth, but we show that they yield suboptimal confidence intervals in terms of coverage error. We establish valid coverage error expansions for RBC confidence interval estimators and use these results to propose new inference-optimal bandwidth choices for forming these intervals. We find that the standard MSE-optimal bandwidth for the RD point estimator is too large when the goal is to construct RBC confidence intervals with the smaller coverage error rate. We further optimize the constant terms behind the coverage error to derive new optimal choices for the auxiliary bandwidth required for RBC inference. Our expansions also establish that RBC inference yields higher-order refinements (relative to traditional undersmoothing) in the context of RD designs. Our main results cover sharp and sharp kink RD designs under conditional heteroskedasticity, and we discuss extensions to fuzzy and other RD designs, clustered sampling, and pre-intervention covariate adjustments. The theoretical findings are illustrated with a Monte Carlo experiment and an empirical application, and the main methodological results are available in R and Stata packages.
-
In this article, we introduce the Stata (and R) package rdmulti, which consists of three commands (rdmc, rdmcplot, rdms) for analyzing regression-discontinuity (RD) designs with multiple cutoffs or multiple scores. The command rdmc applies to noncumulative and cumulative multicutoff RD settings. It calculates pooled and cutoff-specific RD treatment effects and provides robust bias-corrected inference procedures. Postestimation and inference are allowed. The command rdmcplot offers RD plots for multicutoff settings. Finally, the command rdms concerns multiscore settings, covering in particular cumulative cutoffs and two running variable contexts. It also calculates pooled and cutoff-specific RD treatment effects, provides robust bias-corrected inference procedures, and allows for postestimation and inference. These commands use the Stata (and R) package rdrobust for plotting, estimation, and inference. Companion R functions with the same syntax and capabilities are provided.
-
Objective: To obtain the comprehensive transcriptome profile of human citrulline‐specific B cells from patients with rheumatoid arthritis (RA). Methods: Citrulline‐ and hemagglutinin‐specific B cells were sorted by flow cytometry using peptide–streptavidin conjugates from the peripheral blood of RA patients and healthy individuals. The transcriptome profile of the sorted cells was obtained by RNA‐sequencing, and expression of key protein molecules was evaluated by aptamer‐based SOMAscan assay and flow cytometry. The ability of these proteins to effect differentiation of osteoclasts and proliferation and migration of synoviocytes was examined by in vitro functional assays. Results: Citrulline‐specific B cells, in comparison to citrulline‐negative B cells, from patients with RA differentially expressed the interleukin‐15 receptor α (IL‐15Rα) gene as well as genes related to protein citrullination and cyclic AMP signaling. In analyses of an independent cohort of cyclic citrullinated peptide–seropositive RA patients, the expression of IL‐15Rα protein was enriched in citrulline‐specific B cells from the patients’ peripheral blood, and surprisingly, all B cells from RA patients were capable of producing the epidermal growth factor ligand amphiregulin (AREG). Production of AREG directly led to increased migration and proliferation of fibroblast‐like synoviocytes, and, in combination with anti–citrullinated protein antibodies, led to the increased differentiation of osteoclasts. Conclusion: To the best of our knowledge, this is the first study to document the whole transcriptome profile of autoreactive B cells in any autoimmune disease. These data identify several genes and pathways that may be targeted by repurposing several US Food and Drug Administration–approved drugs, and could serve as the foundation for the comparative assessment of B cell profiles in other autoimmune diseases.
-
Abstract Projects focused on movement behaviour and home range are commonplace, but beyond a focus on choosing appropriate research questions, there are no clear guidelines for such studies. Without these guidelines, designing an animal tracking study to produce reliable estimates of space‐use and movement properties (necessary to answer basic movement ecology questions) is often done in an ad hoc manner. We developed ‘movedesign’, a user‐friendly Shiny application, which can be utilized to investigate the precision of three estimates regularly reported in movement and spatial ecology studies: home range area, speed and distance travelled. Conceptually similar to statistical power analysis, this application enables users to assess the degree of estimate precision that may be achieved with a given sampling design; that is, the choices regarding data resolution (sampling interval) and battery life (sampling duration). Leveraging the ‘ctmm’ R package, we utilize two methods proven to handle many common biases in animal movement datasets: autocorrelated kernel density estimators (AKDEs) and continuous‐time speed and distance (CTSD) estimators. Longer sampling durations are required to reliably estimate home range areas via the detection of a sufficient number of home range crossings. In contrast, speed and distance estimation requires a sampling interval short enough to ensure that a statistically significant signature of the animal's velocity remains in the data. This application addresses key challenges faced by researchers when designing tracking studies, including the trade‐off between long battery life and high resolution of GPS locations collected by the devices, which may result in a compromise between reliably estimating home range or speed and distance.
‘movedesign’ has broad applications for researchers and decision‐makers, supporting them to focus efforts and resources in achieving the optimal sampling design strategy for their research questions, prioritizing the correct deployment decisions for insightful and reliable outputs, while understanding the trade‐off associated with these choices.

