Analysis of the differential treatment effects across targeted subgroups and contexts is a critical objective in many evaluations because it delineates for whom and under what conditions particular programs, therapies or treatments are effective. Unfortunately, it is unclear how to plan efficient and effective evaluations that include these moderated effects when the design includes partial nesting (i.e., disparate grouping structures across treatment conditions). In this study, we develop statistical power formulas to identify requisite sample sizes and guide the planning of evaluations probing moderation under two-level partially nested designs. The results suggest that the power to detect moderation effects in partially nested designs is substantially influenced by sample size, moderation effect size, and moderator variance structure (i.e., varies within groups only or within and between groups). We implement the power formulas in the R-Shiny application PowerUpRShiny and demonstrate their use to plan evaluations.
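The power logic sketched in the abstract can be illustrated numerically. The snippet below is a minimal, hedged sketch using a standard normal-approximation power calculation for a treatment-by-moderator interaction under simple individual randomization — it is **not** the paper's partially nested formulas, and the names `interaction_se` and `moderation_power` and their parameters are illustrative assumptions:

```python
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution

def interaction_se(sigma2, n_total, p_treat=0.5, var_mod=1.0):
    """Approximate SE of a treatment-by-moderator interaction coefficient
    under simple individual randomization with the treatment indicator
    independent of the moderator. A textbook approximation, not the
    partially nested formulas developed in the article."""
    return (sigma2 / (n_total * p_treat * (1.0 - p_treat) * var_mod)) ** 0.5

def moderation_power(effect, se, alpha=0.05):
    """Two-sided power to reject H0: interaction = 0 (normal approximation)."""
    z_crit = _N.inv_cdf(1.0 - alpha / 2.0)
    lam = effect / se  # noncentrality: effect size in SE units
    return _N.cdf(lam - z_crit) + _N.cdf(-lam - z_crit)

# Example: 400 individuals, residual variance 1, interaction effect 0.2
se = interaction_se(sigma2=1.0, n_total=400)  # 0.1
power = moderation_power(effect=0.2, se=se)   # roughly 0.52: underpowered
```

Even this simplified calculation shows the abstract's point that sample size and moderator variance structure (`var_mod` here) drive power; the partially nested case additionally depends on the clustering in the treated arm.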
- Award ID(s): 1913563
- NSF-PAR ID: 10391160
- Date Published:
- Journal Name: Adıyaman Üniversitesi Eğitim Bilimleri Dergisi
- ISSN: 2149-2727
- Page Range / eLocation ID: 42 to 55
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Summary In many applications of regression discontinuity designs, the running variable used to assign treatment is only observed with error. We show that, provided the observed running variable (i) correctly classifies treatment assignment and (ii) affects the conditional means of potential outcomes smoothly, ignoring the measurement error nonetheless yields an estimate with a causal interpretation: the average treatment effect for units whose observed running variable equals the cutoff. Possibly after doughnut trimming, these assumptions accommodate a variety of settings where support of the measurement error is not too wide. An empirical application illustrates the results for both sharp and fuzzy designs.
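The sharp-design estimate at the cutoff described in this summary can be illustrated with a minimal local linear fit. The sketch below assumes noiseless data, a uniform kernel, and a fixed bandwidth; `rdd_estimate` and its arguments are hypothetical names, not the authors' estimator, and the fuzzy case is not covered:

```python
def rdd_estimate(run, y, cutoff=0.0, bw=1.0):
    """Sharp-design local linear estimate: fit a separate OLS line within
    `bw` of the cutoff on each side, then difference the fitted values at
    the cutoff. Uniform kernel; illustrative sketch only."""
    def fit_at_cutoff(points):
        xs = [r - cutoff for r, _ in points]
        ys = [v for _, v in points]
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(xs, ys))
                 / sum((xi - mx) ** 2 for xi in xs))
        return my - slope * mx  # intercept = prediction at the cutoff
    left = [(r, v) for r, v in zip(run, y) if cutoff - bw <= r < cutoff]
    right = [(r, v) for r, v in zip(run, y) if cutoff <= r <= cutoff + bw]
    return fit_at_cutoff(right) - fit_at_cutoff(left)

# Noiseless illustration: the outcome jumps by 2 at the cutoff
run = [i / 20 for i in range(-20, 21)]
y = [r + (2.0 if r >= 0 else 0.0) for r in run]
effect = rdd_estimate(run, y)  # recovers the jump of 2.0
```

"Doughnut trimming" in the summary corresponds to additionally dropping observations in a small window around the cutoff before fitting, which this sketch omits.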
Summary Power analyses are an important aspect of experimental design, because they help determine how experiments are implemented in practice. It is common to specify a desired level of power and compute the sample size necessary to obtain that power. Such calculations are well known for completely randomized experiments, but there can be many benefits to using other experimental designs. For example, it has recently been established that rerandomization, where subjects are randomized until covariate balance is obtained, increases the precision of causal effect estimators. This work establishes the power of rerandomized treatment-control experiments, thereby allowing for sample size calculators. We find the surprising result that, while power is often greater under rerandomization than complete randomization, the opposite can occur for very small treatment effects. The reason is that inference under rerandomization can be relatively more conservative, in the sense that it can have a lower Type-I error at the same nominal significance level, and this additional conservativeness adversely affects power. This surprising result is due to treatment effect heterogeneity, a quantity often ignored in power analyses. We find that heterogeneity increases power for large effect sizes, but decreases power for small effect sizes.
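The rerandomization scheme described above — redraw assignments until covariate balance passes a threshold — can be sketched as follows. The acceptance rule here is a simple univariate standardized mean difference rather than the Mahalanobis criterion typically used in the rerandomization literature, and the function name and defaults are illustrative:

```python
import math
import random

def rerandomize(x, n_treat, threshold=0.1, max_tries=10_000, rng=None):
    """Redraw complete randomizations of len(x) units (n_treat treated)
    until the standardized mean difference of covariate x between arms is
    at most `threshold`. Simplified univariate rule; illustrative only."""
    rng = rng or random.Random(0)  # seeded for reproducibility
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    idx = list(range(n))
    for _ in range(max_tries):
        rng.shuffle(idx)
        treat = set(idx[:n_treat])
        m_t = sum(x[i] for i in treat) / n_treat
        m_c = sum(x[i] for i in idx[n_treat:]) / (n - n_treat)
        if abs(m_t - m_c) / sd <= threshold:
            return treat
    raise RuntimeError("no acceptable randomization within max_tries")
```

Restricting to well-balanced assignments is what tightens the estimator's distribution — and, as the summary notes, it is also what can make inference more conservative for very small effects.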
Differential item functioning (DIF) is often used to examine validity evidence of alternate form test accommodations. Unfortunately, traditional approaches for evaluating DIF are prone to selection bias. This article proposes a novel DIF framework that capitalizes on regression discontinuity design analysis to control for selection bias. A simulation study was performed to compare the new framework with traditional logistic regression, with respect to Type I error and power rates of the uniform DIF test statistics and bias and root mean square error of the corresponding effect size estimators. The new framework better controlled the Type I error rate and demonstrated minimal bias but suffered from low power and lack of precision. Implications for practice are discussed.
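Uniform-DIF testing of the kind compared in this abstract is commonly operationalized with logistic regression or with the closely related Mantel-Haenszel statistic. The sketch below implements the Mantel-Haenszel common odds ratio across matched score strata as an illustration — a swapped-in classical technique, not the article's regression discontinuity framework — with hypothetical data:

```python
def mh_common_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across matched score strata.

    Each stratum is a 2x2 table (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect,
    with strata formed by matching examinees on total test score.
    A ratio near 1 suggests no uniform DIF on the item."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Hypothetical matched tables: equal odds in both groups -> no uniform DIF
no_dif = [(10, 10, 10, 10), (20, 5, 20, 5)]
ratio = mh_common_odds_ratio(no_dif)  # 1.0
```

The selection-bias problem the article targets arises because examinees are not randomly assigned to accommodated forms, so matching on observed score — as both this statistic and logistic-regression DIF do — may not suffice.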
Abstract Despite significant investments in watershed-scale restoration projects, evaluation of their impacts on salmonids is often limited by inadequate experimental design. This project aimed to strengthen study designs by identifying and quantifying sources of temporal and spatial uncertainty while assessing population-level salmonid responses in Before-After-Control-Impact (BACI) restoration experiments. To evaluate sources of temporal uncertainty, a meta-analysis of 32 annual BACI experiments from the Pacific Northwest, USA was conducted. Experimental error was determined to be a function of the total temporal variation of both restoration and control salmonid population metrics and the degree of covariation, or synchrony, between these metrics (r² = 1). However, synchrony was both weak (0.18) and unrelated to experimental error (r = 0.01), while temporal variability was found to account for 91% of this error. Because synchrony did not reduce experimental error, we conclude that BACI designs will not normally exhibit greater power than uncontrolled Before-After (BA) designs. To evaluate spatial uncertainty, hierarchical BACI designs were simulated. It was found that spatial variability of hypothetical steelhead (Oncorhynchus mykiss) growth values within watersheds can cause mis-estimation of the restoration effect and reduce power. While hierarchical BACI designs can examine both reach- and watershed-scale restoration effects simultaneously, these scales should be examined separately because of probable mis-estimation of the restoration effect size. Paired-reach designs such as Extensive Post-Treatment (EPT) provide powerful replicated local-scale restoration experiments, which can build understanding of restoration-ecological mechanisms. Knowledge gained from reach-scale experiments should then be implemented at watershed scales and monitored within a non-hierarchical framework.