Summary: A general framework is set up to study the asymptotic properties of the intent-to-treat Wilcoxon–Mann–Whitney test in randomized experiments with nonignorable noncompliance. Under location-shift alternatives, the Pitman efficiencies of the intent-to-treat Wilcoxon–Mann–Whitney and $t$ tests are derived. It is shown that the former is superior if the compliers are more likely to be found in high-density regions of the outcome distribution or, equivalently, if the noncompliers tend to reside in the tails. By logical extension, the relative efficiency of the two tests is sharply bounded by least and most favourable scenarios in which the compliers are segregated into regions of lowest and highest density, respectively. Such bounds can be derived analytically as a function of the compliance rate for common location families such as Gaussian, Laplace, logistic and $t$ distributions. These results can help empirical researchers choose the more efficient test for existing data, and calculate sample size for future trials in anticipation of noncompliance. Results for nonadditive alternatives and other tests follow along similar lines.
On the power of popular two-sample tests applied to precipitation and discharge series
Two-sample tests are widely used in hydrologic and climate studies to investigate whether two samples of a variable of interest could be considered drawn from different populations. Despite this, the information on the power (i.e., the probability of correctly rejecting the null hypothesis) of these tests applied to hydroclimatic variables is limited. Here, this need is addressed considering four popular two-sample tests applied to daily and extreme precipitation, and annual peak flow series. The chosen tests assess differences in location (t-Student and Wilcoxon) and distribution (Kolmogorov–Smirnov and likelihood-ratio). The power was quantified through Monte Carlo simulations relying on pairs of realistic samples of the three variables with equal size, generated with a procedure based on suitable parametric distributions and copulas. After showing that differences in sample skewness are monotonically related to differences in spread, power surfaces were built as a function of the relative changes in location and spread of the samples and utilized to interpret three case studies comparing samples of observed precipitation and discharge series in the U.S. It was found that (1) the t-Student applied to the log-transformed samples has the same power as the Wilcoxon test; (2) location (distribution) tests perform better than distribution (location) tests for small (moderate-to-large) differences in spread and skewness; (3) the power is relatively lower (higher) if the differences in location and spread or skewness have concordant (discordant) sign; and (4) the power increases with the sample size but could be quite low for tests applied to extreme precipitation and discharge records that are commonly short. This work provides useful recommendations for selecting and interpreting two-sample tests in a broad range of hydroclimatic applications.
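The Monte Carlo idea in the abstract above (repeatedly draw pairs of samples, apply a test, and count rejections) can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes independent lognormal samples in place of their parametric distributions and copulas, and it uses large-sample normal approximations for both the difference-in-means test and the Wilcoxon rank-sum test. All function names (`mean_diff_pvalue`, `rank_sum_pvalue`, `estimate_power`) and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def _mean_var(values):
    """Return the sample mean and the unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    v = sum((x - m) ** 2 for x in values) / (n - 1)
    return m, v


def mean_diff_pvalue(x, y):
    """Two-sided p-value for a difference in means.

    Assumption: normal approximation to the Welch t statistic,
    reasonable only for moderately large samples.
    """
    mx, vx = _mean_var(x)
    my, vy = _mean_var(y)
    z = (mx - my) / math.sqrt(vx / len(x) + vy / len(y))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum p-value.

    Assumption: normal approximation without tie correction,
    which is adequate for continuous data.
    """
    n, m = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = sum(r for r, (_, g) in enumerate(pooled, start=1) if g == 0)
    u = rank_sum_x - n * (n + 1) / 2.0
    z = (u - n * m / 2.0) / math.sqrt(n * m * (n + m + 1) / 12.0)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def estimate_power(test, sampler_x, sampler_y, n, reps=2000, alpha=0.05, seed=0):
    """Fraction of simulated sample pairs for which `test` rejects at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [sampler_x(rng) for _ in range(n)]
        y = [sampler_y(rng) for _ in range(n)]
        if test(x, y) < alpha:
            rejections += 1
    return rejections / reps


if __name__ == "__main__":
    # Hypothetical skewed "baseline" and "shifted" populations.
    baseline = lambda rng: rng.lognormvariate(0.0, 0.5)
    shifted = lambda rng: rng.lognormvariate(0.3, 0.5)
    for name, test in [("mean difference", mean_diff_pvalue),
                       ("rank sum", rank_sum_pvalue)]:
        size = estimate_power(test, baseline, baseline, n=50)
        power = estimate_power(test, baseline, shifted, n=50)
        print(f"{name}: rejection rate under H0 = {size:.3f}, power = {power:.3f}")
```

Passing two identical samplers estimates the rejection rate under the null hypothesis, which should sit near `alpha`. Repeating the call over a grid of location and spread changes would give a rough analogue of the power surfaces the abstract describes.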
- PAR ID: 10524680
- Publisher / Repository: Springer Link
- Date Published:
- Journal Name: Stochastic Environmental Research and Risk Assessment
- Volume: 38
- Issue: 7
- ISSN: 1436-3240
- Page Range / eLocation ID: 2747 to 2765
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- A number of information retrieval studies have been done to assess which statistical techniques are appropriate for comparing systems. However, these studies are focused on TREC-style experiments, which typically have fewer than 100 topics. There is no similar line of work for large search and recommendation experiments; such studies typically have thousands of topics or users and much sparser relevance judgements, so it is not clear if recommendations for analyzing traditional TREC experiments apply to these settings. In this paper, we empirically study the behavior of significance tests with large search and recommendation evaluation data. Our results show that the Wilcoxon and sign tests show significantly higher Type-1 error rates for large sample sizes than the bootstrap, randomization and t-tests, which were more consistent with the expected error rate. While the statistical tests displayed differences in their power for smaller sample sizes, they showed no difference in their power for large sample sizes. We recommend that the sign and Wilcoxon tests not be used to analyze large-scale evaluation results. Our results demonstrate that with Top-\(N\) recommendation and large search evaluation data, most tests would have a 100% chance of finding statistically significant results. Therefore, the effect size should be used to determine practical or scientific significance.
- Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we study differentially private tests of independence between a categorical and a continuous variable. We take as our starting point traditional nonparametric tests, which require no distributional assumptions (e.g., normality) about the data. We present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon signed-rank tests, as well as the parametric one-sample t-test. These tests use novel test statistics developed specifically for the private setting. We compare our tests to prior work, both on parametric and nonparametric tests. We find that in all cases our new nonparametric tests achieve large improvements in statistical power, even when the assumptions of parametric tests are met.
- The Northeast United States exhibits significant spatial heterogeneity in flood seasonality, with spring snowmelt‐driven floods historically dominating northern areas, while other regions show more varied flood seasonality. While it is well documented that since 1996 there has been a marked increase in extreme precipitation across this region, the response of flood seasonality to these changes in extreme precipitation and the spatial distribution of these effects remain uncertain. Here we show that, historically, snowmelt‐dominated northern regions were relatively insensitive to changes in extreme precipitation. However, with climate warming, the dominance of snowmelt floods is decreasing and thus the extreme flood regimes in northern regions are increasingly susceptible to changes in extreme precipitation. While extreme precipitation increased everywhere in the Northeastern United States in 1996, it has since returned to near pre‐1996 levels in the coastal north while remaining elevated in the inland north. Thus, the inland north region has experienced, and continues to experience, the greatest changes in extreme flooding seasonality, including a substantial rise in floods outside the historical spring flood season, particularly in smaller watersheds. Further analysis reveals that while early winter floods are increasingly common, the magnitude of cold season floods (Nov‐May) has remained unchanged over time. In contrast, warm season floods (June‐Oct), historically less significant, are now increasing in both frequency and magnitude in the inland north. Our results highlight that treating the entire Northeast as a uniform hydroclimatic region conceals significant regional variations in extreme discharge trends and, more generally, climate warming will likely increase the sensitivity of historically snowmelt dominated watersheds to extreme precipitation. Understanding this spatial variability in increased extreme precipitation and increased sensitivity to extreme precipitation is crucial for enhancing disaster preparedness and refining water management strategies in affected regions.
- Extreme, downslope mountain winds often generate dangerous wildfire conditions. We used the wildfire spread model Fire Area Simulator (FARSITE) to simulate two wildfires influenced by strong wind events in Santa Barbara, CA. High spatial-resolution imagery for fuel maps and hourly wind downscaled to 100 m were used as model inputs, and sensitivity tests were performed to evaluate the effects of ignition timing and location on fire spread. Additionally, burn area rasters from FARSITE simulations were compared to minimum travel time rasters from FlamMap simulations, a wildfire model similar to FARSITE that holds environmental variables constant. Utilization of two case studies during strong winds revealed that FARSITE was able to successfully reconstruct the spread rate and size of wildfires when spotting was minimal. However, in situations when spotting was an important factor in rapid downslope wildfire spread, both FARSITE and FlamMap were unable to simulate realistic fire perimeters. We show that this is due to inherent limitations in the models themselves, related to the slope-orientation relative to the simulated fire spread, and the dependence of ember launch and land locations. This finding has widespread implications, given the role of spotting in fire progression during extreme wind events.

