Title: On the relative efficiency of the intent-to-treat Wilcoxon–Mann–Whitney test in the presence of noncompliance
Summary: A general framework is set up to study the asymptotic properties of the intent-to-treat Wilcoxon–Mann–Whitney test in randomized experiments with nonignorable noncompliance. Under location-shift alternatives, the Pitman efficiencies of the intent-to-treat Wilcoxon–Mann–Whitney and $t$ tests are derived. It is shown that the former is superior if the compliers are more likely to be found in high-density regions of the outcome distribution or, equivalently, if the noncompliers tend to reside in the tails. By logical extension, the relative efficiency of the two tests is sharply bounded by least and most favourable scenarios in which the compliers are segregated into regions of lowest and highest density, respectively. Such bounds can be derived analytically as a function of the compliance rate for common location families such as Gaussian, Laplace, logistic and $t$ distributions. These results can help empirical researchers choose the more efficient test for existing data, and calculate sample size for future trials in anticipation of noncompliance. Results for nonadditive alternatives and other tests follow along similar lines.
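The summary states the efficiency comparison only in words. As an illustration, take the following simplifying assumptions, which are our own and may differ from the paper's setup: noncompliance is one-sided, treatment shifts the outcome of compliers by $\Delta$ and leaves noncompliers unchanged, and the probability of being a complier is a function $p(y)$ of the untreated outcome $y$. Under these assumptions, a standard Pitman-efficacy calculation gives $\mathrm{ARE} = 12\sigma^2 (\int p f^2)^2 / (\int p f)^2$, where $f$ and $\sigma^2$ are the density and variance of the untreated outcome. This is a sketch, not the paper's derivation.

With $p$ constant, the formula reduces to the classical $12\sigma^2(\int f^2)^2$. Because $\int p f$ equals the compliance rate, the ratio is largest when compliers occupy the highest-density region and smallest when they occupy the tails, which mirrors the bounds described in the summary. The Python sketch below evaluates the classical value and both bounds for unit-scale Gaussian, Laplace and logistic families (the $t$ family is omitted for brevity). The helper names, the integration cutoff and the use of Simpson's rule are illustrative choices.

```python
import math
from statistics import NormalDist

def simpson(g, a, b, n=4000):
    """Composite Simpson's rule for the integral of g over [a, b] (n must be even)."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

# Each entry: (density f, variance sigma^2, half_width(m) giving c with P(|Y| <= c) = m).
# All three densities are symmetric and unimodal about 0, so the highest-density
# region of a given mass is a central interval [-c, c].
FAMILIES = {
    "gaussian": (
        lambda y: math.exp(-0.5 * y * y) / math.sqrt(2 * math.pi),
        1.0,
        lambda m: NormalDist().inv_cdf((1 + m) / 2),
    ),
    "laplace": (
        lambda y: 0.5 * math.exp(-abs(y)),
        2.0,
        lambda m: -math.log(1 - m),
    ),
    "logistic": (
        lambda y: math.exp(-abs(y)) / (1 + math.exp(-abs(y))) ** 2,
        math.pi ** 2 / 3,
        lambda m: math.log((1 + m) / (1 - m)),
    ),
}

TAIL = 40.0  # integration cutoff; these densities are negligible beyond it

def sq_density_mass(f, c):
    """Integral of f(y)^2 over [-c, c], using the symmetry of f."""
    return 2 * simpson(lambda y: f(y) ** 2, 0.0, min(c, TAIL))

def are_full_compliance(family):
    """Classical Pitman ARE of WMW versus t: 12 * sigma^2 * (integral of f^2)^2."""
    f, var, _ = FAMILIES[family]
    return 12 * var * sq_density_mass(f, TAIL) ** 2

def are_bounds(family, rate):
    """(least, most) favourable ARE when a fraction `rate` complies (0 < rate < 1)."""
    if not 0 < rate < 1:
        raise ValueError("rate must lie strictly between 0 and 1")
    f, var, half_width = FAMILIES[family]
    total = sq_density_mass(f, TAIL)
    # Most favourable: compliers fill the central (highest-density) region of mass `rate`.
    best = sq_density_mass(f, half_width(rate))
    # Least favourable: compliers fill both tails, outside a central region of mass 1 - rate.
    worst = total - sq_density_mass(f, half_width(1 - rate))
    # The denominator (integral of p * f) equals the compliance rate in both cases.
    return tuple(12 * var * w ** 2 / rate ** 2 for w in (worst, best))

for name in FAMILIES:
    lo, hi = are_bounds(name, 0.5)
    print(f"{name:9s} full={are_full_compliance(name):.4f} "
          f"bounds at 50% compliance=({lo:.4f}, {hi:.4f})")
```

The full-compliance values should reproduce the textbook figures $3/\pi$, $1.5$ and $\pi^2/9$. Simpson's rule on a truncated range is used only to keep the example free of third-party dependencies.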
Award ID(s):
2015526
PAR ID:
10370217
Author(s) / Creator(s):
Publisher / Repository:
Oxford University Press
Date Published:
Journal Name:
Biometrika
Volume:
109
Issue:
3
ISSN:
0006-3444
Page Range / eLocation ID:
p. 873-880
Sponsoring Org:
National Science Foundation
More like this
  1. Hypothesis tests are a crucial statistical tool for data mining and are the workhorse of scientific research in many fields. Here we study differentially private tests of independence between a categorical and a continuous variable. We take as our starting point traditional nonparametric tests, which make no distributional assumption (e.g., normality) about the data. We present private analogues of the Kruskal-Wallis, Mann-Whitney, and Wilcoxon signed-rank tests, as well as the parametric one-sample t-test. These tests use novel test statistics developed specifically for the private setting. We compare our tests to prior work, both on parametric and nonparametric tests. We find that in all cases our new nonparametric tests achieve large improvements in statistical power, even when the assumptions of parametric tests are met. 
  2. Two-sample tests are widely used in hydrologic and climate studies to investigate whether two samples of a variable of interest could be considered to have been drawn from different populations. Despite this, information on the power (i.e., the probability of correctly rejecting the null hypothesis) of these tests applied to hydroclimatic variables is limited. Here, this need is addressed by considering four popular two-sample tests applied to daily and extreme precipitation, and annual peak flow series. The chosen tests assess differences in location (Student's t and Wilcoxon) and distribution (Kolmogorov–Smirnov and likelihood-ratio). The power was quantified through Monte Carlo simulations relying on pairs of realistic samples of the three variables with equal size, generated with a procedure based on suitable parametric distributions and copulas. After showing that differences in sample skewness are monotonically related to differences in spread, power surfaces were built as a function of the relative changes in location and spread of the samples and used to interpret three case studies comparing samples of observed precipitation and discharge series in the U.S. It was found that (1) the Student's t test applied to the log-transformed samples has the same power as the Wilcoxon test; (2) location (distribution) tests perform better than distribution (location) tests for small (moderate-to-large) differences in spread and skewness; (3) the power is relatively lower (higher) if the differences in location and spread or skewness have concordant (discordant) sign; and (4) the power increases with the sample size but could be quite low for tests applied to extreme precipitation and discharge records, which are commonly short. This work provides useful recommendations for selecting and interpreting two-sample tests in a broad range of hydroclimatic applications. 
  3. This paper illustrates how to calculate the moments and cumulants of the two-stage Mann-Whitney statistic. These results may be used to calculate the asymptotic critical values of the two-stage Mann-Whitney test. The paper presents the underlying derivations in detail. 
  4. Food, energy and water (FEW) systems are critically stressed worldwide. These challenges require transformative science, engineering and policy solutions. However, cross-cutting solutions can only arise through transdisciplinary training of our future science and policy leaders. The University of Maryland Global STEWARDS National Science Foundation Research Traineeship seeks to meet these needs. This study assessed a foundational component of the program: a novel, experiential course focused on transdisciplinary training and communication skills. We drew on data from the first two offerings of the course and used a mixed-method, multi-informant evaluation that included validated pre–post surveys, individual interviews and focus groups. Paired Mann–Whitney–Wilcoxon tests were used to compare pre- and post-means. After the course, students reported improvements in their ability to identify strengths and weaknesses of multiple FEW nexus disciplines; articulate interplays between FEW systems at multiple scales; explain to peers the most important aspects of their research; and collaborate with scientists outside their field. Students also reported improvements in their oral and written communication skills, along with their ability to critically review others’ work. Our findings demonstrate that this graduate course can serve as an effective model to develop transdisciplinary researchers and communicators through cutting-edge, experiential curricular approaches. 
  5. A number of information retrieval studies have assessed which statistical techniques are appropriate for comparing systems. However, these studies focus on TREC-style experiments, which typically have fewer than 100 topics. There is no similar line of work for large search and recommendation experiments; such studies typically have thousands of topics or users and much sparser relevance judgements, so it is not clear whether recommendations for analyzing traditional TREC experiments apply to these settings. In this paper, we empirically study the behavior of significance tests with large search and recommendation evaluation data. Our results show that the Wilcoxon and sign tests have significantly higher Type I error rates for large sample sizes than the bootstrap, randomization and t-tests, which were more consistent with the expected error rate. While the statistical tests displayed differences in power for smaller sample sizes, they showed no difference in power for large sample sizes. We recommend that the sign and Wilcoxon tests not be used to analyze large-scale evaluation results. Our results demonstrate that with Top-\(N\) recommendation and large search evaluation data, most tests would have a 100% chance of finding statistically significant results. Therefore, the effect size should be used to determine practical or scientific significance. 
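Items 2 and 5 above compare the t and Wilcoxon tests through simulated power and Type I error. The following is a minimal Monte Carlo sketch of that kind of comparison, not the procedures used in those studies. It assumes large-sample normal approximations for both statistics: a Welch-type z statistic for the mean difference, and the untied normal approximation to the Mann-Whitney U statistic. The function names, sample size, replication count and seed are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean, variance

def mean_diff_z(x, y):
    """Large-sample z statistic for mean(y) - mean(x) with a Welch-type standard error."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (fmean(y) - fmean(x)) / se

def wmw_z(x, y):
    """Normal approximation to Mann-Whitney U (continuous data, so no tie correction)."""
    m, n = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_y = sum(r for r, (_, grp) in enumerate(pooled, start=1) if grp == 1)
    u = rank_sum_y - n * (n + 1) / 2  # number of (x, y) pairs with x < y
    return (u - m * n / 2) / math.sqrt(m * n * (m + n + 1) / 12)

def draw(rng, dist):
    """One unit-scale draw centred at 0 from a 'normal' or 'laplace' distribution."""
    if dist == "normal":
        return rng.gauss(0.0, 1.0)
    return rng.expovariate(1.0) * rng.choice((-1.0, 1.0))

def rejection_rates(shift, dist="normal", n=50, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo rejection rates (t-type, WMW) of two-sided level-alpha tests."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits_t = hits_w = 0
    for _ in range(reps):
        x = [draw(rng, dist) for _ in range(n)]
        y = [draw(rng, dist) + shift for _ in range(n)]
        hits_t += abs(mean_diff_z(x, y)) > crit
        hits_w += abs(wmw_z(x, y)) > crit
    return hits_t / reps, hits_w / reps

print(rejection_rates(0.0, "laplace"))  # empirical Type I error of each test
print(rejection_rates(0.5, "laplace"))  # empirical power under a location shift
```

Under the null, both rates should sit near `alpha`. Under a shifted Laplace outcome, the rank test typically rejects more often than the mean-based test, which is consistent with the classical Laplace efficiency of 1.5. Exact or permutation versions of the tests would be preferable for small samples.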