ABSTRACT This article introduces a comprehensive framework to adjust a discrete test statistic for improving its hypothesis testing procedure. The adjustment minimizes the Wasserstein distance to a null‐approximating continuous distribution, tackling some fundamental challenges inherent in combining statistical significances derived from discrete distributions. The related theory justifies Lancaster's mid‐p and mean‐value chi‐squared statistics for Fisher's combination as special cases. To counter the conservative nature of Lancaster's testing procedures, we propose an updated null‐approximating distribution. It is achieved by further minimizing the Wasserstein distance to the adjusted statistics within an appropriate distribution family. Specifically, in the context of Fisher's combination, we propose an optimal gamma distribution as a substitute for the traditionally used chi‐squared distribution. This new approach yields an asymptotically consistent test that significantly improves Type I error control and enhances statistical power.
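As a rough illustration of the setting described in this abstract, the sketch below combines Lancaster's mid-p values from several discrete (binomial) tests with Fisher's method. It uses the traditional chi-squared reference distribution, not the optimal gamma distribution the article proposes. The function names and the toy study counts are assumptions made for illustration only.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def mid_p_upper(k_obs, n, p0):
    """Lancaster's mid-p for an upper-tailed binomial test:
    P(X > k_obs) + 0.5 * P(X = k_obs) under the null X ~ Binomial(n, p0)."""
    tail = sum(binom_pmf(k, n, p0) for k in range(k_obs + 1, n + 1))
    return tail + 0.5 * binom_pmf(k_obs, n, p0)


def chi2_sf_even_df(x, m):
    """Survival function of a chi-squared variable with 2*m degrees of
    freedom, using the closed form that holds for even degrees of freedom."""
    half = x / 2.0
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(m))


def fisher_combine(p_values):
    """Fisher's combination: T = -2 * sum(log p_i), referred to a
    chi-squared distribution with 2 * len(p_values) degrees of freedom."""
    stat = -2.0 * sum(math.log(p) for p in p_values)
    return stat, chi2_sf_even_df(stat, len(p_values))


# Hypothetical studies: (observed successes, trials, null success probability).
studies = [(7, 10, 0.5), (4, 5, 0.5), (9, 12, 0.5)]
mid_ps = [mid_p_upper(k, n, p0) for k, n, p0 in studies]
stat, p_comb = fisher_combine(mid_ps)
print("mid-p values:", mid_ps)
print("Fisher statistic:", stat, "combined p-value:", p_comb)
```

Because mid-p values are discrete, the chi-squared reference is only approximate here, which is the gap the article's adjusted null distribution is meant to address.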
Earth mover’s distance as a measure of CP violation
We introduce a new unbinned two-sample test statistic sensitive to CP violation that utilizes the optimal transport plan associated with the Wasserstein (earth mover’s) distance. The efficacy of the test statistic is shown via two examples of CP-asymmetric distributions with varying sample sizes: the Dalitz distributions of B0 → K+π−π0 and of D0 → π+π−π0 decays. The windowed version of the Wasserstein distance test statistic is shown to have sensitivity to CP violation comparable to that of the commonly used energy test statistic, while also retaining information about the localized distributions of CP asymmetry over the Dalitz plot. For high-statistics datasets we introduce two modified Wasserstein-distance-based test statistics, the binned and the sliced Wasserstein distance statistics, which show comparable sensitivity to CP violation but improved scaling of computing time and memory. Finally, general extensions and applications of the introduced statistics are discussed.
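The sliced idea mentioned above can be sketched generically: project two 2D samples onto a set of directions, compute a one-dimensional Wasserstein distance per projection, average, and calibrate with a permutation test. This is a minimal sketch under stated assumptions, not the paper's implementation: it uses the order-1 distance, random Gaussian toy points standing in for Dalitz-plot points, and illustrative function names and parameter values.

```python
import numpy as np


def wasserstein_1d(x, y):
    """Order-1 Wasserstein distance between two 1D empirical distributions,
    computed as the integral of |F_x - F_y| over the pooled sample grid."""
    x, y = np.sort(x), np.sort(y)
    grid = np.sort(np.concatenate([x, y]))
    widths = np.diff(grid)
    cdf_x = np.searchsorted(x, grid[:-1], side="right") / x.size
    cdf_y = np.searchsorted(y, grid[:-1], side="right") / y.size
    return float(np.sum(np.abs(cdf_x - cdf_y) * widths))


def sliced_wasserstein(a, b, directions):
    """Average 1D Wasserstein distance over projections onto unit vectors."""
    return float(np.mean([wasserstein_1d(a @ u, b @ u) for u in directions]))


def permutation_p_value(a, b, directions, n_perm=199, seed=0):
    """Permutation p-value: reshuffle the pooled sample labels and count how
    often the permuted statistic is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    observed = sliced_wasserstein(a, b, directions)
    pooled = np.vstack([a, b])
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(pooled))
        perm_a, perm_b = pooled[idx[: len(a)]], pooled[idx[len(a):]]
        if sliced_wasserstein(perm_a, perm_b, directions) >= observed:
            exceed += 1
    return observed, (1 + exceed) / (1 + n_perm)


# Toy data (assumption): two Gaussian clouds, the second slightly shifted.
rng = np.random.default_rng(1)
angles = rng.uniform(0.0, np.pi, size=32)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
sample_a = rng.normal(size=(200, 2))
sample_b = rng.normal(size=(200, 2)) + np.array([0.5, 0.0])

observed, p_value = permutation_p_value(sample_a, sample_b, directions)
print("sliced Wasserstein statistic:", observed, "permutation p-value:", p_value)
```

Each projection costs only a sort, which is the source of the improved time and memory scaling relative to solving a full optimal transport problem on the unprojected samples.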
- Award ID(s): 2103889
- PAR ID: 10440880
- Publisher / Repository: arxiv.org
- Date Published:
- Journal Name: Journal of High Energy Physics
- Volume: 2023
- Issue: 6
- ISSN: 1029-8479
- Page Range / eLocation ID: 98
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract $B^\pm \rightarrow DK^\pm$ transitions are known to provide theoretically clean information about the CKM angle $\gamma$, with the most precise available methods exploiting the cascade decay of the neutral D into CP self-conjugate states. Such analyses currently require binning in the D decay Dalitz plot, while a recently proposed method replaces this binning with the truncation of a Fourier series expansion. In this paper, we present a proof of principle of a novel alternative to these two methods, in which no approximations at the level of the data representation are required. In particular, our new strategy makes no assumptions about the amplitude and strong phase variation over the Dalitz plot. This comes at the cost of a degree of ambiguity in the choice of test statistic quantifying the compatibility of the data with a given value of $\gamma$, with improved choices of test statistic yielding higher sensitivity. While our current proof-of-principle implementation does not demonstrate optimal sensitivity to $\gamma$, its conceptually novel approach opens the door to new strategies for $\gamma$ extraction. More studies are required to see if these can be competitive with the existing methods.
-
Abstract Testing the homogeneity between two samples of functional data is an important task. While this is feasible for intensively measured functional data, we explain why it is challenging for sparsely measured functional data and show what can be done for such data. In particular, we show that testing the marginal homogeneity based on point-wise distributions is feasible under some mild constraints and propose a new two-sample statistic that works well with both intensively and sparsely measured functional data. The proposed test statistic is formulated upon energy distance, and the convergence rate of the test statistic to its population version is derived along with the consistency of the associated permutation test. The aptness of our method is demonstrated on both synthetic and real data sets.
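For readers unfamiliar with the energy distance on which this statistic is built, the following is a minimal sketch of the plain multivariate energy distance between two samples. It is not the paper's functional-data statistic, which is built from point-wise distributions; the function names and toy data are assumptions for illustration.

```python
import numpy as np


def mean_pairwise_distance(a, b):
    """Mean Euclidean distance over all pairs (row of a, row of b)."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff**2).sum(axis=-1)).mean())


def energy_distance(x, y):
    """Sample energy distance (V-statistic form):
    2 E|X - Y| - E|X - X'| - E|Y - Y'|, with zero self-distances included."""
    return (
        2.0 * mean_pairwise_distance(x, y)
        - mean_pairwise_distance(x, x)
        - mean_pairwise_distance(y, y)
    )


# Toy data (assumption): two 3-dimensional samples with different means.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = rng.normal(loc=0.4, size=(120, 3))
print("energy distance:", energy_distance(x, y))
```

A permutation test then compares this value with the values obtained after randomly reassigning the pooled observations to the two groups.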
-
Given a random sample of size n from a p-dimensional random vector, we are interested in testing whether the p components of the random vector are mutually independent. This is the so-called complete independence test. In the multivariate normal case, it is equivalent to testing whether the correlation matrix is an identity matrix. In this paper, we propose a one-sided empirical likelihood method for the complete independence test based on squared sample correlation coefficients. The limiting distribution for our one-sided empirical likelihood test statistic is proved to be $Z^2 I(Z > 0)$ when both n and p tend to infinity, where Z is a standard normal random variable. In order to improve the power of the empirical likelihood test statistic, we also introduce a rescaled empirical likelihood test statistic. We carry out an extensive simulation study to compare the performance of the rescaled empirical likelihood method and two other statistics.
-
Two-sample testing is a fundamental problem in statistics. While many powerful nonparametric methods exist for both the univariate and multivariate context, it is comparatively less common to see a framework for determining which data features lead to rejection of the null. In this paper, we propose a new nonparametric two-sample test named AUGUST, which incorporates a framework for interpretation while maintaining power comparable to existing methods. AUGUST tests for inequality in distribution up to a predetermined resolution using symmetry statistics from binary expansion. Designed for univariate and low to moderate-dimensional multivariate data, this construction allows us to understand distributional differences as a combination of fundamental orthogonal signals. Asymptotic theory for the test statistic facilitates p-value computation and power analysis, and an efficient algorithm enables computation on large data sets. In empirical studies, we show that our test has power comparable to that of popular existing methods, as well as greater power in some circumstances. We illustrate the interpretability of our method using NBA shooting data.