Statistical divergences (SDs), which quantify the dissimilarity between probability distributions, are a basic constituent of statistical inference and machine learning. A modern method for estimating these divergences parametrizes an empirical variational form by a neural network (NN) and optimizes over the parameter space. Such neural estimators are abundantly used in practice, but corresponding performance guarantees are partial and call for further exploration. We establish non-asymptotic absolute error bounds for a neural estimator realized by a shallow NN, focusing on four popular f-divergences: Kullback-Leibler, chi-squared, squared Hellinger, and total variation. Our analysis relies on non-asymptotic function approximation theorems and tools from empirical process theory to bound the two sources of error involved: function approximation and empirical estimation. The bounds characterize the effective error in terms of NN size and the number of samples, and reveal scaling rates that ensure consistency. For compactly supported distributions, we further show that neural estimators of the first three divergences above with an appropriate NN growth rate are minimax rate-optimal, achieving the parametric convergence rate.
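As a concrete illustration of the kind of estimator analyzed above, the following is a minimal sketch of neural KL-divergence estimation via the Donsker-Varadhan variational form, using a one-hidden-layer (shallow) network in PyTorch. The Gaussian test distributions, network width, and training schedule are illustrative assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

# Donsker-Varadhan form: KL(P||Q) = sup_f E_P[f] - log E_Q[exp(f)].
# A shallow (one-hidden-layer) network parametrizes the potential f.
class ShallowNet(nn.Module):
    def __init__(self, dim, width=64):  # width chosen arbitrarily for illustration
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, width), nn.ReLU(), nn.Linear(width, 1))

    def forward(self, x):
        return self.net(x).squeeze(-1)

def dv_objective(f, x_p, x_q):
    # Empirical DV objective; log E_Q[exp f] is computed stably via logsumexp.
    log_n = torch.log(torch.tensor(float(len(x_q))))
    return f(x_p).mean() - (torch.logsumexp(f(x_q), dim=0) - log_n)

def estimate_kl(x_p, x_q, steps=2000, lr=1e-3):
    f = ShallowNet(x_p.shape[1])
    opt = torch.optim.Adam(f.parameters(), lr=lr)
    for _ in range(steps):
        loss = -dv_objective(f, x_p, x_q)  # maximize the DV lower bound
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return dv_objective(f, x_p, x_q).item()

# Two Gaussians shifted by 1 in each of 2 coordinates: true KL = 1.0.
x_p = torch.randn(5000, 2) + 1.0
x_q = torch.randn(5000, 2)
print(estimate_kl(x_p, x_q))  # should land near 1.0
```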
Sample Out-of-Sample Inference Based on Wasserstein Distance
We present a novel inference approach that we call sample out-of-sample inference. The approach can be used widely, ranging from semisupervised learning to stress testing, and it is fundamental in the application of data-driven distributionally robust optimization. Our method enables measuring the impact of plausible out-of-sample scenarios in a given performance measure of interest, such as a financial loss. The methodology is inspired by empirical likelihood (EL), but we optimize the empirical Wasserstein distance (instead of the empirical likelihood) induced by observations. From a methodological standpoint, our analysis of the asymptotic behavior of the induced Wasserstein-distance profile function shows dramatic qualitative differences relative to EL. For instance, in contrast to EL, which typically yields chi-squared weak convergence limits, our asymptotic distributions are often not chi-squared. Also, the rates of convergence that we obtain have some dependence on the dimension in a nontrivial way but remain controlled as the dimension increases.
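The basic computational ingredient of this profile-based approach is the empirical Wasserstein distance between samples. A minimal one-dimensional illustration using SciPy follows; the sample sizes and Gaussian scenario are assumptions for demonstration, and the paper's profile-function machinery is not reproduced.

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Order-1 Wasserstein distance between two 1-D empirical distributions;
# in one dimension it equals the L1 distance between quantile functions.
rng = np.random.default_rng(0)
observed = rng.normal(loc=0.0, scale=1.0, size=500)   # in-sample observations
scenario = rng.normal(loc=0.5, scale=1.0, size=500)   # plausible out-of-sample scenario

print(wasserstein_distance(observed, scenario))
```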
- PAR ID: 10303857
- Date Published:
- Journal Name: Operations Research
- Volume: 69
- Issue: 3
- ISSN: 0030-364X
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Abstract: Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the numerical approximation of these estimators. Two of our numerical examples ($$g$$-and-$$\kappa$$ and sum of log-normals) are taken from the literature on approximate Bayesian computation and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models. (A minimal fitting sketch appears after this list.)
- Abstract: Wilks' theorem, which offers universal chi-squared approximations for likelihood ratio tests, is widely used in many scientific hypothesis testing problems. For modern datasets with increasing dimension, researchers have found that the conventional Wilks phenomenon of the likelihood ratio test statistic often fails. Although new approximations have been proposed in high-dimensional settings, a clear statistical guideline is still lacking for choosing between the conventional and newly proposed approximations, especially for moderate-dimensional data. To address this issue, we develop the necessary and sufficient phase transition conditions for the Wilks phenomenon under popular tests on multivariate mean and covariance structures. Moreover, we provide an in-depth analysis of the accuracy of chi-squared approximations by deriving their asymptotic biases. These results may provide helpful insights into the use of chi-squared approximations in scientific practice. (A minimal illustration of the conventional approximation appears after this list.)
- A minimum Wasserstein distance approach to Fisher's combination of independent, discrete p-values. Abstract: This article introduces a comprehensive framework to adjust a discrete test statistic for improving its hypothesis testing procedure. The adjustment minimizes the Wasserstein distance to a null-approximating continuous distribution, tackling some fundamental challenges inherent in combining statistical significances derived from discrete distributions. The related theory justifies Lancaster's mid-p and mean-value chi-squared statistics for Fisher's combination as special cases. To counter the conservative nature of Lancaster's testing procedures, we propose an updated null-approximating distribution. It is achieved by further minimizing the Wasserstein distance to the adjusted statistics within an appropriate distribution family. Specifically, in the context of Fisher's combination, we propose an optimal gamma distribution as a substitute for the traditionally used chi-squared distribution. This new approach yields an asymptotically consistent test that significantly improves Type I error control and enhances statistical power. (A minimal Fisher-combination sketch appears after this list.)
- Summary: This paper is concerned with empirical likelihood inference on the population mean when the dimension $$p$$ and the sample size $$n$$ satisfy $$p/n\rightarrow c\in [1,\infty)$$. As shown in Tsao (2004), the empirical likelihood method fails with high probability when $$p/n>1/2$$ because the convex hull of the $$n$$ observations in $$\mathbb{R}^p$$ becomes too small to cover the true mean value. Moreover, when $$p>n$$, the sample covariance matrix becomes singular, and this results in the breakdown of the first sandwich approximation for the log empirical likelihood ratio. To deal with these two challenges, we propose a new strategy of adding two artificial data points to the observed data. We establish the asymptotic normality of the proposed empirical likelihood ratio test. The proposed test statistic does not involve the inverse of the sample covariance matrix. Furthermore, its form is explicit, so the test can easily be carried out with low computational cost. Our numerical comparison shows that the proposed test outperforms some existing tests for high-dimensional mean vectors in terms of power. We also illustrate the proposed procedure with an empirical analysis of stock data. (A one-dimensional empirical likelihood sketch appears after this list.)
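Minimum Wasserstein distance estimation (first item above): a minimal one-dimensional sketch that fits a location parameter by minimizing the empirical Wasserstein distance between model samples and data. The Gaussian location model and fixed reference noise are illustrative assumptions.

```python
import numpy as np
from scipy.stats import wasserstein_distance
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=1000)
noise = rng.normal(size=1000)  # fixed noise keeps the objective smooth in theta

def objective(theta):
    # Distance between samples from the model N(theta, 1) and the data.
    return wasserstein_distance(theta + noise, data)

result = minimize_scalar(objective, bounds=(-10.0, 10.0), method="bounded")
print(result.x)  # minimum Wasserstein estimate of the location, near 2.0
```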
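Wilks' chi-squared approximation (second item above): a textbook-style sketch of the conventional approximation whose high-dimensional breakdown that paper studies, here for the Gaussian mean test with known identity covariance. The sample size and dimension are arbitrary choices.

```python
import numpy as np
from scipy.stats import chi2

# Likelihood ratio test of H0: mu = 0 for N(mu, I_p) data; here
# -2 log LR = n * ||xbar||^2, approximately chi-squared(p) for fixed p.
rng = np.random.default_rng(2)
n, p = 200, 5
x = rng.normal(size=(n, p))            # data generated under H0

lrt = n * np.sum(x.mean(axis=0) ** 2)  # -2 log likelihood ratio
print(lrt, chi2.sf(lrt, df=p))         # statistic and its chi-squared p-value
```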
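Fisher's combination (third item above): a minimal sketch of the classical chi-squared combination of independent p-values, plus a mid-p adjustment of the kind that paper's framework generalizes. Both helper functions are illustrative, not the article's estimator.

```python
import numpy as np
from scipy.stats import chi2

def fisher_combination(p_values):
    # T = -2 * sum(log p_i) ~ chi-squared(2k) for independent continuous
    # p-values; for discrete p-values this reference is only approximate.
    t = -2.0 * np.sum(np.log(p_values))
    return t, chi2.sf(t, df=2 * len(p_values))

def mid_p(p_greater_equal, p_equal):
    # Lancaster's mid-p for a discrete statistic: average the two tail conventions.
    return p_greater_equal - 0.5 * p_equal

print(fisher_combination(np.array([0.01, 0.20, 0.35])))
```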
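Empirical likelihood (fourth item above): a one-dimensional sketch of Owen's empirical likelihood ratio for the mean, making explicit the convex hull condition whose high-dimensional failure motivates that paper; the two-artificial-point strategy itself is not implemented here.

```python
import numpy as np
from scipy.optimize import brentq

def el_log_ratio(x, mu):
    # -2 log empirical likelihood ratio for a 1-D mean; requires mu to lie
    # strictly inside the convex hull (here: the range) of the data.
    z = x - mu
    if z.min() >= 0 or z.max() <= 0:
        return np.inf  # mu outside the hull: the EL ratio is undefined
    # The multiplier lambda solves sum z_i / (1 + lambda * z_i) = 0 on the
    # interval where all implied weights stay positive; it is monotone there.
    eps = 1e-8
    lam = brentq(lambda l: np.sum(z / (1.0 + l * z)),
                 -1.0 / z.max() + eps, -1.0 / z.min() - eps)
    return 2.0 * np.sum(np.log1p(lam * z))

rng = np.random.default_rng(3)
x = rng.normal(size=100)
print(el_log_ratio(x, 0.0))  # approximately chi-squared(1) under the true mean
```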