Title: On parameter estimation with the Wasserstein distance

Statistical inference can be performed by minimizing, over the parameter space, the Wasserstein distance between model distributions and the empirical distribution of the data. We study asymptotic properties of such minimum Wasserstein distance estimators, complementing results derived by Bassetti, Bodini and Regazzini in 2006. In particular, our results cover the misspecified setting, in which the data-generating process is not assumed to be part of the family of distributions described by the model. Our results are motivated by recent applications of minimum Wasserstein estimators to complex generative models. We discuss some difficulties arising in the numerical approximation of these estimators. Two of our numerical examples ($g$-and-$\kappa$ and sum of log-normals) are taken from the literature on approximate Bayesian computation and have likelihood functions that are not analytically tractable. Two other examples involve misspecified models.
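As a rough illustration of the estimator described in the abstract, the sketch below minimizes the 1-Wasserstein distance between an empirical sample and a one-dimensional location model. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian location model, the use of a fixed noise sample (common random numbers) to represent the model distribution, the ternary search, and the hypothetical function names `wasserstein_1d` and `minimum_wasserstein_location`. It relies on the standard fact that, in one dimension, the Wasserstein distance between two equal-size samples reduces to a comparison of order statistics.

```python
import random

def wasserstein_1d(xs, ys, p=1):
    """p-Wasserstein distance between two equal-size 1-D samples.

    In one dimension the optimal coupling matches order statistics,
    so it is enough to sort both samples and compare them pairwise.
    """
    if len(xs) != len(ys):
        raise ValueError("samples must have equal size")
    xs, ys = sorted(xs), sorted(ys)
    return (sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)) ** (1 / p)

def minimum_wasserstein_location(data, noise, lo, hi, tol=1e-6):
    """Ternary search for the theta minimizing W1(data, theta + noise).

    Assumption: a pure location model, for which the objective is
    convex in theta, so ternary search on [lo, hi] is valid.
    """
    def objective(theta):
        return wasserstein_1d(data, [theta + z for z in noise])

    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return (lo + hi) / 2

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(500)]   # observations, true location 2.0
noise = [rng.gauss(0.0, 1.0) for _ in range(500)]  # fixed model noise, reused for every theta
theta_hat = minimum_wasserstein_location(data, noise, lo=-10.0, hi=10.0)
print(f"minimum Wasserstein estimate of the location: {theta_hat:.3f}")
```

Reusing the same noise sample for every candidate parameter keeps the objective deterministic, which is one simple way to sidestep some of the numerical difficulties the abstract alludes to. For models without this location structure, the objective is generally not convex and a different optimizer would be needed.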

Publisher / Repository:
Oxford University Press
Journal Name:
Information and Inference: A Journal of the IMA
Page Range / eLocation ID:
p. 657-676
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    A growing number of generative statistical models do not permit the numerical evaluation of their likelihood functions. Approximate Bayesian computation has become a popular approach to overcome this issue, in which one simulates synthetic data sets given parameters and compares summaries of these data sets with the corresponding observed values. We propose to avoid the use of summaries and the ensuing loss of information by instead using the Wasserstein distance between the empirical distributions of the observed and synthetic data. This generalizes the well-known approach of using order statistics within approximate Bayesian computation to arbitrary dimensions. We describe how recently developed approximations of the Wasserstein distance allow the method to scale to realistic data sizes, and we propose a new distance based on the Hilbert space filling curve. We provide a theoretical study of the method proposed, describing consistency as the threshold goes to 0 while the observations are kept fixed, and concentration properties as the number of observations grows. Various extensions to time series data are discussed. The approach is illustrated on various examples, including univariate and multivariate g-and-k distributions, a toggle switch model from systems biology, a queuing model and a Lévy-driven stochastic volatility model.

  2. Summary

    We consider the situation where the random effects in a generalized linear mixed model may be correlated with one of the predictors, which leads to inconsistent estimators. We show that conditional maximum likelihood can eliminate this bias. Conditional likelihood leads naturally to the partitioning of the covariate into between- and within-cluster components and models that include separate terms for these components also eliminate the source of the bias. Another viewpoint that we develop is the idea that many violations of the assumptions (including correlation between the random effects and a covariate) in a generalized linear mixed model may be cast as misspecified mixing distributions. We illustrate the results with two examples and simulations.

  3. Summary

    Human rights data presents challenges for capture–recapture methodology. Lists of violent acts provided by many different groups create large, sparse tables of data for which saturated models are difficult to fit and for which simple models may be misspecified. We analyze data on killings and disappearances in Casanare, Colombia during the years 1998 to 2007. Our estimates differ depending on whether we model marginal reporting probabilities and odds ratios or the full reporting pattern in a conditional (log-linear) model. With 2629 observed killings, a marginal model we consider estimates over 9000 killings, while conditional models we consider estimate 6000–7000 killings. The latter agree with previous estimates, also from a conditional model. We see a twofold difference between the high sample coverage estimate of over 10,000 killings and the low sample coverage lower bound estimate of 5200 killings. We use a simulation study to compare marginal and conditional models with at most two-way interactions and sample coverage estimators. The simulation results, together with model selection criteria, lead us to believe the previous estimates of total killings in Casanare may have been biased downward, suggesting that the violence was worse than previously thought. Model specification is an important consideration when interpreting population estimates from capture–recapture analysis, and the Casanare data is a prototypical example of how that manifests.

  4. Motivated by problems in data clustering, we establish general conditions under which families of nonparametric mixture models are identifiable, by introducing a novel framework involving clustering overfitted parametric (i.e. misspecified) mixture models. These identifiability conditions generalize existing conditions in the literature, and are flexible enough to include, for example, mixtures of Gaussian mixtures. In contrast to the recent literature on estimating nonparametric mixtures, we allow for general nonparametric mixture components, and instead impose regularity assumptions on the underlying mixing measure. As our primary application, we apply these results to partition-based clustering, generalizing the notion of a Bayes optimal partition from classical parametric model-based clustering to nonparametric settings. Furthermore, this framework is constructive, so that it yields a practical algorithm for learning identified mixtures, which is illustrated through several examples on real data. The key conceptual device in the analysis is the convex, metric geometry of probability measures on metric spaces and its connection to the Wasserstein convergence of mixing measures. The result is a flexible framework for nonparametric clustering with formal consistency guarantees.
  5. Abstract

    Robust estimation is an important problem in statistics which aims at providing a reasonable estimator when the data-generating distribution lies within an appropriately defined ball around an uncontaminated distribution. Although minimax rates of estimation have been established in recent years, many existing robust estimators with provably optimal convergence rates are also computationally intractable. In this paper, we study several estimation problems under a Wasserstein contamination model and present computationally tractable estimators motivated by generative adversarial networks (GANs). Specifically, we analyze the properties of Wasserstein GAN-based estimators for location estimation, covariance matrix estimation and linear regression and show that our proposed estimators are minimax optimal in many scenarios. Finally, we present numerical results which demonstrate the effectiveness of our estimators.
