skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Exact convergence analysis for Metropolis–Hastings independence samplers in Wasserstein distances
Abstract Under mild assumptions, we show that the exact convergence rate in total variation is also exact in weaker Wasserstein distances for the Metropolis–Hastings independence sampler. We develop a new upper and lower bound on the worst-case Wasserstein distance when initialized from points. For an arbitrary point initialization, we show that the convergence rate is the same and matches the convergence rate in total variation. We derive exact convergence expressions for more general Wasserstein distances when initialization is at a specific point. Using optimization, we construct a novel centered independent proposal to develop exact convergence rates in Bayesian quantile regression and many generalized linear model settings. We show that the exact convergence rate can be upper bounded in Bayesian binary response regression (e.g. logistic and probit) when the sample size and dimension grow together.  more » « less
Award ID(s):
2152746
PAR ID:
10511366
Author(s) / Creator(s):
;
Publisher / Repository:
Journal of Applied Probability
Date Published:
Journal Name:
Journal of Applied Probability
Volume:
61
Issue:
1
ISSN:
0021-9002
Page Range / eLocation ID:
33 to 54
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The optimal transport barycenter (a.k.a. Wasserstein barycenter) is a fundamental notion of averaging that extends from the Euclidean space to the Wasserstein space of probability distributions. Computation of the unregularized barycenter for discretized probability distributions on point clouds is a challenging task when the domain dimension d>1. Most practical algorithms for approximating the barycenter problem are based on entropic regularization. In this paper, we introduce a nearly linear time O(mlogm) and linear space complexity O(m) primal-dual algorithm, the Wasserstein-Descent ℍ˙1-Ascent (WDHA) algorithm, for computing the exact barycenter when the input probability density functions are discretized on an m-point grid. The key success of the WDHA algorithm hinges on alternating between two different yet closely related Wasserstein and Sobolev optimization geometries for the primal barycenter and dual Kantorovich potential subproblems. Under reasonable assumptions, we establish the convergence rate and iteration complexity of WDHA to its stationary point when the step size is appropriately chosen. Superior computational efficacy, scalability, and accuracy over the existing Sinkhorn-type algorithms are demonstrated on high-resolution (e.g., 1024×1024 images) 2D synthetic and real data. 
    more » « less
  2. Summary We study quantile trend filtering, a recently proposed method for nonparametric quantile regression, with the goal of generalizing existing risk bounds for the usual trend-filtering estimators that perform mean regression. We study both the penalized and the constrained versions, of order $$r \geqslant 1$$, of univariate quantile trend filtering. Our results show that both the constrained and the penalized versions of order $$r \geqslant 1$$ attain the minimax rate up to logarithmic factors, when the $(r-1)$th discrete derivative of the true vector of quantiles belongs to the class of bounded-variation signals. Moreover, we show that if the true vector of quantiles is a discrete spline with a few polynomial pieces, then both versions attain a near-parametric rate of convergence. Corresponding results for the usual trend-filtering estimators are known to hold only when the errors are sub-Gaussian. In contrast, our risk bounds are shown to hold under minimal assumptions on the error variables. In particular, no moment assumptions are needed and our results hold under heavy-tailed errors. Our proof techniques are general, and thus can potentially be used to study other nonparametric quantile regression methods. To illustrate this generality, we employ our proof techniques to obtain new results for multivariate quantile total-variation denoising and high-dimensional quantile linear regression. 
    more » « less
  3. We propose and analyze a new stochastic gradient method, which we call Stochastic Unbiased Curvature-aided Gradient (SUCAG), for finite sum optimization problems. SUCAG constitutes an unbiased total gradient tracking technique that uses Hessian information to accelerate convergence. We analyze our method under the general asynchronous model of computation, in which each function is selected infinitely often with possibly unbounded (but sublinear) delay. For strongly convex problems, we establish linear convergence for the SUCAG method. When the initialization point is sufficiently close to the optimal solution, the established convergence rate is only dependent on the condition number of the problem, making it strictly faster than the known rate for the SAGA method. Furthermore, we describe a Markov-driven approach of implementing the SUCAG method in a distributed asynchronous multi-agent setting, via gossiping along a random walk on an undirected communication graph. We show that our analysis applies as long as the graph is connected and, notably, establishes an asymptotic linear convergence rate that is robust to the graph topology. Numerical results demonstrate the merits of our algorithm over existing methods. 
    more » « less
  4. null (Ed.)
    Sensitivity properties describe how changes to the input of a program affect the output, typically by upper bounding the distance between the outputs of two runs by a monotone function of the distance between the corresponding inputs. When programs are probabilistic, the distance between outputs is a distance between distributions. The Kantorovich lifting provides a general way of defining a distance between distributions by lifting the distance of the underlying sample space; by choosing an appropriate distance on the base space, one can recover other usual probabilistic distances, such as the Total Variation distance. We develop a relational pre-expectation calculus to upper bound the Kantorovich distance between two executions of a probabilistic program. We illustrate our methods by proving algorithmic stability of a machine learning algorithm, convergence of a reinforcement learning algorithm, and fast mixing for card shuffling algorithms. We also consider some extensions: using our calculus to show convergence of Markov chains to the uniform distribution over states and an asynchronous extension to reason about pairs of program executions with different control flow. 
    more » « less
  5. Abstract We consider the nonparametric estimation of an S-shaped regression function. The least squares estimator provides a very natural, tuning-free approach, but results in a non-convex optimization problem, since the inflection point is unknown. We show that the estimator may nevertheless be regarded as a projection onto a finite union of convex cones, which allows us to propose a mixed primal-dual bases algorithm for its efficient, sequential computation. After developing a projection framework that demonstrates the consistency and robustness to misspecification of the estimator, our main theoretical results provide sharp oracle inequalities that yield worst-case and adaptive risk bounds for the estimation of the regression function, as well as a rate of convergence for the estimation of the inflection point. These results reveal not only that the estimator achieves the minimax optimal rate of convergence for both the estimation of the regression function and its inflection point (up to a logarithmic factor in the latter case), but also that it is able to achieve an almost-parametric rate when the true regression function is piecewise affine with not too many affine pieces. Simulations and a real data application to air pollution modelling also confirm the desirable finite-sample properties of the estimator, and our algorithm is implemented in the R package Sshaped. 
    more » « less