

Title: Convergence Rates for Regularized Optimal Transport via Quantization
We study the convergence of divergence-regularized optimal transport as the regularization parameter vanishes. Sharp rates are obtained for general divergences, including relative entropy and L^p regularization, for general transport costs, and for multimarginal problems. A novel methodology using quantization and martingale couplings is suitable for noncompact marginals and achieves, in particular, the sharp leading-order term of the entropically regularized 2-Wasserstein distance for marginals with a finite [Formula: see text]-moment. Funding: This work was supported by the Alfred P. Sloan Foundation [Grant FG-2016-6282] and the Division of Mathematical Sciences [Grants DMS-1812661 and DMS-2106056].
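As a minimal numerical sketch of the regime studied in the abstract (discrete marginals, quadratic cost, entropic regularization; not code from the paper), the snippet below computes the entropically regularized transport cost with plain Sinkhorn scaling and shows it approaching the unregularized optimal value as the regularization parameter eps shrinks. The function name sinkhorn_cost, the marginals, and the parameter values are illustrative assumptions.

```python
import numpy as np

def sinkhorn_cost(mu, nu, C, eps, n_iter=2000):
    """Entropically regularized OT cost <C, P_eps> via plain Sinkhorn scaling (illustrative)."""
    K = np.exp(-C / eps)                  # Gibbs kernel
    u, v = np.ones_like(mu), np.ones_like(nu)
    for _ in range(n_iter):
        u = mu / (K @ v)
        v = nu / (K.T @ u)
    P = u[:, None] * K * v[None, :]       # entropic optimal coupling
    return float(np.sum(P * C))

# Two sorted discrete marginals on [0, 1] with quadratic cost |x - y|^2.
n = 50
x = np.linspace(0.0, 1.0, n)
y = x ** 2
mu = np.full(n, 1.0 / n)
nu = np.full(n, 1.0 / n)
C = (x[:, None] - y[None, :]) ** 2

exact = float(np.mean((x - y) ** 2))      # monotone coupling is optimal in 1D for this cost
for eps in (1.0, 0.1, 0.01):
    print(f"eps={eps:5.2f}  regularized={sinkhorn_cost(mu, nu, C, eps):.5f}  exact={exact:.5f}")
```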
Award ID(s):
2106056 1812661
PAR ID:
10614220
Author(s) / Creator(s):
Publisher / Repository:
INFORMS
Date Published:
Journal Name:
Mathematics of Operations Research
Volume:
49
Issue:
2
ISSN:
0364-765X
Page Range / eLocation ID:
1223 to 1240
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Jonathan Berry, David Shmoys (Ed.)
    In 2013, Cuturi [9] introduced the SINKHORN algorithm for matrix scaling as a method to compute solutions to regularized optimal transport problems. In this paper, aiming at a better convergence rate for a high-accuracy solution, we work on understanding the SINKHORN algorithm under regularization scheduling, and thus modify it with a mechanism that adaptively doubles the regularization parameter η periodically. We prove that such a modified version of SINKHORN has an exponential convergence rate, with iteration complexity depending on log(1/ɛ) instead of the ɛ^(-o(1)) from previous analyses [1, 9], for optimal transport problems with integral supply and demand. Furthermore, with cost and capacity scaling procedures, the general optimal transport problem can be solved with a logarithmic dependence on 1/ɛ as well.
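A rough sketch of the regularization-scheduling idea in item 1 (not the authors' pseudocode): run log-domain Sinkhorn updates for a fixed number of iterations, double the regularization parameter η, warm-start the dual potentials, and repeat until a target value is reached. The names scheduled_sinkhorn, iters_per_stage, and eta_max are illustrative assumptions.

```python
import numpy as np
from scipy.special import logsumexp

def scheduled_sinkhorn(mu, nu, C, eta0=1.0, eta_max=256.0, iters_per_stage=50):
    """Sketch of Sinkhorn with a regularization schedule: double eta after each
    stage and warm-start the dual potentials (log-domain updates for stability)."""
    f = np.zeros_like(mu)   # dual potential for the row marginal
    g = np.zeros_like(nu)   # dual potential for the column marginal
    eta = eta0
    while True:
        for _ in range(iters_per_stage):
            f = (np.log(mu) - logsumexp(-eta * (C - g[None, :]), axis=1)) / eta
            g = (np.log(nu) - logsumexp(-eta * (C - f[:, None]), axis=0)) / eta
        if eta >= eta_max:
            break
        eta = min(2.0 * eta, eta_max)   # adaptively double the regularization parameter
    return np.exp(-eta * (C - f[:, None] - g[None, :]))   # approximate transport plan

# The returned plan should approximately satisfy both marginal constraints.
rng = np.random.default_rng(1)
C = rng.random((30, 30))
mu = np.full(30, 1.0 / 30)
nu = np.full(30, 1.0 / 30)
P = scheduled_sinkhorn(mu, nu, C)
print(abs(P.sum(axis=1) - mu).max(), abs(P.sum(axis=0) - nu).max())
```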
  2. Wasserstein distance plays an increasingly important role in machine learning, stochastic programming, and image processing. Major efforts have been under way to address its high computational complexity, some leading to approximate or regularized variations such as the Sinkhorn distance. However, as we demonstrate, regularized variations with a large regularization parameter degrade performance in several important machine learning applications, while a small regularization parameter fails due to numerical stability issues with existing algorithms. We address this challenge by developing an Inexact Proximal point method for the exact Optimal Transport problem (IPOT), with the proximal operator approximately evaluated at each iteration using projections onto the probability simplex. The algorithm (a) converges to the exact Wasserstein distance with a theoretical guarantee and robust regularization parameter selection, (b) alleviates numerical stability issues, (c) has computational complexity similar to Sinkhorn, and (d) avoids the shrinking problem when applied to generative models. Furthermore, a new algorithm based on IPOT is proposed to obtain sharper Wasserstein barycenters.
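For context, a condensed sketch of the proximal-point idea behind IPOT as it is commonly presented (not necessarily the authors' exact pseudocode): the Gibbs kernel is built once with a fixed parameter beta, the previous plan is folded into the kernel at each outer step, and each proximal step is solved inexactly with a small number of Sinkhorn-type inner updates.

```python
import numpy as np

def ipot(mu, nu, C, beta=1.0, n_outer=200, n_inner=1):
    """Sketch of an inexact proximal point iteration for optimal transport:
    the Gibbs kernel G is fixed, the previous plan enters the kernel at each
    outer step (a KL-proximal term), and the proximal step is solved inexactly
    by a small number of Sinkhorn-type scaling updates."""
    n, m = C.shape
    G = np.exp(-C / beta)            # fixed kernel; beta does not have to shrink
    P = np.ones((n, m)) / (n * m)    # initial coupling
    b = np.ones(m)
    for _ in range(n_outer):
        Q = G * P                    # fold the previous plan into the kernel
        for _ in range(n_inner):
            a = mu / (Q @ b)
            b = nu / (Q.T @ a)
        P = a[:, None] * Q * b[None, :]
    return P                         # drifts toward an unregularized optimal plan

# Tiny example with uniform marginals and a random cost matrix.
rng = np.random.default_rng(2)
C = rng.random((25, 25))
mu = np.full(25, 1.0 / 25)
nu = np.full(25, 1.0 / 25)
P = ipot(mu, nu, C)
print(np.sum(P * C))                 # approximate unregularized transport cost
```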
  3. We consider a regularization problem whose objective function consists of a convex fidelity term and a regularization term determined by the ℓ1 norm composed with a linear transform. Empirical results show that regularization with the ℓ1 norm can promote sparsity of a regularized solution. The goal of this paper is to understand theoretically the effect of the regularization parameter on the sparsity of the regularized solutions. We establish a characterization of the sparsity of the solution under the transform matrix. When the objective function is block-separable, or an error bound of the regularized solution to a known function is available, the resulting characterization can be used as a regularization parameter choice strategy under which the regularization problem has a solution with a prescribed sparsity level. When the objective function is not block-separable, we propose an iterative algorithm that simultaneously determines the regularization parameter and its corresponding solution with a prescribed sparsity level. Moreover, we study choices of the regularization parameter so that the regularization term can alleviate the ill-posedness and promote sparsity of the resulting regularized solution. Numerical experiments demonstrate that the proposed algorithm is effective and efficient, and that the choices of the regularization parameter can balance the sparsity of the regularized solution and its approximation to the minimizer of the fidelity function.
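A toy illustration of how the regularization parameter controls sparsity in the simplest instance of the problem in item 3 (identity transform, quadratic fidelity; this is not the paper's general setting or algorithm): the minimizer is soft thresholding of the data, so a parameter achieving a prescribed number of nonzeros can be read off the sorted magnitudes.

```python
import numpy as np

def soft_threshold(b, lam):
    """Closed-form minimizer of 0.5 * ||x - b||_2^2 + lam * ||x||_1."""
    return np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)

def lam_for_sparsity(b, k):
    """Pick lam so the soft-thresholded solution has exactly k nonzeros:
    any lam strictly between the k-th and (k+1)-th largest |b_i| works."""
    mags = np.sort(np.abs(b))[::-1]
    return 0.5 * (mags[k - 1] + mags[k]) if k < len(b) else 0.0

rng = np.random.default_rng(3)
b = rng.normal(size=20)
lam = lam_for_sparsity(b, k=5)
print(np.count_nonzero(soft_threshold(b, lam)))   # prints 5
```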
  4. This paper considers an infinite-horizon Markov decision process (MDP) that allows for general nonexponential discount functions in both discrete and continuous time. Because of the inherent time inconsistency, we look for a randomized equilibrium policy (i.e., relaxed equilibrium) in an intrapersonal game between an agent’s current and future selves. When we modify the MDP by entropy regularization, a relaxed equilibrium is shown to exist by a nontrivial entropy estimate. As the degree of regularization diminishes, the entropy-regularized MDPs approximate the original MDP, which gives the general existence of a relaxed equilibrium in the limit by weak convergence arguments. As opposed to prior studies that consider only deterministic policies, our existence result for an equilibrium does not require any convexity (or concavity) of the controlled transition probabilities and reward function. Interestingly, this benefit of considering randomized policies is unique to the time-inconsistent case.
  5.
    Ridge-like regularization often leads to improved generalization performance of machine learning models by mitigating overfitting. While ridge-regularized machine learning methods are widely used in many important applications, direct training via optimization could become challenging in huge data scenarios with millions of examples and features. We tackle such challenges by proposing a general approach that achieves ridge-like regularization through implicit techniques named Minipatch Ridge (MPRidge). Our approach is based on taking an ensemble of coefficients of unregularized learners trained on many tiny, random subsamples of both the examples and features of the training data, which we call minipatches. We empirically demonstrate that MPRidge induces an implicit ridge-like regularizing effect and performs nearly the same as explicit ridge regularization for a general class of predictors including logistic regression, SVM, and robust regression. Embarrassingly parallelizable, MPRidge provides a computationally appealing alternative to inducing ridge-like regularization for improving generalization performance in challenging big-data settings. 
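To make the minipatch idea in item 5 concrete, here is a small sketch for plain least squares (MPRidge itself covers a general class of predictors; the function name, patch sizes, and averaging rule below are illustrative assumptions): fit unregularized models on many tiny random subsets of rows and columns and average each coefficient over the patches in which its feature appears.

```python
import numpy as np

def minipatch_coefficients(X, y, n_patches=200, n_rows=30, n_cols=5, seed=0):
    """Sketch of the minipatch idea for plain least squares: fit unregularized
    models on many tiny random subsamples of rows and columns, then average each
    coefficient over the patches in which its feature appears."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    coef_sum = np.zeros(p)
    counts = np.zeros(p)
    for _ in range(n_patches):
        rows = rng.choice(n, size=n_rows, replace=False)
        cols = rng.choice(p, size=n_cols, replace=False)
        beta, *_ = np.linalg.lstsq(X[np.ix_(rows, cols)], y[rows], rcond=None)
        coef_sum[cols] += beta
        counts[cols] += 1
    return coef_sum / np.maximum(counts, 1)   # averaged, implicitly regularized coefficients

# Toy data: only the first three features carry signal.
rng = np.random.default_rng(4)
X = rng.normal(size=(500, 40))
true_beta = np.zeros(40)
true_beta[:3] = 1.0
y = X @ true_beta + 0.1 * rng.normal(size=500)
print(np.round(minipatch_coefficients(X, y)[:5], 2))
```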