Title: Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints.
Distributionally robust optimization (DRO) has been shown to offer a principled way to regularize learning models. In this paper, we find that Tikhonov regularization is distributionally robust in an optimal transport sense (i.e., if an adversary chooses distributions in a suitable optimal transport neighborhood of the empirical measure), provided that suitable martingale constraints are also imposed. Further, we introduce a relaxation of the martingale constraints which not only provides a unified viewpoint on a class of existing robust methods but also leads to new regularization tools. To realize these novel tools, provably efficient computational algorithms are proposed. As a byproduct, the strong duality theorem proved in this paper can potentially be applied to other problems of independent interest.
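For intuition, here is a minimal sketch of the mechanism in the ridge-regression case; the notation and the exact form of the identity are illustrative assumptions on our part, not quoted from the paper. If the adversary may move each feature vector X around its observed value X_0 within a W_2 budget, subject to the mean-preserving (martingale) constraint E_P[X | X_0, Y] = X_0, then

\sup_{P:\ \mathcal{W}_2(P, P_n) \le \delta,\ \mathbb{E}_P[X \mid X_0, Y] = X_0} \mathbb{E}_P\big[(Y - \beta^\top X)^2\big] \;=\; \mathbb{E}_{P_n}\big[(Y - \beta^\top X)^2\big] + \delta^2 \|\beta\|_2^2,

because the martingale constraint annihilates the cross term in the expansion of the square, leaving only the adversarial variance term, which is exactly a Tikhonov penalty.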
Award ID(s):
2118199
PAR ID:
10413183
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Advances in Neural Information Processing Systems
Volume:
35
Issue:
2022
ISSN:
1049-5258
Page Range / eLocation ID:
17677--17689
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
1. We focus on robust estimation of the unobserved state of a discrete-time stochastic system with linear dynamics. A standard analysis of this estimation problem assumes a baseline innovation model; with Gaussian innovations we recover the Kalman filter. However, in many settings, there is insufficient or corrupted data to validate the baseline model. To cope with this problem, we minimize the worst-case mean-squared estimation error over adversarial models chosen within a Wasserstein neighborhood around the baseline. We also constrain the adversarial innovations to form a martingale difference sequence. The martingale constraint relaxes the i.i.d. assumptions which are often imposed on the baseline model. Moreover, we show that the martingale constraints guarantee that the adversarial dynamics remain adapted to the natural time-generated information. Therefore, adding the martingale constraint allows us to improve upon over-conservative policies that also protect against unrealistic omniscient adversaries. We establish a strong duality result which we use to develop an efficient subgradient method to compute the distributionally robust estimation policy. If the baseline innovations are Gaussian, we show that the worst-case adversary remains Gaussian. Our numerical experiments indicate that the martingale constraint may also aid in adding a layer of robustness in the choice of the adversarial power.
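Schematically, and with symbols that are our own illustrative assumptions rather than the authors' notation, the estimation problem is the min-max

\min_{\varphi} \ \sup_{P:\ \mathcal{W}(P, P_0) \le \delta,\ \mathbb{E}_P[w_t \mid \mathcal{F}_{t-1}] = 0 \ \forall t} \ \mathbb{E}_P\big[\|x_T - \varphi(y_{1:T})\|^2\big],

where x_{t+1} = A x_t + w_t and y_t = C x_t + v_t describe the linear dynamics and observations, P_0 is the baseline (e.g., Gaussian) innovation law, and the martingale-difference constraint on the innovations w_t is what keeps the adversarial dynamics adapted to the filtration \mathcal{F}_t.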
2. We study the convergence of divergence-regularized optimal transport as the regularization parameter vanishes. Sharp rates for general divergences including relative entropy or L^p regularization, general transport costs, and multimarginal problems are obtained. A novel methodology using quantization and martingale couplings is suitable for noncompact marginals and achieves, in particular, the sharp leading-order term of the entropically regularized 2-Wasserstein distance for marginals with a finite [Formula: see text]-moment. Funding: This work was supported by the Alfred P. Sloan Foundation [Grant FG-2016-6282] and the Division of Mathematical Sciences [Grants DMS-1812661 and DMS-2106056].
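As a concrete handle on the object under study, the following is a minimal numerical sketch of entropically regularized optimal transport between two discrete marginals, where the regularized cost can be watched as the regularization parameter shrinks. This is the classical Sinkhorn scheme, not the paper's methodology, and all names and parameters below are illustrative assumptions.

# Minimal sketch: entropic optimal transport via Sinkhorn iterations,
# used only to illustrate the vanishing-regularization limit discussed above.
import numpy as np

def sinkhorn_cost(mu, nu, C, eps, iters=5000):
    # Solve min_P <P, C> + eps * KL(P | mu x nu) over couplings P of (mu, nu)
    # by alternating marginal scalings, then report the transport part <P, C>.
    K = np.exp(-C / eps)
    u = np.ones_like(mu)
    for _ in range(iters):
        v = nu / (K.T @ u)
        u = mu / (K @ v)
    P = u[:, None] * K * v[None, :]  # regularized optimal plan
    return float(np.sum(P * C))

n = 64
x = np.linspace(0.0, 1.0, n)         # first marginal: uniform grid on [0, 1]
y = np.linspace(0.0, 1.0, n) ** 2    # second marginal: image of the grid under t -> t^2
C = (x[:, None] - y[None, :]) ** 2   # quadratic cost, as for the 2-Wasserstein distance
mu = nu = np.full(n, 1.0 / n)
for eps in (0.5, 0.1, 0.02):
    # the regularized cost decreases toward the unregularized OT value as eps -> 0
    print(f"eps={eps:<4}  cost={sinkhorn_cost(mu, nu, C, eps):.5f}")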
3. For two measures that are in convex-decreasing order, Nutz and Stebegg (Canonical supermartingale couplings, Ann. Probab. 46(6) 3351–3398, 2018) studied the optimal transport problem with supermartingale constraints and introduced two canonical couplings, namely the increasing and decreasing transport plans, that are optimal for a large class of cost functions. In the present paper we provide an explicit construction of the decreasing coupling by establishing a Brenier-type result: (a generalised version of) this coupling concentrates on the graphs of two functions. Our construction is based on the concept of the supermartingale shadow measure and requires a suitable extension of the results by Juillet (Stability of the shadow projection and the left-curtain coupling, Ann. Inst. H. Poincaré Probab. Statist. 52(4) 1823–1843, November 2016) and Beiglböck and Juillet (Shadow couplings, Trans. Amer. Math. Soc. 374 4973–5002, 2021) established in the martingale setting. In particular, we prove the stability of the supermartingale shadow measure with respect to the initial and target measures, introduce an infinite family of lifted supermartingale couplings that arise via the shadow measure, and show how to explicitly determine the martingale points of each such coupling.
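For orientation, a standard Strassen-type characterization connects the order to the couplings; the sign convention here is one common choice, and this is our paraphrase of textbook background rather than a quotation from the paper:

\int f \, d\mu \le \int f \, d\nu \ \text{for all convex, nonincreasing } f \quad \Longleftrightarrow \quad \exists \, (X, Y) \ \text{with} \ X \sim \mu,\ Y \sim \nu,\ \mathbb{E}[Y \mid X] \le X.

The increasing and decreasing transport plans are then canonical selections from this (generally large) set of supermartingale couplings.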
4. We show that several machine learning estimators, including the square-root lasso (least absolute shrinkage and selection operator) and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (robust Wasserstein profile inference), a novel inference methodology which extends the use of methods inspired by empirical likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence we are able to choose regularization parameters for these machine learning estimators without the use of cross-validation. Numerical experiments are also given to validate our theoretical findings.
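A representative instance of such a representation, stated schematically (the transport cost and constants are illustrative assumptions on our part): for linear regression with an optimal transport cost that moves only the feature vectors, measured in the \ell_q norm, the worst-case root-mean-squared loss over a Wasserstein ball of radius \delta splits into an empirical term plus a norm penalty,

\sup_{P:\ \mathcal{W}_c(P, P_n) \le \delta} \big( \mathbb{E}_P[(Y - \beta^\top X)^2] \big)^{1/2} = \big( \mathbb{E}_{P_n}[(Y - \beta^\top X)^2] \big)^{1/2} + \sqrt{\delta} \, \|\beta\|_p, \qquad 1/p + 1/q = 1,

which for p = 1 is exactly the square-root lasso objective.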
5. Estimators based on Wasserstein distributionally robust optimization are obtained as solutions of min-max problems in which the statistician selects a parameter minimizing the worst-case loss among all probability models within a certain distance from the underlying empirical measure in a Wasserstein sense. While motivated by the need to identify optimal model parameters or decision choices that are robust to model misspecification, these distributionally robust estimators recover a wide range of regularized estimators, including square-root lasso and support vector machines, among others. This paper studies the asymptotic normality of these distributionally robust estimators as well as the properties of an optimal confidence region induced by the Wasserstein distributionally robust optimization formulation. In addition, key properties of min-max distributionally robust optimization problems are also studied; for example, we show that distributionally robust estimators regularize the loss based on its derivative, and we also derive general sufficient conditions which show the equivalence between the min-max distributionally robust optimization problem and the corresponding max-min formulation.
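The derivative-based regularization mentioned here admits a compact first-order summary (schematic; the regularity conditions and the exact norm pairing are assumptions on our part): for a type-p Wasserstein ball of radius \delta,

\sup_{P:\ \mathcal{W}_p(P, P_n) \le \delta} \mathbb{E}_P[\ell(Z; \theta)] = \mathbb{E}_{P_n}[\ell(Z; \theta)] + \delta \big( \mathbb{E}_{P_n}[\, \|\nabla_z \ell(Z; \theta)\|_*^q \,] \big)^{1/q} + o(\delta), \qquad 1/p + 1/q = 1,

so to first order the adversary contributes a penalty on the dual norm of the loss gradient.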