

Title: Robust Wasserstein profile inference and applications to machine learning
Abstract We show that several machine learning estimators, including square-root least absolute shrinkage and selection operator (LASSO) and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (robust Wasserstein profile inference), a novel inference methodology which extends the use of methods inspired by empirical likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence we are able to choose regularization parameters for these machine learning estimators without the use of cross validation. Numerical experiments are also given to validate our theoretical findings.
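The abstract states the representations only in words. A minimal sketch, assuming the transport costs used for these two estimators move only the features (labels never move, i.e. infinite cost for changing y), with squared cost ||x - u||_q^2 for the regression case and cost ||x - u||_q for the logistic case, and with p the dual exponent (1/p + 1/q = 1). Under those assumptions the worst-case objectives take the closed forms (sqrt(MSE_n(beta)) + sqrt(delta) * ||beta||_p)^2 and E_n[log(1 + exp(-y beta^T x))] + delta * ||beta||_p. The function names below are hypothetical and this is not the authors' code; check the exact cost functions against the paper's theorems before relying on it.

```python
import numpy as np


def sqrt_lasso_dro_value(beta, X, y, delta, p=2.0):
    """Square root of the worst-case MSE over a Wasserstein ball (assumed form).

    Assumes cost ||x - u||_q^2 on features with labels fixed, p dual to q:
        sqrt(MSE_n(beta)) + sqrt(delta) * ||beta||_p
    """
    residuals = y - X @ beta
    rmse = np.sqrt(np.mean(residuals ** 2))
    return rmse + np.sqrt(delta) * np.linalg.norm(beta, ord=p)


def logistic_dro_value(beta, X, y, delta, p=2.0):
    """Worst-case logistic loss over a Wasserstein ball, labels y in {-1, +1}.

    Assumes cost ||x - u||_q on features with labels fixed, p dual to q:
        E_n[log(1 + exp(-y beta^T x))] + delta * ||beta||_p
    """
    margins = y * (X @ beta)
    # logaddexp(0, -m) = log(1 + exp(-m)), computed without overflow
    empirical_loss = np.mean(np.logaddexp(0.0, -margins))
    return empirical_loss + delta * np.linalg.norm(beta, ord=p)


# Single-point sanity check of the square-root LASSO form with p = q = 2.
# Hoelder gives |r - Delta^T beta| <= |r| + ||Delta||_2 ||beta||_2 <= |r| + sqrt(delta) ||beta||_2,
# and the perturbation below spends the whole budget ||Delta||_2^2 = delta to attain it.
X = np.array([[1.0, 2.0]])
y = np.array([1.0])
beta = np.array([0.5, -0.25])
delta = 0.04
r = y[0] - X[0] @ beta
Delta = -np.sign(r) * np.sqrt(delta) * beta / np.linalg.norm(beta)
worst_abs_residual = abs(y[0] - (X[0] + Delta) @ beta)
print(np.isclose(worst_abs_residual, sqrt_lasso_dro_value(beta, X, y, delta)))  # True
```

The check only covers a single observation; with n observations the adversary can split the budget across points, which is where the square root of the empirical MSE (rather than a mean of absolute residuals) comes from in the assumed formula.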
Award ID(s):
1820942
NSF-PAR ID:
10175458
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Journal of Applied Probability
Volume:
56
Issue:
03
ISSN:
0021-9002
Page Range / eLocation ID:
830 to 857
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. Recent works have shown that several machine learning algorithms, such as square-root Lasso, support vector machines, and regularized logistic regression, among many others, can be represented exactly as distributionally robust optimization (DRO) problems. The distributional uncertainty set is defined as a neighborhood centered at the empirical distribution, with the neighborhood measured by an optimal transport distance. In this paper, we propose a methodology that learns such a neighborhood in a natural, data-driven way. We show rigorously that our framework encompasses adaptive regularization as a particular case. Moreover, we demonstrate empirically that our proposed methodology is able to improve upon a wide range of popular machine learning estimators. 
  2. We revisit Markowitz’s mean-variance portfolio selection model by considering a distributionally robust version, in which the region of distributional uncertainty is around the empirical measure and the discrepancy between probability measures is dictated by the Wasserstein distance. We reduce this problem into an empirical variance minimization problem with an additional regularization term. Moreover, we extend the recently developed inference methodology to our setting in order to select the size of the distributional uncertainty as well as the associated robust target return rate in a data-driven way. Finally, we report extensive back-testing results on S&P 500 that compare the performance of our model with those of several well-known models including the Fama–French and Black–Litterman models. 
  3. We revisit Markowitz’s mean-variance portfolio selection model by considering a distributionally robust version, in which the region of distributional uncertainty is around the empirical measure and the discrepancy between probability measures is dictated by the Wasserstein distance. We reduce this problem into an empirical variance minimization problem with an additional regularization term. Moreover, we extend the recently developed inference methodology to our setting in order to select the size of the distributional uncertainty as well as the associated robust target return rate in a data-driven way. Finally, we report extensive back-testing results on S&P 500 that compare the performance of our model with those of several well-known models including the Fama–French and Black–Litterman models. This paper was accepted by David Simchi-Levi, finance. 
  4. This paper proposes a novel non-parametric multidimensional convex regression estimator which is designed to be robust to adversarial perturbations of the empirical measure. We minimize over convex functions the maximum (over Wasserstein perturbations of the empirical measure) of the absolute regression errors. The inner maximization is solved in closed form, resulting in a regularization penalty that involves the norm of the gradient. We show consistency of our estimator and a rate of convergence of order $\tilde{O}(n^{-1/d})$, matching the bounds of alternative estimators based on square-loss minimization. Unlike all existing results, our convergence rates hold without imposing compactness on the underlying domain and with no a priori bounds on the underlying convex function or its gradient norm. 
  5. This paper builds a bridge between two areas in optimization and machine learning by establishing a general connection between Wasserstein distributional robustness and variation regularization. It helps to demystify the empirical success of Wasserstein distributionally robust optimization and devise new regularization schemes for machine learning. 
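The last related record connects Wasserstein distributional robustness with variation regularization. A commonly cited form of that connection, assumed here and stated only to first order in the radius delta, is: for a type-p Wasserstein ball with p > 1, the worst-case expected loss is approximately E_n[loss] + delta * (E_n ||grad_x loss||_*^q)^(1/q), where 1/p + 1/q = 1 and ||.||_* is the dual of the ground norm. The sketch below evaluates that surrogate for logistic loss with a Euclidean ground cost on features and fixed labels; the function names are hypothetical, and the formula is an assumption to be checked against the paper rather than its exact statement.

```python
import numpy as np


def logistic_sigmoid(z):
    # sigma(z) = (1 + tanh(z / 2)) / 2, which avoids overflow in exp
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def variation_regularized_logistic(beta, X, y, delta, p=2.0):
    """Assumed first-order surrogate of the worst-case logistic loss over a
    type-p Wasserstein ball (p > 1), Euclidean cost on features, labels fixed:
        E_n[loss] + delta * (E_n ||grad_x loss||_2^q)^(1/q),  1/p + 1/q = 1
    """
    q = p / (p - 1.0)  # conjugate exponent of the Wasserstein order p
    margins = y * (X @ beta)
    empirical_loss = np.mean(np.logaddexp(0.0, -margins))
    # grad_x log(1 + exp(-m)) = -y * sigmoid(-m) * beta; with y in {-1, +1}
    # its Euclidean norm is sigmoid(-m) * ||beta||_2
    grad_norms = logistic_sigmoid(-margins) * np.linalg.norm(beta)
    variation = np.mean(grad_norms ** q) ** (1.0 / q)
    return empirical_loss + delta * variation


# One point on the decision boundary: margin 0, loss log(2), gradient norm 0.5.
X = np.array([[0.0, 0.0]])
y = np.array([1.0])
beta = np.array([1.0, 0.0])
print(round(float(variation_regularized_logistic(beta, X, y, 0.1)), 6))  # 0.743147
```

Unlike the fixed penalty delta * ||beta|| in the exact logistic representation, this variation term weights each point by sigmoid(-m), so well-classified points (large margin) contribute little to the penalty.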