Abstract We show that several machine learning estimators, including square-root least absolute shrinkage and selection and regularized logistic regression, can be represented as solutions to distributionally robust optimization problems. The associated uncertainty regions are based on suitably defined Wasserstein distances. Hence, our representations allow us to view regularization as a result of introducing an artificial adversary that perturbs the empirical distribution to account for out-of-sample effects in loss estimation. In addition, we introduce RWPI (robust Wasserstein profile inference), a novel inference methodology which extends the use of methods inspired by empirical likelihood to the setting of optimal transport costs (of which Wasserstein distances are a particular case). We use RWPI to show how to optimally select the size of uncertainty regions, and as a consequence we are able to choose regularization parameters for these machine learning estimators without the use of cross validation. Numerical experiments are also given to validate our theoretical findings.
more »
« less
Tikhonov Regularization is Optimal Transport Robust under Martingale Constraints.
Distributionally robust optimization (DRO) has been shown to offer a principled way to regularize learning models. In this paper, we find that Tikhonov regularization is distributionally robust in an optimal transport sense (i.e. if an adversary chooses distributions in a suitable optimal transport neighborhood of the empirical measure), provided that suitable martingale constraints are also imposed. Further, we introduce a relaxation of the martingale constraints which not only provide a unified viewpoint to a class of existing robust methods but also lead to new regularization tools. To realize these novel tools, provably efficient computational algorithms are proposed. As a byproduct, the strong duality theorem proved in this paper can be potentially applied to other problems of independent interest.
more »
« less
- Award ID(s):
- 2118199
- NSF-PAR ID:
- 10413183
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- Volume:
- 35
- Issue:
- 2022
- ISSN:
- 1049-5258
- Page Range / eLocation ID:
- 17677--17689
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Some recent works showed that several machine learning algorithms, such as square-root Lasso, Support Vector Machines, and regularized logistic regression, among many others, can be represented exactly as distributionally robust optimization (DRO) problems. The distributional uncertainty set is defined as a neighborhood centered at the empirical distribution, and the neighborhood is measured by optimal transport distance. In this paper, we propose a methodology which learns such neighborhood in a natural data-driven way. We show rigorously that our framework encompasses adaptive regularization as a particular case. Moreover, we demonstrate empirically that our proposed methodology is able to improve upon a wide range of popular machine learning estimators.more » « less
-
We consider the problem of learning the underlying structure of a general discrete pairwise Markov network. Existing approaches that rely on empirical risk minimization may perform poorly in settings with noisy or scarce data. To overcome these limitations, we propose a computationally efficient and robust learning method for this problem with near-optimal sample complexities. Our approach builds upon distributionally robust optimization (DRO) and maximum conditional log-likelihood. The proposed DRO estimator minimizes the worst-case risk over an ambiguity set of adversarial distributions within bounded transport cost or f-divergence of the empirical data distribution. We show that the primal minimax learning problem can be efficiently solved by leveraging sufficient statistics and greedy maximization in the ostensibly intractable dual formulation. Based on DRO’s approximation to Lipschitz and variance regularization, we derive near-optimal sample complexities matching existing results. Extensive empirical evidence with different corruption models corroborates the effectiveness of the proposed methods.more » « less
-
null (Ed.)For energy-efficient Connected and Automated Vehicle (CAV) Eco-driving control on signalized arterials under uncertain traffic conditions, this paper explicitly considers traffic control devices (e.g., road markings, traffic signs, and traffic signals) and road geometry (e.g., road shapes, road boundaries, and road grades) constraints in a data-driven optimization-based Model Predictive Control (MPC) modeling framework. This modeling framework uses real-time vehicle driving and traffic signal data via Vehicle-to-Infrastructure (V2I) and Vehicle-to-Vehicle (V2V) communications. In the MPC-based control model, this paper mathematically formulates location-based traffic control devices and road geometry constraints using the geographic information from High-Definition (HD) maps. The location-based traffic control devices and road geometry constraints have the potential to improve the safety, energy, efficiency, driving comfort, and robustness of connected and automated driving on real roads by considering interrupted flow facility locations and road geometry in the formulation. We predict a set of uncertain driving states for the preceding vehicles through an online learning-based driving dynamics prediction model. We then solve a constrained finite-horizon optimal control problem with the predicted driving states to obtain a set of Eco-driving references for the controlled vehicle. To obtain the optimal acceleration or deceleration commands for the controlled vehicle with the set of Eco-driving references, we formulate a Distributionally Robust Stochastic Optimization (DRSO) model (i.e., a special case of data-driven optimization models under moment bounds) with Distributionally Robust Chance Constraints (DRCC) with location-based traffic control devices and road geometry constraints. We design experiments to demonstrate the proposed model under different traffic conditions using real-world connected vehicle trajectory data and Signal Phasing and Timing (SPaT) data on a coordinated arterial with six actuated intersections on Fuller Road in Ann Arbor, Michigan from the Safety Pilot Model Deployment (SPMD) project.more » « less
-
In a chance constrained program (CCP), decision makers seek the best decision whose probability of violating the uncertainty constraints is within the prespecified risk level. As a CCP is often nonconvex and is difficult to solve to optimality, much effort has been devoted to developing convex inner approximations for a CCP, among which the conditional value-at-risk (CVaR) has been known to be the best for more than a decade. This paper studies and generalizes the ALSO-X, originally proposed by Ahmed, Luedtke, SOng, and Xie in 2017 , for solving a CCP. We first show that the ALSO-X resembles a bilevel optimization, where the upper-level problem is to find the best objective function value and enforce the feasibility of a CCP for a given decision from the lower-level problem, and the lower-level problem is to minimize the expectation of constraint violations subject to the upper bound of the objective function value provided by the upper-level problem. This interpretation motivates us to prove that when uncertain constraints are convex in the decision variables, ALSO-X always outperforms the CVaR approximation. We further show (i) sufficient conditions under which ALSO-X can recover an optimal solution to a CCP; (ii) an equivalent bilinear programming formulation of a CCP, inspiring us to enhance ALSO-X with a convergent alternating minimization method (ALSO-X+); and (iii) an extension of ALSO-X and ALSO-X+ to distributionally robust chance constrained programs (DRCCPs) under the ∞−Wasserstein ambiguity set. Our numerical study demonstrates the effectiveness of the proposed methods.more » « less