Title: Self-Supervised Primal-Dual Learning for Constrained Optimization
Abstract: This paper studies how to train machine-learning models that directly approximate the optimal solutions of constrained optimization problems. This is an empirical risk minimization problem under constraints, which is challenging because training must balance optimality and feasibility conditions. Supervised learning methods often approach this challenge by training the model on a large collection of pre-solved instances. This paper takes a different route and proposes Primal-Dual Learning (PDL), a self-supervised training method that requires neither a set of pre-solved instances nor an optimization solver for training and inference. Instead, PDL mimics the trajectory of an Augmented Lagrangian Method (ALM) and jointly trains primal and dual neural networks. Being a primal-dual method, PDL uses instance-specific penalties on the constraint terms in the loss function used to train the primal network. Experiments on a set of nonlinear optimization benchmarks show that PDL typically exhibits negligible constraint violations and minor optimality gaps, and remains remarkably close to the solutions produced by ALM. PDL also achieves comparable or better optimality gaps, constraint violations, and training times than existing approaches.
Award ID(s): 2007095
PAR ID: 10463205
Author(s) / Creator(s): ;
Date Published:
Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 37
Issue: 4
ISSN: 2159-5399
Page Range / eLocation ID: 4052 to 4060
Format(s): Medium: X
Sponsoring Org: National Science Foundation
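The abstract describes the training scheme concretely enough to sketch. The code below is a minimal, illustrative sketch of a PDL-style loop, not the paper's implementation: PyTorch is assumed, and the toy objective `f`, constraint `g`, network architectures, and penalty schedule `rho` are all placeholder assumptions. The primal phase trains the primal network against an augmented-Lagrangian loss whose multipliers come from the (frozen) dual network, making the constraint penalties instance-specific; the dual phase regresses the dual network onto an ALM-style multiplier update, so neither pre-solved instances nor solver calls are needed.

```python
import torch
import torch.nn as nn

# Illustrative instance family: minimize f(y; x) subject to g(y; x) <= 0,
# where x parameterizes the instance. f and g below are placeholders.
def f(y, x):
    return ((y - x) ** 2).sum(dim=-1)            # objective value per instance

def g(y, x):
    return y.sum(dim=-1, keepdim=True) - 1.0     # one inequality constraint

primal = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 4))
dual = nn.Sequential(nn.Linear(4, 64), nn.Tanh(), nn.Linear(64, 1))
opt_p = torch.optim.Adam(primal.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(dual.parameters(), lr=1e-3)
rho = 1.0                                        # penalty coefficient

for outer in range(20):                          # outer loop mirrors ALM iterations
    # Primal phase: augmented-Lagrangian loss with instance-specific
    # multipliers predicted by the frozen dual network.
    for _ in range(100):
        x = torch.rand(128, 4)                   # sample instances; no solver used
        y = primal(x)
        lam = dual(x).detach().clamp(min=0.0)
        viol = torch.relu(g(y, x))               # inequality violation
        loss = (f(y, x) + (lam * g(y, x)).sum(dim=-1)
                + 0.5 * rho * (viol ** 2).sum(dim=-1)).mean()
        opt_p.zero_grad(); loss.backward(); opt_p.step()

    # Dual phase: regress the dual net onto the ALM-style multiplier update
    # lambda <- max(lambda + rho * g(y), 0) evaluated at the primal predictions.
    for _ in range(100):
        x = torch.rand(128, 4)
        with torch.no_grad():
            target = (dual(x).clamp(min=0.0) + rho * g(primal(x), x)).clamp(min=0.0)
        d_loss = ((dual(x) - target) ** 2).mean()
        opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    rho = min(rho * 1.5, 1e4)                    # grow the penalty as ALM would
```

Freezing the dual network during the primal phase (and vice versa) mirrors how ALM alternates between minimizing the augmented Lagrangian and updating the multipliers; the schedules and constants here are purely illustrative.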
More Like this
  1. Optimization problems are ubiquitous in our societies and are present in almost every segment of the economy. Most of these optimization problems are NP-hard and computationally demanding, often requiring approximate solutions for large-scale instances. Machine learning frameworks that learn to approximate solutions to such hard optimization problems are a potentially promising avenue to address these difficulties, particularly when many closely related problem instances must be solved repeatedly. Supervised learning frameworks can train a model using the outputs of pre-solved instances. However, when the outputs are themselves approximations, when the optimization problem has symmetric solutions, and/or when the solver uses randomization, solutions to closely related instances may exhibit large differences, and the learning task can become inherently more difficult. This paper demonstrates this critical challenge, connects the volatility of the training data to the ability of a model to approximate it, and proposes a method for producing (exact or approximate) solutions to optimization problems that are more amenable to supervised learning tasks. The effectiveness of the method is tested on hard nonlinear nonconvex and discrete combinatorial problems.
  2. Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying them to safety-critical applications. Previous primal-dual-style approaches suffer from instability issues and lack optimality guarantees. This paper overcomes these issues from the perspective of probabilistic inference. We introduce a novel Expectation-Maximization approach to naturally incorporate constraints during policy learning: 1) a provably optimal non-parametric variational distribution can be computed in closed form after a convex optimization (E-step); 2) the policy parameter is improved within the trust region based on the optimal variational distribution (M-step). The proposed algorithm decomposes the safe RL problem into a convex optimization phase and a supervised learning phase, which yields more stable training performance. A wide range of experiments on continuous robotic tasks shows that the proposed method achieves significantly better constraint satisfaction performance and better sample efficiency than baselines. The code is available at https://github.com/liuzuxin/cvpo-safe-rl.
  3. In this work, we consider a distributed online convex optimization problem with time-varying (potentially adversarial) constraints. A set of nodes jointly aims to minimize a global objective function, which is the sum of local convex functions. The objective and constraint functions are revealed locally to the nodes at each time step, after an action is taken. Naturally, the constraints cannot be satisfied instantaneously, so we reformulate the problem to satisfy them in the long term. To this end, we propose a distributed primal-dual mirror-descent-based algorithm, in which the primal and dual updates are carried out locally at all the nodes. This is followed by sharing and mixing of the primal variables by the local nodes via communication with their immediate neighbors (a sketch of the per-node update appears after this list). To quantify the performance of the proposed algorithm, we use the challenging but more realistic metrics of dynamic regret and fit. Dynamic regret measures the cumulative loss incurred by the algorithm compared to the best dynamic strategy, while fit measures the long-term cumulative constraint violations. Without assuming the restrictive Slater's condition, we show that the proposed algorithm achieves sublinear regret and fit under mild, commonly used assumptions.
  4. Abstract

    This paper concerns the formulation and analysis of a new interior-point method for constrained optimization that combines a shifted primal-dual interior-point method with a projected-search method for bound-constrained optimization. The method involves the computation of an approximate Newton direction for a primal-dual penalty-barrier function that incorporates shifts on both the primal and dual variables. Shifts on the dual variables allow the method to be safely "warm started" from a good approximate solution and avoid the possibility of very large solutions of the associated path-following equations. The approximate Newton direction is used in conjunction with a new projected-search line-search algorithm that employs a flexible non-monotone quasi-Armijo line search for the minimization of each penalty-barrier function. Numerical results are presented for a large set of constrained optimization problems. For comparison purposes, results are also given for two primal-dual interior-point methods that do not use projection: the first shifts both the primal and dual variables, while the second shifts the primal variables only. The results show that the use of both primal and dual shifts in conjunction with projection gives a method that is more robust and requires significantly fewer iterations. In particular, the number of times that the search direction must be computed is substantially reduced. Results from a set of quadratic programming test problems indicate that the method is particularly well suited to solving the quadratic programming subproblem in a sequential quadratic programming method for nonlinear optimization.
  5. Abstract

    This article proposes a novel distributionally robust optimization (DRO)-based soft-constrained model predictive control (MPC) framework to explicitly hedge against unknown external input terms in a linear state-space system. Without a priori knowledge of the exact uncertainty distribution, the framework works with a lifted ambiguity set constructed using machine learning to incorporate first-order moment information. By adopting a linear performance measure and enforcing the input and state constraints robustly with respect to a lifted support set, the DRO-based MPC is reformulated as a robust optimization problem. The constraints are softened to ensure recursive feasibility. Theoretical results on optimality, feasibility, and stability are also discussed. The performance and computational efficiency of the proposed method are illustrated through motion control and building energy control systems, showing 18.3% less cost and 78.8% fewer constraint violations, respectively, while requiring one third of the CPU time compared to multi-stage scenario-based stochastic MPC.
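Item 3 above describes the per-node computation concretely enough for a sketch. The code below is a minimal illustration under assumptions not stated in the abstract: a Euclidean mirror map (so the mirror-descent step reduces to a projected gradient step), a box feasible set, and hypothetical names (`node_round`, `eta`, `proj`). It shows one round at a single node: a primal step on the local Lagrangian, dual ascent on the revealed constraint violation, and mixing of the primal iterate with the neighbors' iterates.

```python
import numpy as np

def node_round(x_i, lam_i, grad_f, grad_g, g_val, neighbors_x, weights,
               eta=0.05, proj=lambda z: np.clip(z, -1.0, 1.0)):
    """One round at node i, after the local f_i^t and g_i^t are revealed.

    grad_f:      gradient of the local loss at x_i (length d)
    grad_g:      Jacobian of the local constraints at x_i (m x d)
    g_val:       local constraint values g_i^t(x_i) (length m)
    neighbors_x: primal iterates received from immediate neighbors
    weights:     mixing weights over [x_i] + neighbors_x (a row of a doubly
                 stochastic matrix); proj projects onto the feasible set X
    """
    # Primal mirror-descent step on the local Lagrangian f_i + lam_i . g_i
    # (with a Euclidean mirror map this is a projected gradient step).
    x_half = proj(x_i - eta * (grad_f + grad_g.T @ lam_i))

    # Dual ascent on the revealed constraint violation, kept nonnegative;
    # this targets sublinear long-term cumulative violation ("fit").
    lam_new = np.maximum(lam_i + eta * g_val, 0.0)

    # Consensus: mix the primal variable with the neighbors' iterates.
    x_new = weights @ np.vstack([x_half] + list(neighbors_x))
    return x_new, lam_new
```

Splitting the round into a local primal-dual step followed by mixing matches the abstract's description that updates are computed locally and only the primal variables are shared with immediate neighbors; step sizes and the projection are illustrative.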