skip to main content


Title: Learning Proximal Operators to Discover Multiple Optima
Finding multiple solutions of non-convex optimization problems is a ubiquitous yet challenging task. Most past algorithms either apply single-solution optimization methods from multiple random initial guesses or search in the vicinity of found solutions using ad hoc heuristics. We present an end-to-end method to learn the proximal operator of a family of training problems so that multiple local minima can be quickly obtained from initial guesses by iterating the learned operator, emulating the proximal-point algorithm that has fast convergence. The learned proximal operator can be further generalized to recover multiple optima for unseen problems at test time, enabling applications such as object detection. The key ingredient in our formulation is a proximal regularization term, which elevates the convexity of our training loss: by applying recent theoretical results, we show that for weakly-convex objectives with Lipschitz gradients, training of the proximal operator converges globally with a practical degree of over-parameterization. We further present an exhaustive benchmark for multi-solution optimization to demonstrate the effectiveness of our method.  more » « less
Award ID(s):
1955697
NSF-PAR ID:
10480327
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
International Conference on Learning Representations
Date Published:
Journal Name:
International Conference on Learning Representations
Format(s):
Medium: X
Location:
Kigali
Sponsoring Org:
National Science Foundation
More Like this
  1. Finding multiple solutions of non-convex optimization problems is a ubiquitous yet challenging task. Most past algorithms either apply single-solution optimization methods from multiple random initial guesses or search in the vicinity of found solutions using ad hoc heuristics. We present an end-to-end method to learn the proximal operator of a family of training problems so that multiple local minima can be quickly obtained from initial guesses by iterating the learned operator, emulating the proximal-point algorithm that has fast convergence. The learned proximal operator can be further generalized to recover multiple optima for unseen problems at test time, enabling applications such as object detection. The key ingredient in our formulation is a proximal regularization term, which elevates the convexity of our training loss: by applying recent theoretical results, we show that for weakly-convex objectives with Lipschitz gradients, training of the proximal operator converges globally with a practical degree of over-parameterization. We further present an exhaustive benchmark for multi-solution optimization to demonstrate the effectiveness of our method. 
    more » « less
  2. Abstract

    We present a new feasible proximal gradient method for constrained optimization where both the objective and constraint functions are given by summation of a smooth, possibly nonconvex function and a convex simple function. The algorithm converts the original problem into a sequence of convex subproblems. Formulating those subproblems requires the evaluation of at most one gradient-value of the original objective and constraint functions. Either exact or approximate subproblems solutions can be computed efficiently in many cases. An important feature of the algorithm is the constraint level parameter. By carefully increasing this level for each subproblem, we provide a simple solution to overcome the challenge of bounding the Lagrangian multipliers and show that the algorithm follows a strictly feasible solution path till convergence to the stationary point. We develop a simple, proximal gradient descent type analysis, showing that the complexity bound of this new algorithm is comparable to gradient descent for the unconstrained setting which is new in the literature. Exploiting this new design and analysis technique, we extend our algorithms to some more challenging constrained optimization problems where (1) the objective is a stochastic or finite-sum function, and (2) structured nonsmooth functions replace smooth components of both objective and constraint functions. Complexity results for these problems also seem to be new in the literature. Finally, our method can also be applied to convex function constrained problems where we show complexities similar to the proximal gradient method.

     
    more » « less
  3. We propose an inexact variable-metric proximal point algorithm to accelerate gradient-based optimization algorithms. The proposed scheme, called QNing can be notably applied to incremental first-order methods such as the stochastic variance-reduced gradient descent algorithm (SVRG) and other randomized incremental optimization algorithms. QNing is also compatible with composite objectives, meaning that it has the ability to provide exactly sparse solutions when the objective involves a sparsity-inducing regularization. When combined with limited-memory BFGS rules, QNing is particularly effective to solve high-dimensional optimization problems, while enjoying a worst-case linear convergence rate for strongly convex problems. We present experimental results where QNing gives significant improvements over competing methods for training machine learning methods on large samples and in high dimensions. 
    more » « less
  4. ABSTRACT

    We introduce a new class of iterative image reconstruction algorithms for radio interferometry, at the interface of convex optimization and deep learning, inspired by plug-and-play methods. The approach consists in learning a prior image model by training a deep neural network (DNN) as a denoiser, and substituting it for the handcrafted proximal regularization operator of an optimization algorithm. The proposed AIRI (‘AI for Regularization in radio-interferometric Imaging’) framework, for imaging complex intensity structure with diffuse and faint emission from visibility data, inherits the robustness and interpretability of optimization, and the learning power and speed of networks. Our approach relies on three steps. First, we design a low dynamic range training data base from optical intensity images. Secondly, we train a DNN denoiser at a noise level inferred from the signal-to-noise ratio of the data. We use training losses enhanced with a non-expansiveness term ensuring algorithm convergence, and including on-the-fly data base dynamic range enhancement via exponentiation. Thirdly, we plug the learned denoiser into the forward–backward optimization algorithm, resulting in a simple iterative structure alternating a denoising step with a gradient-descent data-fidelity step. We have validated AIRI against clean, optimization algorithms of the SARA family, and a DNN trained to reconstruct the image directly from visibility data. Simulation results show that AIRI is competitive in imaging quality with SARA and its unconstrained forward–backward-based version uSARA, while providing significant acceleration. clean remains faster but offers lower quality. The end-to-end DNN offers further acceleration, but with far lower quality than AIRI.

     
    more » « less
  5. Abstract

    We present a non-anticipative learning- and scenario-based prediction-optimization (ScenPredOpt) framework that combines deep learning, heuristics, and mathematical solvers for solving combinatorial problems under uncertainty. Specifically, we transform neural machine translation frameworks to predict the optimal solutions of scenario-based multi-stage stochastic programs. The learning models are trained efficiently using the input and solution data of the multi-stage single-scenario deterministic problems. Then our ScenPredOpt framework creates a mapping from the inputs used in training into an output of predictions that are close to optimal solutions. We present a Non-anticipative Encoder-Decoder with Attention (NEDA) approach, which ensures the non-anticipativity property of multi-stage stochastic programs and, thus, time consistency by calibrating the learned information based on the problem’s scenario tree and adjusting the hidden states of the neural network. In our ScenPredOpt framework, the percent predicted variables used for the solution are iteratively reduced through a relaxation of the problem to eliminate infeasibility. Then, a linear relaxation-based heuristic is performed to further reduce the solution time. Finally, a mathematical solver is used to generate the complete solution. We present the results on two NP-Hard sequential optimization problems under uncertainty: stochastic multi-item capacitated lot-sizing and stochastic multistage multidimensional knapsack. The results show that the solution time can be reduced by a factor of 599 with an optimality gap of only 0.08%. We compare the results of the ScenPredOpt framework with cutting-edge exact and heuristic solution algorithms for the problems studied and find that our framework is more effective. Additionally, the computational results demonstrate that ScenPredOpt can solve instances with a larger number of items and scenarios than the trained ones. Our non-anticipative learning-optimization approach can be beneficial for stochastic programming problems involving binary variables that are solved repeatedly with various types of dimensions and similar decisions at each period.

     
    more » « less