

Title: Learning Proximal Operators to Discover Multiple Optima
Finding multiple solutions of non-convex optimization problems is a ubiquitous yet challenging task. Most past algorithms either apply single-solution optimization methods from multiple random initial guesses or search in the vicinity of found solutions using ad hoc heuristics. We present an end-to-end method to learn the proximal operator of a family of training problems so that multiple local minima can be quickly obtained from initial guesses by iterating the learned operator, emulating the proximal-point algorithm that has fast convergence. The learned proximal operator can be further generalized to recover multiple optima for unseen problems at test time, enabling applications such as object detection. The key ingredient in our formulation is a proximal regularization term, which elevates the convexity of our training loss: by applying recent theoretical results, we show that for weakly-convex objectives with Lipschitz gradients, training of the proximal operator converges globally with a practical degree of over-parameterization. We further present an exhaustive benchmark for multi-solution optimization to demonstrate the effectiveness of our method.
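
As a rough illustration of the iteration the abstract describes, here is a minimal sketch of recovering several distinct local minima by repeatedly applying a learned proximal operator to a batch of random initial guesses. The network `prox_net`, its conditioning on `problem_params`, and all shapes are illustrative assumptions, not the paper's actual architecture.

```python
# Sketch: recover multiple local minima by iterating a learned proximal
# operator from several random initial guesses (names/shapes are assumptions).
import torch

def find_multiple_optima(prox_net, problem_params, num_guesses=32,
                         num_iters=50, dim=2, tol=1e-3):
    x = torch.randn(num_guesses, dim)                 # random initial guesses
    params = problem_params.expand(num_guesses, -1)   # condition on the problem
    with torch.no_grad():
        for _ in range(num_iters):
            x = prox_net(x, params)                   # x_{k+1} = prox_f(x_k)
    distinct = []                                     # merge near-duplicate fixed points
    for xi in x:
        if all(torch.norm(xi - xj) > tol for xj in distinct):
            distinct.append(xi)
    return torch.stack(distinct)
```

Because each guess is driven to a nearby fixed point of the learned operator, deduplicating the final iterates yields the set of distinct local minima found.
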
Award ID(s):
1955697
NSF-PAR ID:
10480327
Author(s) / Creator(s):
Publisher / Repository:
International Conference on Learning Representations
Date Published:
Journal Name:
International Conference on Learning Representations
Format(s):
Medium: X
Location:
Kigali
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We introduce a new class of iterative image reconstruction algorithms for radio interferometry, at the interface of convex optimization and deep learning, inspired by plug-and-play methods. The approach consists of learning a prior image model by training a deep neural network (DNN) as a denoiser, and substituting it for the handcrafted proximal regularization operator of an optimization algorithm. The proposed AIRI (‘AI for Regularization in radio-interferometric Imaging’) framework, for imaging complex intensity structure with diffuse and faint emission from visibility data, inherits the robustness and interpretability of optimization, and the learning power and speed of networks. Our approach relies on three steps. First, we design a low-dynamic-range training database from optical intensity images. Secondly, we train a DNN denoiser at a noise level inferred from the signal-to-noise ratio of the data. We use training losses enhanced with a non-expansiveness term ensuring algorithm convergence, and including on-the-fly database dynamic range enhancement via exponentiation. Thirdly, we plug the learned denoiser into the forward–backward optimization algorithm, resulting in a simple iterative structure alternating a denoising step with a gradient-descent data-fidelity step. We have validated AIRI against CLEAN, optimization algorithms of the SARA family, and a DNN trained to reconstruct the image directly from visibility data. Simulation results show that AIRI is competitive in imaging quality with SARA and its unconstrained forward–backward-based version uSARA, while providing significant acceleration. CLEAN remains faster but offers lower quality. The end-to-end DNN offers further acceleration, but with far lower quality than AIRI.

     
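
As a rough sketch of the plug-and-play iteration described above (a gradient-descent data-fidelity step alternated with a learned denoising step), the following is a generic forward-backward loop; the callables `A`, `A_adj`, `denoiser`, and the step size are placeholders, not the AIRI implementation.

```python
# Sketch of a plug-and-play forward-backward iteration: a gradient step on the
# data-fidelity term followed by a learned denoiser standing in for the
# handcrafted proximal regularization operator (all names are placeholders).
def pnp_forward_backward(y, A, A_adj, denoiser, step, num_iters=200):
    """y: measurement (visibility) data; A / A_adj: measurement operator and
    its adjoint; denoiser: trained DNN prior; step: gradient step size."""
    x = A_adj(y)                          # dirty-image style initialization
    for _ in range(num_iters):
        grad = A_adj(A(x) - y)            # gradient of 0.5 * ||A(x) - y||^2
        x = denoiser(x - step * grad)     # denoising (proximal) step
    return x
```
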
  2. Abstract

    We present a new feasible proximal gradient method for constrained optimization where both the objective and constraint functions are given by the sum of a smooth, possibly nonconvex function and a simple convex function. The algorithm converts the original problem into a sequence of convex subproblems. Formulating those subproblems requires evaluating at most one gradient value of the original objective and constraint functions. Either exact or approximate subproblem solutions can be computed efficiently in many cases. An important feature of the algorithm is the constraint level parameter. By carefully increasing this level for each subproblem, we provide a simple way to overcome the challenge of bounding the Lagrange multipliers and show that the algorithm follows a strictly feasible solution path until convergence to a stationary point. We develop a simple, proximal-gradient-descent-type analysis showing that the complexity bound of this new algorithm is comparable to that of gradient descent in the unconstrained setting, which is new in the literature. Exploiting this design and analysis technique, we extend our algorithms to more challenging constrained optimization problems where (1) the objective is a stochastic or finite-sum function, and (2) structured nonsmooth functions replace smooth components of both the objective and constraint functions. Complexity results for these problems also appear to be new in the literature. Finally, our method can also be applied to convex function-constrained problems, where we show complexities similar to those of the proximal gradient method.

     
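
For reference, the unconstrained proximal gradient building block that the analysis above extends looks roughly as follows; the paper's feasible, constraint-level scheme itself is not reproduced here, and `grad_g`, `prox_h`, and the step size are illustrative placeholders.

```python
# Minimal proximal gradient iteration for min_x g(x) + h(x), with g smooth and
# h convex and "simple" (cheap proximal operator). Only the classical
# unconstrained building block, not the constrained scheme described above.
import numpy as np

def proximal_gradient(grad_g, prox_h, x0, step, num_iters=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = prox_h(x - step * grad_g(x), step)  # x <- prox_{step*h}(x - step*grad g(x))
    return x

# Example choice of h: h(x) = lam * ||x||_1, whose prox is soft thresholding.
def soft_threshold(z, t, lam=0.1):
    return np.sign(z) * np.maximum(np.abs(z) - t * lam, 0.0)
```
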
  3. Abstract

    Sparsity finds applications in diverse areas such as statistics, machine learning, and signal processing. Computations over sparse structures are less complex than their dense counterparts and need less storage. This paper proposes a heuristic method for retrieving sparse approximate solutions of optimization problems via minimizing the $\ell_p$ quasi-norm, where $0<p<1$. An iterative two-block algorithm for minimizing the $\ell_p$ quasi-norm subject to convex constraints is proposed. The proposed algorithm requires solving for the roots of a scalar polynomial, as opposed to applying a soft-thresholding operator as in the case of $\ell_1$-norm minimization. The algorithm's merit lies in its ability to solve $\ell_p$ quasi-norm minimization subject to any convex constraint set. For the specific case of constraints defined by differentiable functions with Lipschitz continuous gradients, a second, faster algorithm is proposed. Using a proximal gradient step, we mitigate the convex projection step and hence enhance the algorithm's speed while proving its convergence. We present various applications where the proposed algorithm excels, namely sparse signal reconstruction, system identification, and matrix completion. The results demonstrate the significant gains obtained by the proposed algorithm compared to other $\ell_p$ quasi-norm-based methods presented in the previous literature.

     
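
To make the contrast with soft thresholding concrete, here is a hedged sketch of the two scalar proximal maps involved; the $\ell_p$ case is solved by generic bounded numerical minimization here rather than the polynomial root-finding the paper describes.

```python
# Sketch: the l1 prox has the closed-form soft-thresholding solution, while the
# lp quasi-norm (0 < p < 1) prox needs a scalar solve, done here numerically.
import numpy as np
from scipy.optimize import minimize_scalar

def prox_l1(z, lam):
    # argmin_v 0.5*(v - z)**2 + lam*|v|   (soft thresholding)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def prox_lp_scalar(z, lam, p):
    # argmin_v 0.5*(v - z)**2 + lam*|v|**p, one coordinate at a time; the
    # minimizer lies between 0 and z, so a bounded search is valid.
    if z == 0.0:
        return 0.0
    obj = lambda v: 0.5 * (v - z) ** 2 + lam * abs(v) ** p
    res = minimize_scalar(obj, bounds=(min(0.0, z), max(0.0, z)), method="bounded")
    return res.x if obj(res.x) < obj(0.0) else 0.0
```
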
  4. Tasks across diverse application domains can be posed as large-scale optimization problems; these include graphics, vision, machine learning, imaging, health, scheduling, planning, and energy system forecasting. Independently of the application domain, proximal algorithms have emerged as a formal optimization method that successfully solves a wide array of existing problems, often exploiting problem-specific structures in the optimization. Although model-based formal optimization provides a principled approach to problem modeling with convergence guarantees, at first glance this seems to be at odds with black-box deep learning methods. A recent line of work shows that, when combined with learning-based ingredients, model-based optimization methods are effective, interpretable, and allow for generalization to a wide spectrum of applications with little or no extra training data. However, experimenting with such hybrid approaches for different tasks by hand requires domain expertise in both proximal optimization and deep learning, which is often error-prone and time-consuming. Moreover, naively unrolling these iterative methods produces lengthy compute graphs, which, when differentiated via autograd techniques, cause exploding memory consumption, making batch-based training challenging. In this work, we introduce ∇-Prox, a domain-specific modeling language and compiler for large-scale optimization problems using differentiable proximal algorithms. ∇-Prox allows users to specify optimization objective functions of unknowns concisely at a high level, and intelligently compiles the problem into compute- and memory-efficient differentiable solvers. One of the core features of ∇-Prox is its full differentiability, which supports hybrid model- and learning-based solvers integrating proximal optimization with neural network pipelines. Example applications of this methodology include learning-based priors and/or sample-dependent inner-loop optimization schedulers, learned with deep equilibrium learning or deep reinforcement learning. With a few lines of code, we show that ∇-Prox can generate performant solvers for a range of image optimization problems, including end-to-end computational optics, image deraining, and compressive magnetic resonance imaging. We also demonstrate that ∇-Prox can be used in the completely orthogonal application domain of energy system planning, an essential task amid the energy crisis and the clean energy transition, where it outperforms the state-of-the-art CVXPY and commercial Gurobi solvers.
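
To illustrate the memory issue with naive unrolling that motivates ∇-Prox, the following is a generic PyTorch-style sketch, not the ∇-Prox API: every iterate of an unrolled proximal gradient loop stays in the autograd graph, so memory scales with the number of iterations. The module `prox_net` is an assumed learnable proximal/denoising component.

```python
# Generic illustration (not the ∇-Prox API): naively unrolling a proximal
# gradient loop under autograd keeps every iterate in the compute graph, so
# memory grows with the number of unrolled iterations.
import torch

def unrolled_solver(y, A, prox_net, step=0.1, num_iters=50):
    # Solves min_x 0.5*||A x - y||^2 + r(x), with r's prox replaced by prox_net.
    x = torch.zeros(A.shape[1])
    for _ in range(num_iters):
        grad = A.T @ (A @ x - y)          # data-fidelity gradient step
        x = prox_net(x - step * grad)     # learned proximal step
    return x                              # backprop traverses all num_iters steps
```

Deep equilibrium learning, mentioned above, is one way to sidestep this growth by differentiating only at the fixed point of the iteration rather than through every unrolled step.
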