Title: Catalyst for Gradient-based Nonconvex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may require convexity to operate, the proposed approach allows one to use them without assuming any knowledge about the convexity of the objective. In general, the scheme is guaranteed to produce a stationary point with a worst-case efficiency typical of first-order methods, and when the objective turns out to be convex, it automatically accelerates in the sense of Nesterov and achieves a near-optimal convergence rate in function values. We conclude the paper by showing promising experimental results obtained by applying our approach to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks.
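The scheme described above lends itself to a compact outer loop. Below is a minimal Python sketch of the idea under stated assumptions: the function names, the fixed regularization parameter kappa, the plain gradient-descent inner solver, and the 1/(k+1)^2 accuracy schedule are illustrative placeholders rather than the paper's exact procedure. The point is only that adding (kappa/2)||z - y||^2 with kappa large enough makes each subproblem amenable to a method designed for convex objectives, such as SVRG or SAGA.

    import numpy as np

    def inner_gradient_descent(sub_grad, init, tol, step=0.1, max_iter=1000):
        # Stand-in for any gradient-based convex solver (SVRG, SAGA, ...):
        # iterate until the subproblem gradient norm falls below tol.
        z = init.copy()
        for _ in range(max_iter):
            g = sub_grad(z)
            if np.linalg.norm(g) <= tol:
                break
            z -= step * g
        return z

    def catalyst_nonconvex(grad, x0, kappa, n_outer=50, inner_solver=inner_gradient_descent):
        # Outer loop: each iteration approximately minimizes the regularized
        # subproblem h(z) = f(z) + (kappa/2) * ||z - y||^2 around the current iterate.
        # If kappa exceeds the weak-convexity modulus of f, h is convex, so a method
        # designed for convex problems applies without knowing whether f is convex.
        x = np.asarray(x0, dtype=float).copy()
        for k in range(n_outer):
            y = x.copy()                              # prox center, also the warm start
            def sub_grad(z, y=y):
                return grad(z) + kappa * (z - y)      # gradient of the subproblem
            x = inner_solver(sub_grad, init=y, tol=1.0 / (k + 1) ** 2)
        return x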
Award ID(s): 1740551, 1651851
PAR ID: 10066901
Author(s) / Creator(s): ; ; ; ;
Date Published:
Journal Name: Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
Volume: 84
Page Range / eLocation ID: 613--622
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract In this paper we consider large-scale smooth optimization problems with multiple linear coupled constraints. Due to the non-separability of the constraints, arbitrary random sketching is not guaranteed to work. Thus, we first investigate necessary and sufficient conditions on the sketch sampling for the algorithms to be well defined. Based on these sampling conditions we develop new sketch descent methods for solving general smooth linearly constrained problems, in particular, random sketch descent (RSD) and accelerated random sketch descent (A-RSD) methods. To our knowledge, this is the first convergence analysis of RSD algorithms for optimization problems with multiple non-separable linear constraints. For the general case, when the objective function is smooth and non-convex, we prove that the non-accelerated variant achieves a sublinear rate in expectation for an appropriate optimality measure. In the smooth convex case, we derive for both algorithms, RSD and A-RSD, sublinear convergence rates in the expected values of the objective function. Additionally, if the objective function satisfies a strong convexity type condition, both algorithms converge linearly in expectation. In special cases, where complexity bounds are known for some particular sketching algorithms, such as coordinate descent methods for optimization problems with a single linear coupled constraint, our theory recovers the best known bounds. Finally, we present several numerical examples to illustrate the performance of our new algorithms. (A schematic version of the feasibility-preserving sketch update is given after this list.)
  2. We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy, using the right stopping criterion and the right warm-start strategy. We give practical guidelines for using Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using the Catalyst acceleration, for both strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems. (The accelerated outer loop is sketched after this list.)
  3. Abstract We revisit the problem of approximating minimizers of certain convex functionals subject to a convexity constraint by solutions of fourth order equations of Abreu type. This approximation problem was studied in previous articles by Carlier–Radice (Approximation of variational problems with a convexity constraint by PDEs of Abreu type. Calc. Var. Partial Differential Equations 58 (2019), no. 5, Art. 170) and the author (Singular Abreu equations and minimizers of convex functionals with a convexity constraint, arXiv:1811.02355v3, Comm. Pure Appl. Math., to appear), under the uniform convexity of both the Lagrangian and the constraint barrier. By introducing a new approximating scheme, we completely remove the uniform convexity of both the Lagrangian and the constraint barrier. Our analysis is applicable to variational problems motivated by the original 2D Rochet–Choné model of the monopolist's problem in economics, and to variational problems arising in the analysis of wrinkling patterns in floating elastic shells in elasticity. (A schematic form of the penalized approximation is given after this list.)
  4. Motivated by approximate Bayesian computation using mean-field variational approximation and the computation of equilibria in multi-species systems with cross-interaction, this paper investigates a composite geodesically convex optimization problem over multiple distributions. The objective functional under consideration is composed of a convex potential energy on a product of Wasserstein spaces and a sum of convex self-interaction and internal energies associated with each distribution. To efficiently solve this problem, we introduce the Wasserstein Proximal Coordinate Gradient (WPCG) algorithms with parallel, sequential, and random update schemes. Under a quadratic growth (QG) condition that is weaker than the usual strong convexity requirement on the objective functional, we show that WPCG converges exponentially fast to the unique global optimum. In the absence of the QG condition, WPCG is still shown to converge to the global optimal solution, albeit at a slower polynomial rate. Numerical results for both motivating examples are consistent with our theoretical findings. (The composite objective and a coordinate update are written out schematically after this list.)
  5. Inverse problems of identifying parameters in partial differential equations (PDEs) form an important class of problems with many real-world applications. Inverse problems are commonly studied in an optimization setting, with the various known approaches having their advantages and disadvantages. Although a non-convex output least-squares (OLS) objective has often been used, a convex modified output least-squares (MOLS) objective has attracted considerable attention in recent years. However, the convexity of the MOLS has only been established for parameters appearing linearly in the PDEs. The primary objective of this work is to introduce and analyze a variant of the MOLS for the inverse problem of identifying parameters that appear nonlinearly in variational problems. Besides giving an existence result for the inverse problem, we derive first-order and second-order derivative formulas for the new functional and use them to identify the conditions under which the new functional is convex. We give a discretization scheme for the continuous inverse problem and prove its convergence. We also obtain discrete formulas for the new MOLS functional and present detailed numerical examples. (The OLS and MOLS functionals are contrasted schematically after this list.)
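For item 1 above (random sketch descent), the following Python sketch illustrates the feasibility-preserving update hinted at in the abstract: sample a block of coordinates and move along the component of the partial gradient that lies in the null space of the constraint matrix restricted to that block. The block size, fixed step size, and least-squares projection are illustrative assumptions; the actual step-size rules and sampling conditions are those of the paper.

    import numpy as np

    def random_sketch_descent(f_grad, A, b, x0, tau=4, step=0.1, n_iters=1000, seed=0):
        # Minimize a smooth f subject to the coupled constraints A x = b.
        # b enters only through the starting point: x0 is assumed to satisfy A x0 = b,
        # and every update keeps feasibility because the direction d satisfies A[:, S] d = 0.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float).copy()
        n = x.size
        for _ in range(n_iters):
            S = rng.choice(n, size=tau, replace=False)   # random coordinate sketch
            g = f_grad(x)[S]
            A_S = A[:, S]
            # Project the partial gradient onto {d : A_S d = 0}; this needs the sampled
            # block to admit a nontrivial null space (cf. the paper's sampling conditions).
            lam = np.linalg.lstsq(A_S @ A_S.T, A_S @ g, rcond=None)[0]
            x[S] -= step * (g - A_S.T @ lam)
        return x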
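For item 2 above (Catalyst), a minimal sketch of the accelerated outer loop, assuming the inner method is exposed as a callable: the prox center is extrapolated between outer iterations in the sense of Nesterov with q = mu/(mu + kappa). The accuracy schedule shown here is a placeholder; the paper's stopping criteria and warm-start rules are what yield the stated complexity.

    import numpy as np

    def catalyst(inner_solver, grad, x0, kappa, mu=0.0, n_outer=100):
        # inner_solver(sub_grad, init, tol) can stand for any of the methods named
        # in the abstract (gradient descent, SAG, SAGA, SDCA, SVRG, MISO/Finito, ...).
        x_prev = x = np.asarray(x0, dtype=float).copy()
        y = x.copy()
        q = mu / (mu + kappa)                            # mu = 0 for non-strongly convex f
        alpha = 1.0 if mu == 0 else np.sqrt(q)
        for k in range(n_outer):
            def sub_grad(z, y=y):
                return grad(z) + kappa * (z - y)         # subproblem f(z) + kappa/2 ||z - y||^2
            x_prev, x = x, inner_solver(sub_grad, init=y, tol=1.0 / (k + 1) ** 2)
            # Standard Nesterov recursion for the extrapolation parameters.
            alpha_next = 0.5 * (q - alpha ** 2 + np.sqrt((q - alpha ** 2) ** 2 + 4 * alpha ** 2))
            beta = alpha * (1 - alpha) / (alpha ** 2 + alpha_next)
            y = x + beta * (x - x_prev)                  # extrapolated prox center
            alpha = alpha_next
        return x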
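For item 3 above, a schematic LaTeX rendering of the type of problem involved. The displays below are a simplified illustration assuming a log-determinant style penalization; the precise Lagrangians, barriers, and boundary conditions are those of the cited articles, not this sketch.

    % Convexity-constrained variational problem (notation illustrative):
    \[
      \min_{u \in \mathcal{C}} \; J(u) = \int_{\Omega} F\big(x, u(x), \nabla u(x)\big)\, dx,
      \qquad
      \mathcal{C} = \{\, u : \Omega \to \mathbb{R} \ \text{convex} \,\},
    \]
    % approximated by unconstrained penalized problems of the form
    \[
      \min_{u} \; J_{\varepsilon}(u) = J(u) - \varepsilon \int_{\Omega} \log \det D^{2} u(x)\, dx,
    \]
    % whose Euler--Lagrange equations are fourth-order equations of Abreu type,
    % schematically $\sum_{i,j} \partial^{2}_{ij}\, u^{ij} = f$ with $(u^{ij}) = (D^{2}u)^{-1}$.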
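For item 4 above (WPCG), a schematic LaTeX statement of the composite objective over K distributions and of a sequential coordinate update. The notation (V, E_k, H_k, step size tau) is illustrative, and the display shows an exact proximal coordinate step rather than the paper's gradient treatment of the potential term.

    % Composite objective on a product of Wasserstein spaces:
    \[
      \min_{\mu_1,\dots,\mu_K} \; F(\mu_1,\dots,\mu_K)
      = V(\mu_1,\dots,\mu_K) + \sum_{k=1}^{K}\big(\mathcal{E}_k(\mu_k) + \mathcal{H}_k(\mu_k)\big),
    \]
    % V: convex potential energy, E_k: convex self-interaction energy,
    % H_k: internal energy of the k-th distribution.
    % Schematic sequential coordinate update with step size $\tau > 0$:
    \[
      \mu_k^{t+1} \in \arg\min_{\mu}\;
      \Big\{ V(\mu_1^{t+1},\dots,\mu_{k-1}^{t+1},\mu,\mu_{k+1}^{t},\dots,\mu_K^{t})
      + \mathcal{E}_k(\mu) + \mathcal{H}_k(\mu) + \tfrac{1}{2\tau} W_2^{2}(\mu, \mu_k^{t}) \Big\}.
    \]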
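For item 5 above, a schematic contrast between the OLS and MOLS functionals in the standard linear-parameter setting that the paper extends; the notation (bilinear form a(q), data z, solution operator u(q)) is illustrative, not the paper's exact formulation.

    % Variational problem: find u = u(q) with a(q)(u, v) = m(v) for all test functions v.
    % Output least-squares vs. modified output least-squares for data z:
    \[
      J_{\mathrm{OLS}}(q) = \tfrac{1}{2}\,\| u(q) - z \|^{2},
      \qquad
      J_{\mathrm{MOLS}}(q) = \tfrac{1}{2}\, a(q)\big(u(q) - z,\; u(q) - z\big).
    \]
    % When the parameter q enters a(q) linearly, J_MOLS is convex while J_OLS need not be;
    % the paper introduces and analyzes a variant for parameters appearing nonlinearly.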