Title: Catalyst for Gradient-based Nonconvex Optimization
We introduce a generic scheme to solve nonconvex optimization problems using gradient-based algorithms originally designed for minimizing convex functions. Even though these methods may originally require convexity to operate, the proposed approach allows one to use them without assuming any knowledge about the convexity of the objective. In general, the scheme is guaranteed to produce a stationary point with a worst-case efficiency typical of first-order methods, and when the objective turns out to be convex, it automatically accelerates in the sense of Nesterov and achieves a near-optimal convergence rate in function values. We conclude the paper by showing promising experimental results obtained by applying our approach to incremental algorithms such as SVRG and SAGA for sparse matrix factorization and for learning neural networks.
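To make the idea of reusing a convex solver on a nonconvex objective concrete, here is a minimal sketch, not the paper's scheme (which the abstract does not spell out). It assumes a simplified inexact proximal-point loop: each outer step runs a few plain gradient steps on the regularized subproblem f(x) + (kappa/2)‖x − y‖², which is convex once kappa exceeds the negative curvature of f. The function names, the double-well test objective, and the values of `kappa` and `step` are all illustrative assumptions.

```python
import math

def inexact_proximal_point(grad_f, x0, kappa, step, outer_iters=50, inner_iters=20):
    """Simplified sketch: approximately minimize f(x) + (kappa/2)*||x - y||^2
    around the current point y, then move the prox center to the result."""
    x = list(x0)
    for _ in range(outer_iters):
        y = list(x)  # prox center for this auxiliary subproblem
        for _ in range(inner_iters):
            g = grad_f(x)
            # gradient of the subproblem is grad f(x) + kappa * (x - y)
            x = [xi - step * (gi + kappa * (xi - yi))
                 for xi, gi, yi in zip(x, g, y)]
    return x

def grad_double_well(x):
    # gradient of the nonconvex test function f(x) = sum(x_i^4 / 4 - x_i^2 / 2),
    # whose stationary points per coordinate are -1, 0 and 1
    return [xi ** 3 - xi for xi in x]

# f'' >= -1 everywhere, so kappa = 2 makes every subproblem strongly convex.
x_hat = inexact_proximal_point(grad_double_well, x0=[2.0, -0.5], kappa=2.0, step=0.05)
grad_norm = math.sqrt(sum(g * g for g in grad_double_well(x_hat)))
print(x_hat, grad_norm)
```

In this toy run the iterates settle at a stationary point of the double-well function, which is the kind of guarantee the abstract describes for the general nonconvex case. The acceleration claimed for convex objectives is not reproduced by this sketch.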
Award ID(s):
1740551 1651851
NSF-PAR ID:
10066901
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Proceedings of the Twenty-First International Conference on Artificial Intelligence and Statistics
Volume:
84
Page Range / eLocation ID:
613--622
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper we consider large-scale smooth optimization problems with multiple linear coupled constraints. Due to the non-separability of the constraints, arbitrary random sketching would not be guaranteed to work. Thus, we first investigate necessary and sufficient conditions for the sketch sampling to have well-defined algorithms. Based on these sampling conditions we develop new sketch descent methods for solving general smooth linearly constrained problems, in particular, random sketch descent (RSD) and accelerated random sketch descent (A-RSD) methods. To our knowledge, this is the first convergence analysis of RSD algorithms for optimization problems with multiple non-separable linear constraints. For the general case, when the objective function is smooth and non-convex, we prove a sublinear rate in expectation for the non-accelerated variant with respect to an appropriate optimality measure. In the smooth convex case, we derive sublinear convergence rates in the expected values of the objective function for both the non-accelerated algorithm and A-RSD. Additionally, if the objective function satisfies a strong convexity type condition, both algorithms converge linearly in expectation. In special cases, where complexity bounds are known for some particular sketching algorithms, such as coordinate descent methods for optimization problems with a single linear coupled constraint, our theory recovers the best known bounds. Finally, we present several numerical examples to illustrate the performance of our new algorithms.
  2. We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. We give practical guidelines for using Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using the Catalyst acceleration, for strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
  3. Motivated by practical concerns in applying information design to markets and service systems, we consider a persuasion problem between a sender and a receiver where the receiver may not be an expected utility maximizer. In particular, the receiver’s utility may be non-linear in her belief; we call such receivers risk-conscious. Such utility models arise, for example, when the receiver exhibits sensitivity to the variability and the risk in the payoff from choosing an action (e.g., waiting time for a service). In the presence of such non-linearity, the standard approach of using revelation-principle style arguments fails to characterize the set of signals needed in the optimal signaling scheme. Our main contribution is to provide a theoretical framework, using results from convex analysis, to overcome this technical challenge. In particular, in general persuasion settings with risk-conscious agents, we prove that the sender’s problem can be reduced to a convex optimization program. Furthermore, using this characterization, we obtain a bound on the number of signals needed in the optimal signaling scheme. We apply our methods to study a specific setting, namely binary persuasion, where the receiver has two possible actions (0 and 1), and the sender always prefers the receiver taking action 1. Under a mild convexity assumption on the receiver’s utility and using a geometric approach, we show that the convex program can be further reduced to a linear program. Furthermore, this linear program yields a canonical construction of the set of signals needed in an optimal signaling mechanism. In particular, this canonical set of signals only involves signals that fully reveal the state and signals that induce uncertainty between two states. We illustrate our results in the setting of signaling wait time information in an unobservable queue with customers whose utilities depend on the variance of their waiting times.
  4. Identifying parameters in partial differential equations (PDEs) is an important class of inverse problems with many real-world applications. Inverse problems are commonly studied in an optimization setting, and the known approaches each have advantages and disadvantages. Although a non-convex output least-squares (OLS) objective has often been used, a convex modified output least-squares (MOLS) objective has attracted considerable attention in recent years. However, the convexity of the MOLS has only been established for parameters appearing linearly in the PDEs. The primary objective of this work is to introduce and analyze a variant of the MOLS for the inverse problem of identifying parameters that appear nonlinearly in variational problems. Besides giving an existence result for the inverse problem, we derive the first-order and second-order derivative formulas for the new functional and use them to identify the conditions under which the new functional is convex. We give a discretization scheme for the continuous inverse problem and prove its convergence. We also obtain discrete formulas for the new MOLS functional and present detailed numerical examples.
  5. We develop a projected Nesterov’s proximal-gradient (PNPG) approach for sparse signal reconstruction that combines adaptive step size with Nesterov’s momentum acceleration. The objective function that we wish to minimize is the sum of a convex differentiable data-fidelity (negative log-likelihood (NLL)) term and a convex regularization term. We apply sparse signal regularization where the signal belongs to a closed convex set within the closure of the domain of the NLL; the convex-set constraint facilitates flexible NLL domains and accurate signal recovery. Signal sparsity is imposed using the ℓ₁-norm penalty on the signal’s linear transform coefficients. The PNPG approach employs a projected Nesterov’s acceleration step with restart and a duality-based inner iteration to compute the proximal mapping. We propose an adaptive step-size selection scheme to obtain a good local majorizing function of the NLL and reduce the time spent backtracking. Thanks to step-size adaptation, PNPG converges faster than the methods that do not adjust to the local curvature of the NLL. We present an integrated derivation of the momentum acceleration and proofs of O(k⁻²) objective function convergence rate and convergence of the iterates, which account for adaptive step size, inexactness of the iterative proximal mapping, and the convex-set constraint. The tuning of PNPG is largely application independent. Tomographic and compressed-sensing reconstruction experiments with Poisson generalized linear and Gaussian linear measurement models demonstrate the performance of the proposed approach. 
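Several of the abstracts above combine a proximal step with Nesterov's momentum. As a rough illustration only, the sketch below assumes the simplest such setting: a fixed-step accelerated proximal-gradient loop for 0.5‖Ax − b‖² + λ‖x‖₁. It omits the adaptive step size, restart, and convex-set projection that the PNPG abstract describes. The function names, the diagonal test matrix, and the parameter values are hypothetical and chosen so the result can be checked by hand.

```python
import math

def soft_threshold(v, tau):
    # proximal mapping of tau * ||.||_1, applied coordinate-wise
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]

def accelerated_prox_grad(A, b, lam, lipschitz, iters=200):
    """Fixed-step proximal gradient with Nesterov momentum for
    0.5 * ||A x - b||^2 + lam * ||x||_1 (A is a list of rows)."""
    n = len(A[0])
    x = [0.0] * n
    y = list(x)  # extrapolated point
    t = 1.0
    step = 1.0 / lipschitz  # lipschitz: largest eigenvalue of A^T A
    for _ in range(iters):
        residual = [sum(a * yj for a, yj in zip(row, y)) - bi
                    for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(n)]
        x_new = soft_threshold([yj - step * gj for yj, gj in zip(y, grad)],
                               step * lam)
        t_new = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_new  # momentum weight
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

# Diagonal A keeps the problem separable, so each coordinate has a closed form.
A = [[2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [2.0, 1.0, 0.1]
x_sparse = accelerated_prox_grad(A, b, lam=0.5, lipschitz=4.0)
print(x_sparse)
```

For this separable toy problem the per-coordinate minimizers are 0.875, 0.5, and 0, so the third coefficient is driven exactly to zero by the ℓ₁ penalty.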