Convergence analysis of accelerated first-order methods for convex optimization problems is developed from the point of view of ordinary differential equation solvers. A new dynamical system, called Nesterov accelerated gradient (NAG) flow, is derived from the connection between the acceleration mechanism and A-stability of ODE solvers, and the exponential decay of a tailored Lyapunov function along the solution trajectory is proved. Numerical discretizations of NAG flow are then considered, and convergence rates are established via a discrete Lyapunov function. The proposed differential equation solver approach not only covers existing accelerated methods, such as FISTA, Güler's proximal algorithm, and Nesterov's accelerated gradient method, but also produces new algorithms for composite convex optimization that possess accelerated convergence rates. Both the convex and the strongly convex cases are handled in a unified way in our approach.

- Award ID(s):
- 2007757
- NSF-PAR ID:
- 10294963
- Date Published:
- Journal Name:
- Proceedings of the 38th International Conference on Machine Learning
- Volume:
- 139
- Page Range / eLocation ID:
- 1283-1293
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation

More Like this
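As background for the NAG-flow abstract above: Nesterov's accelerated gradient method, one of the schemes the paper recovers, can be sketched as follows for an L-smooth convex objective. The quadratic test problem, step size 1/L, and iteration count below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def nesterov_agd(grad, x0, L, steps):
    """Nesterov's accelerated gradient method for an L-smooth
    convex objective (convex-case momentum, as in FISTA)."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(steps):
        x_next = y - grad(y) / L                      # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x

# Illustrative quadratic f(x) = 0.5 * x^T A x with minimizer at the origin.
A = np.diag([1.0, 10.0])
x = nesterov_agd(lambda v: A @ v, [5.0, 5.0], L=10.0, steps=300)
```

The momentum sequence t_k is the standard FISTA recursion; the same loop handles any gradient oracle, which is why the discretizations in the paper can recover it as a special case.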
- We study the performance of accelerated first-order optimization algorithms in the presence of additive white stochastic disturbances. For strongly convex quadratic problems, we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that, as the condition number increases, the variance amplification of both Nesterov's accelerated method and Polyak's heavy-ball method is significantly larger than that of standard gradient descent. In the context of distributed computation over networks, we examine the role of network topology and spatial dimension in the performance of these first-order algorithms. For d-dimensional tori, we establish the explicit asymptotic dependence of the variance amplification on the network size and the corresponding condition number. Our results demonstrate the detrimental influence of acceleration on the amplification of stochastic disturbances and suggest that metrics other than the convergence rate have to be considered when evaluating the performance of optimization algorithms.
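The variance-amplification effect described above can be reproduced in a few lines on a scalar quadratic: run gradient descent and Polyak's heavy-ball method with noisy gradients and compare their empirical steady-state variances. The curvature, step size, momentum, and noise level below are illustrative assumptions, not the parameter choices from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_descent(momentum, steps=20000, sigma=0.1):
    """Run gradient descent (momentum=0) or heavy-ball (momentum>0)
    on f(x) = 0.5 * kappa * x^2 with additive white gradient noise,
    returning the empirical steady-state variance of the iterate."""
    kappa = 100.0            # curvature, playing the role of a large Hessian eigenvalue
    alpha = 1.0 / kappa      # standard step size 1/L
    x_prev = x = 0.0
    samples = []
    for k in range(steps):
        g = kappa * x + sigma * rng.standard_normal()    # noisy gradient
        x_next = x - alpha * g + momentum * (x - x_prev)
        x_prev, x = x, x_next
        if k > steps // 2:                               # discard the transient
            samples.append(x)
    return float(np.var(samples))

var_gd = noisy_descent(momentum=0.0)
var_hb = noisy_descent(momentum=0.8)
```

With these choices the heavy-ball iterate is a stable AR(2) process whose stationary variance exceeds the gradient-descent value by a constant factor, illustrating the amplification phenomenon on a single coordinate.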
- We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy, using the right stopping criterion and the right warm-start strategy. We give practical guidelines for using Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, and MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using Catalyst acceleration, for both strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
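The Catalyst scheme described above can be caricatured as an outer loop of inexactly solved proximal subproblems with Nesterov-style extrapolation. The sketch below uses plain gradient descent as the inner solver, a fixed inner iteration budget in place of a stopping criterion, and illustrative parameter values; it is not the tuned procedure from the paper.

```python
import numpy as np

def catalyst_outer(grad_f, x0, kappa, n_outer, inner_steps, L):
    """Illustrative Catalyst-style outer loop: each iteration
    approximately minimizes f(x) + (kappa/2)||x - y||^2 by a few
    gradient steps, warm-started at the previous iterate, then
    applies a Nesterov-style extrapolation."""
    x = np.asarray(x0, dtype=float)
    y = x.copy()
    t = 1.0
    for _ in range(n_outer):
        z = x.copy()                         # warm start for the subproblem
        for _ in range(inner_steps):         # inner gradient descent on the
            g = grad_f(z) + kappa * (z - y)  # kappa-regularized objective
            z = z - g / (L + kappa)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = z + ((t - 1.0) / t_next) * (z - x)
        x, t = z, t_next
    return x

# Ill-conditioned quadratic with minimizer at the origin (illustrative).
A = np.diag([1.0, 50.0])
x = catalyst_outer(lambda v: A @ v, [3.0, -3.0],
                   kappa=1.0, n_outer=100, inner_steps=25, L=50.0)
```

The regularization by kappa makes each subproblem better conditioned than the original objective, which is why a cheap inner solver suffices; in the actual method the inner accuracy and kappa are chosen to provably preserve the accelerated rate.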
- One-particle Green's functions obtained from the self-consistent solution of the Dyson equation can be employed in the evaluation of spectroscopic and thermodynamic properties for both molecules and solids. However, typical acceleration techniques used in traditional quantum chemistry self-consistent algorithms cannot be easily deployed for Green's function methods because of a non-convex grand potential functional and a non-idempotent density matrix. Moreover, the optimization problem can become more challenging due to the inclusion of correlation effects, a changing chemical potential, and fluctuations in the number of particles. In this paper, we study acceleration techniques that target the self-consistent solution of the Dyson equation directly. We use direct inversion in the iterative subspace (DIIS), the least-squared commutator in the iterative subspace (LCIIS), and the Krylov space accelerated inexact Newton method (KAIN). We observe that the definition of the residual has a significant impact on the convergence of the iterative procedure. Based on the Dyson equation, we generalize the concept of the commutator residual used in DIIS and LCIIS and compare it with the difference residual used in DIIS and KAIN. The commutator residuals outperform the difference residuals for all considered molecular and solid systems within both GW and GF2. For a number of bond-breaking problems, we found that an easily obtained high-temperature solution with effectively suppressed correlations is a very effective starting point for reaching convergence of the problematic low-temperature solutions through a sequential reduction of temperature during the calculations.
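The DIIS extrapolation mentioned above solves a small bordered linear system for mixing coefficients that sum to one and minimize the norm of the combined residual. Below is a generic textbook DIIS on a toy linear fixed-point problem, not the Green's-function setting; the map, history length, and singular-matrix fallback are illustrative assumptions.

```python
import numpy as np

def diis_step(trial_vecs, residuals):
    """One DIIS extrapolation: choose coefficients c with sum(c) = 1
    minimizing ||sum_i c_i r_i||^2, then mix the stored trial vectors."""
    n = len(residuals)
    B = np.zeros((n + 1, n + 1))
    for i, ri in enumerate(residuals):
        for j, rj in enumerate(residuals):
            B[i, j] = ri @ rj                # Gram matrix of residuals
    B[n, :n] = B[:n, n] = -1.0               # border enforcing sum(c) = 1
    rhs = np.zeros(n + 1)
    rhs[n] = -1.0
    try:
        c = np.linalg.solve(B, rhs)[:n]
    except np.linalg.LinAlgError:
        return trial_vecs[-1]                # fall back to a plain step
    return sum(ci * v for ci, v in zip(c, trial_vecs))

# Toy contractive fixed-point map g(x) = b + M x, accelerated with DIIS.
M = np.array([[0.0, 0.3], [0.2, 0.1]])
b = np.array([1.0, 2.0])
g = lambda v: b + M @ v

x = np.zeros(2)
vecs, res = [], []
for _ in range(50):
    fx = g(x)
    r = fx - x                               # difference residual
    if np.linalg.norm(r) < 1e-12:
        break                                # converged; avoid a singular B
    vecs.append(fx)
    res.append(r)
    vecs, res = vecs[-4:], res[-4:]          # keep a short history
    x = diis_step(vecs, res)
```

The residual used here is the simple difference residual; the paper's point is precisely that a commutator-style residual built from the Dyson equation can converge where this generic choice struggles.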
- This work concerns the analysis and design of distributed first-order optimization algorithms over time-varying graphs. The goal of such algorithms is to optimize a global function that is the average of local functions using only local computations and communications. Several different algorithms have been proposed that achieve linear convergence to the global optimum when the local functions are strongly convex. We provide a unified analysis that yields the worst-case linear convergence rate as a function of the condition number of the local functions, the spectral gap of the graph, and the parameters of the algorithm. The framework requires solving a small semidefinite program whose size is fixed; it does not depend on the number of local functions or the dimension of their domain. The result is a computationally efficient method for distributed algorithm analysis that enables the rapid comparison, selection, and tuning of algorithms. Finally, we propose a new algorithm, which we call SVL, that is easily implementable and achieves a faster worst-case convergence rate than all other known algorithms.