Abstract Simple stochastic momentum methods are widely used in machine learning optimization, but their good practical performance is at odds with an absence of theoretical guarantees of acceleration in the literature. In this work, we aim to close the gap between theory and practice by showing that stochastic heavy ball momentum retains the fast linear rate of (deterministic) heavy ball momentum on quadratic optimization problems, at least when minibatching with a sufficiently large batch size. The algorithm we study can be interpreted as an accelerated randomized Kaczmarz algorithm with minibatching and heavy ball momentum. The analysis relies on carefully decomposing the momentum transition matrix, and using new spectral norm concentration bounds for products of independent random matrices. We provide numerical illustrations demonstrating that our bounds are reasonably sharp.
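The setting described in this abstract can be illustrated with a minimal sketch, assuming a consistent linear system `A x = b` and the least-squares objective `(1/2n) * ||A x - b||^2`. The function name `minibatch_heavy_ball`, its parameters, uniform sampling without replacement, and the gradient scaling are illustrative assumptions; the exact sampling scheme and step-size normalization analysed in the paper are not given here.

```python
import numpy as np

def minibatch_heavy_ball(A, b, batch_size, step, momentum, iters, seed=0):
    """Minibatch SGD with heavy ball momentum on (1/2n) * ||A x - b||^2.

    Hypothetical helper for illustration only; rows are sampled uniformly
    without replacement, which is an assumption, not the paper's scheme.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    x_prev = x.copy()
    for _ in range(iters):
        # Draw a random minibatch of rows (equations).
        idx = rng.choice(n, size=batch_size, replace=False)
        A_S, b_S = A[idx], b[idx]
        # Stochastic gradient of the least-squares objective on the batch.
        grad = A_S.T @ (A_S @ x - b_S) / batch_size
        # Heavy ball update: gradient step plus momentum term.
        x_next = x - step * grad + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x
```

With `batch_size = n` the update reduces to deterministic heavy ball, while small batches with per-row normalization move it toward a randomized Kaczmarz-style iteration.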
Conformal symplectic and relativistic optimization
Abstract Arguably, the two most popular accelerated or momentum-based optimization methods in machine learning are Nesterov’s accelerated gradient and Polyak’s heavy ball, both corresponding to different discretizations of a particular second-order differential equation with friction. Such connections with continuous-time dynamical systems have been instrumental in demystifying acceleration phenomena in optimization. Here we study structure-preserving discretizations for a certain class of dissipative (conformal) Hamiltonian systems, allowing us to analyse the symplectic structure of both Nesterov and heavy ball, besides providing several new insights into these methods. Moreover, we propose a new algorithm based on a dissipative relativistic system that normalizes the momentum and may result in more stable/faster optimization. Importantly, such a method generalizes both Nesterov and heavy ball, each being recovered as distinct limiting cases, and has potential advantages at no additional cost.
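For orientation, the two classical methods named in this abstract can be sketched in their standard textbook forms. This is a minimal sketch: the function names and the fixed `step` and `momentum` parameters are illustrative assumptions, and the relativistic method proposed in the paper is not reproduced here.

```python
import numpy as np

def heavy_ball(grad, x0, step, momentum, iters):
    """Polyak's heavy ball: gradient evaluated at the current iterate."""
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x_next = x - step * grad(x) + momentum * (x - x_prev)
        x_prev, x = x, x_next
    return x

def nesterov(grad, x0, step, momentum, iters):
    """Nesterov's accelerated gradient: gradient evaluated at a look-ahead point."""
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = x + momentum * (x - x_prev)  # look-ahead (extrapolated) point
        x_next = y - step * grad(y)
        x_prev, x = x, x_next
    return x
```

The only difference between the two updates is where the gradient is evaluated, which is one way to see them as distinct discretizations of the same damped dynamics.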
- Award ID(s):
- 2031985
- PAR ID:
- 10428817
- Date Published:
- Journal Name:
- Journal of Statistical Mechanics: Theory and Experiment
- Volume:
- 2020
- Issue:
- 12
- ISSN:
- 1742-5468
- Page Range / eLocation ID:
- 124008
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
The heavy-ball momentum method accelerates gradient descent with a momentum term but lacks accelerated convergence for general smooth strongly convex problems. This work introduces the Accelerated Over-Relaxation Heavy-Ball (AOR-HB) method, the first variant with provable global and accelerated convergence for such problems. AOR-HB closes a long-standing theoretical gap, extends to composite convex optimization and min-max problems, and achieves optimal complexity bounds. It offers three key advantages: (1) broad generalization ability, (2) potential to reshape acceleration techniques, and (3) conceptual clarity and elegance compared to existing methods.
Convergence analysis of accelerated first-order methods for convex optimization problems is developed from the point of view of ordinary differential equation solvers. A new dynamical system, called Nesterov accelerated gradient (NAG) flow, is derived from the connection between the acceleration mechanism and A-stability of ODE solvers, and the exponential decay of a tailored Lyapunov function along the solution trajectory is proved. Numerical discretizations of NAG flow are then considered and convergence rates are established via a discrete Lyapunov function. The proposed differential equation solver approach not only covers existing accelerated methods, such as FISTA, Güler’s proximal algorithm and Nesterov’s accelerated gradient method, but also produces new algorithms for composite convex optimization that possess accelerated convergence rates. Both the convex and the strongly convex cases are handled in a unified way in our approach.
Dasgupta, Sanjoy; Mandt, Stephan; Li, Yingzhen (Ed.) We study accelerated optimization methods in the Gaussian phase retrieval problem. In this setting, we prove that gradient methods with Polyak or Nesterov momentum have similar implicit regularization to gradient descent. This implicit regularization keeps the iterates in a well-behaved region where the cost function is strongly convex and smooth, even though it is nonconvex in general. As a result, these accelerated methods achieve faster rates of convergence than gradient descent. Experimental evidence demonstrates that the accelerated methods converge faster than gradient descent in practice.