

Title: Parameter-free Locally Accelerated Conditional Gradients
Projection-free conditional gradient (CG) methods are the algorithms of choice for constrained optimization setups in which projections are often computationally prohibitive but linear optimization over the constraint set remains computationally feasible. Unlike in projection-based methods, globally accelerated convergence rates are in general unattainable for CG. However, a very recent work on Locally accelerated CG (LaCG) has demonstrated that local acceleration for CG is possible for many settings of interest. The main downside of LaCG is that it requires knowledge of the smoothness and strong convexity parameters of the objective function. We remove this limitation by introducing a novel, Parameter-Free Locally accelerated CG (PF-LaCG) algorithm, for which we provide rigorous convergence guarantees. Our theoretical results are complemented by numerical experiments, which demonstrate local acceleration and showcase the practical improvements of PF-LaCG over non-accelerated algorithms, both in terms of iteration count and wall-clock time.
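
For context, the sketch below shows the building block that all CG-type methods, including PF-LaCG, rely on: a linear minimization oracle (LMO) over the feasible set used in place of a projection. It is a vanilla Frank-Wolfe loop, not the PF-LaCG algorithm itself, and the function names (`grad_f`, `lmo`) and the simplex example are illustrative assumptions.

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, num_iters=200):
    """Vanilla Frank-Wolfe (conditional gradient) loop.

    grad_f : gradient oracle for the smooth objective.
    lmo    : linear minimization oracle; for a direction g it returns
             argmin over the feasible set of <g, v>  (no projection needed).
    """
    x = np.array(x0, dtype=float)
    for t in range(num_iters):
        g = grad_f(x)
        v = lmo(g)                          # linear optimization step
        gamma = 2.0 / (t + 2.0)             # standard open-loop step size
        x = (1.0 - gamma) * x + gamma * v   # convex combination stays feasible
    return x

# Toy example: minimize ||x - b||^2 over the probability simplex, whose LMO
# simply picks the vertex with the most negative gradient entry.
b = np.array([0.2, 0.9, -0.1])
grad = lambda x: 2.0 * (x - b)

def simplex_lmo(g):
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

x_hat = frank_wolfe(grad, simplex_lmo, x0=np.ones(3) / 3)
```
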
Award ID(s):
2007757
NSF-PAR ID:
10294963
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 38th International Conference on Machine Learning
Volume:
139
Page Range / eLocation ID:
1283-1293
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Convergence analysis of accelerated first-order methods for convex optimization problems is developed from the point of view of ordinary differential equation solvers. A new dynamical system, called Nesterov accelerated gradient (NAG) flow, is derived from the connection between the acceleration mechanism and the A-stability of ODE solvers, and the exponential decay of a tailored Lyapunov function along the solution trajectory is proved. Numerical discretizations of NAG flow are then considered and convergence rates are established via a discrete Lyapunov function. The proposed differential equation solver approach can not only cover existing accelerated methods, such as FISTA, Güler’s proximal algorithm and Nesterov’s accelerated gradient method, but also produce new algorithms for composite convex optimization that possess accelerated convergence rates. Both the convex and the strongly convex cases are handled in a unified way in our approach.
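
As background for the ODE viewpoint, the display below recalls the standard continuous-time models of Nesterov acceleration from the literature; it is illustrative context under that assumption, not necessarily the exact NAG flow or Lyapunov function constructed in this work.

```latex
% Illustrative background only: standard ODE models of acceleration from the
% literature, not necessarily the exact NAG flow derived in this paper.
\[
  \ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0
  \quad\Longrightarrow\quad
  f\bigl(X(t)\bigr) - f^{\star} = O\!\left(1/t^{2}\right)
  \qquad (\text{convex, } L\text{-smooth } f),
\]
\[
  \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0
  \quad\Longrightarrow\quad
  f\bigl(X(t)\bigr) - f^{\star} = O\!\bigl(e^{-\sqrt{\mu}\,t}\bigr)
  \qquad (\mu\text{-strongly convex } f).
\]
```
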

     
  2. We study the performance of accelerated first-order optimization algorithms in the presence of additive white stochastic disturbances. For strongly convex quadratic problems, we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that, as the condition number increases, the variance amplification of both Nesterov's accelerated method and Polyak's heavy-ball method is significantly larger than that of standard gradient descent. In the context of distributed computation over networks, we examine the role of network topology and spatial dimension in the performance of these first-order algorithms. For d-dimensional tori, we establish the explicit asymptotic dependence of the variance amplification on the network size and the corresponding condition number. Our results demonstrate the detrimental influence of acceleration on the amplification of stochastic disturbances and suggest that metrics other than the convergence rate have to be considered when evaluating the performance of optimization algorithms.
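
The toy simulation below illustrates the phenomenon described above: on a strongly convex quadratic with additive white noise entering through the gradient, the heavy-ball method with its standard tuning exhibits a noticeably larger steady-state variance than gradient descent. The problem size, noise model, and parameter choices are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Strongly convex quadratic f(x) = 0.5 * x^T Q x with condition number kappa.
kappa, dim = 100.0, 20
eigs = np.linspace(1.0, kappa, dim)            # m = 1, L = kappa
Q = np.diag(eigs)
m, L = eigs[0], eigs[-1]

def steady_state_variance(step, momentum, iters=100_000, burn_in=20_000):
    """Run x_{k+1} = x_k - step*(Q x_k + w_k) + momentum*(x_k - x_{k-1})
    with white noise w_k and estimate E||x||^2 after a burn-in period."""
    x_prev = x = np.zeros(dim)
    acc, count = 0.0, 0
    for k in range(iters):
        w = rng.standard_normal(dim)           # additive white disturbance
        x_next = x - step * (Q @ x + w) + momentum * (x - x_prev)
        x_prev, x = x, x_next
        if k >= burn_in:
            acc += x @ x
            count += 1
    return acc / count

# Standard tunings (illustrative choices only).
gd_var = steady_state_variance(step=2.0 / (L + m), momentum=0.0)
beta = ((np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))) ** 2
hb_var = steady_state_variance(step=4.0 / (np.sqrt(L) + np.sqrt(m)) ** 2,
                               momentum=beta)
print(f"gradient descent: {gd_var:.1f}   heavy-ball: {hb_var:.1f}")
```
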
  3. We introduce a generic scheme for accelerating gradient-based optimization methods in the sense of Nesterov. The approach, called Catalyst, builds upon the inexact accelerated proximal point algorithm for minimizing a convex objective function, and consists of approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. One of the keys to achieving acceleration in theory and in practice is to solve these sub-problems with appropriate accuracy by using the right stopping criterion and the right warm-start strategy. We give practical guidelines to use Catalyst and present a comprehensive analysis of its global complexity. We show that Catalyst applies to a large class of algorithms, including gradient descent, block coordinate descent, incremental algorithms such as SAG, SAGA, SDCA, SVRG, MISO/Finito, and their proximal variants. For all of these methods, we establish faster rates using the Catalyst acceleration, for strongly convex and non-strongly convex objectives. We conclude with extensive experiments showing that acceleration is useful in practice, especially for ill-conditioned problems.
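
A minimal sketch of a Catalyst-style outer loop is given below, assuming a strongly convex objective and a generic inner solver. The warm-start choice, the fixed extrapolation weight, the stand-in inner solver, and the toy least-squares problem are illustrative assumptions rather than the exact schedule and stopping criterion analyzed in the paper.

```python
import numpy as np

def catalyst(inner_solver, grad_f, x0, mu, kappa, outer_iters=50):
    """Generic Catalyst-style outer loop (strongly convex case).

    inner_solver(grad_h, x_init) approximately minimizes the auxiliary
    objective h(x) = f(x) + (kappa/2)*||x - y||^2, warm-started at x_init.
    """
    q = mu / (mu + kappa)
    beta = (1.0 - np.sqrt(q)) / (1.0 + np.sqrt(q))  # fixed extrapolation weight
    x_prev = x = np.array(x0, dtype=float)
    y = x.copy()
    for _ in range(outer_iters):
        # Gradient of the regularized subproblem centered at the prox point y.
        grad_h = lambda z, y=y: grad_f(z) + kappa * (z - y)
        x_prev, x = x, inner_solver(grad_h, x_init=x)   # warm start
        y = x + beta * (x - x_prev)                     # Nesterov-type extrapolation
    return x

# Stand-in inner solver: a few plain gradient steps (in place of SAG/SVRG/...).
def gd_inner(grad_h, x_init, steps=20, lr=0.2):
    z = x_init.copy()
    for _ in range(steps):
        z = z - lr * grad_h(z)
    return z

# Toy problem: f(x) = 0.5*||A x - b||^2 + (mu/2)*||x||^2.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10)) / np.sqrt(30)
b = rng.standard_normal(30)
mu = 0.1
grad_f = lambda x: A.T @ (A @ x - b) + mu * x
x_hat = catalyst(gd_inner, grad_f, x0=np.zeros(10), mu=mu, kappa=1.0)
```
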
  4. One-particle Green’s functions obtained from the self-consistent solution of the Dyson equation can be employed in the evaluation of spectroscopic and thermodynamic properties for both molecules and solids. However, typical acceleration techniques used in the traditional quantum chemistry self-consistent algorithms cannot be easily deployed for the Green’s function methods because of a non-convex grand potential functional and a non-idempotent density matrix. Moreover, the optimization problem can become more challenging due to the inclusion of correlation effects, changing chemical potential, and fluctuations of the number of particles. In this paper, we study acceleration techniques to target the self-consistent solution of the Dyson equation directly. We use the direct inversion in the iterative subspace (DIIS), the least-squares commutator in the iterative subspace (LCIIS), and the Krylov space accelerated inexact Newton method (KAIN). We observe that the definition of the residual has a significant impact on the convergence of the iterative procedure. Based on the Dyson equation, we generalize the concept of the commutator residual used in DIIS and LCIIS and compare it with the difference residual used in DIIS and KAIN. The commutator residuals outperform the difference residuals for all considered molecular and solid systems within both GW and GF2. For a number of bond-breaking problems, we found that an easily obtained high-temperature solution with effectively suppressed correlations is a very effective starting point for reaching convergence of the problematic low-temperature solutions through a sequential reduction of temperature during calculations.
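
The snippet below sketches a generic DIIS (Pulay mixing) extrapolation step of the kind referenced above, applied to an arbitrary fixed-point map. The simple difference residual, the history length, and the toy map are illustrative assumptions; they do not reproduce the Green's-function-specific commutator residuals studied in this work.

```python
import numpy as np

def diis_extrapolate(trial_vectors, residuals):
    """One generic DIIS (Pulay mixing) extrapolation step.

    trial_vectors, residuals : equally long lists of 1-D arrays holding the
    stored iterates and their residual vectors.
    """
    n = len(residuals)
    R = np.stack(residuals)                 # (n, dim) residual history
    B = R @ R.T                             # Gram matrix <r_i, r_j>
    # Augmented system enforcing sum(c) = 1 via a Lagrange multiplier;
    # lstsq tolerates the near-singular B that appears close to convergence.
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = B
    A[:n, n] = A[n, :n] = 1.0
    rhs = np.zeros(n + 1)
    rhs[n] = 1.0
    coeffs = np.linalg.lstsq(A, rhs, rcond=None)[0][:n]
    return sum(c * x for c, x in zip(coeffs, trial_vectors))

# Usage sketch: accelerate a toy fixed-point map x = g(x) with a short history.
def g(x):
    return 0.5 * np.cos(x)

x, history_x, history_r = np.zeros(3), [], []
for _ in range(15):
    gx = g(x)
    history_x.append(gx)
    history_r.append(gx - x)                # simple difference residual
    history_x, history_r = history_x[-5:], history_r[-5:]
    x = diis_extrapolate(history_x, history_r)
```
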

     
  5. This work concerns the analysis and design of distributed first-order optimization algorithms over time-varying graphs. The goal of such algorithms is to optimize a global function that is the average of local functions using only local computations and communications. Several different algorithms have been proposed that achieve linear convergence to the global optimum when the local functions are strongly convex. We provide a unified analysis that yields the worst-case linear convergence rate as a function of the condition number of the local functions, the spectral gap of the graph, and the parameters of the algorithm. The framework requires solving a small semidefinite program whose size is fixed; it does not depend on the number of local functions or the dimension of their domain. The result is a computationally efficient method for distributed algorithm analysis that enables the rapid comparison, selection, and tuning of algorithms. Finally, we propose a new algorithm, which we call SVL, that is easily implementable and achieves a faster worst-case convergence rate than all other known algorithms. 
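
For context on the problem setup, the sketch below implements plain distributed gradient descent on a fixed ring graph (mix with neighbors, then take a local gradient step). It is a baseline illustration of "local computation and communication", not the SVL algorithm or the SDP-based analysis, and the mixing weights, step size, and local objectives are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five agents, each holding a local objective f_i(x) = 0.5*||A_i x - b_i||^2.
n_agents, dim = 5, 4
A_mats = [rng.standard_normal((8, dim)) / np.sqrt(8) for _ in range(n_agents)]
b_vecs = [rng.standard_normal(8) for _ in range(n_agents)]
grads = [lambda x, A=Ai, b=bi: A.T @ (A @ x - b) for Ai, bi in zip(A_mats, b_vecs)]

# Doubly stochastic mixing matrix for a ring graph (each agent averages with its
# two neighbors); a time-varying graph would swap in a different W each step.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    for j in (i - 1, i, i + 1):
        W[i, j % n_agents] = 1.0 / 3.0

def dgd(W, grads, dim, step=0.2, iters=500):
    """Plain distributed gradient descent: mix with neighbors, then step locally."""
    X = np.zeros((len(grads), dim))         # row i = agent i's current estimate
    for _ in range(iters):
        local_grads = np.stack([g(x) for g, x in zip(grads, X)])
        X = W @ X - step * local_grads
    return X

X = dgd(W, grads, dim)
print("max disagreement between agents:", np.max(np.abs(X - X.mean(axis=0))))
```
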