Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
We present a convergence rate analysis for biased stochastic gradient descent (SGD), where individual gradient updates are corrupted by computation errors. We develop stochastic quadratic constraints to formulate a small linear matrix inequality (LMI) whose feasible points lead to convergence bounds of biased SGD. Based on this LMI condition, we develop a sequential minimization approach to analyze the intricate trade-offs that couple stepsize selection, convergence rate, optimization accuracy, and robustness to gradient inaccuracy. We also provide feasible points for this LMI and obtain theoretical formulas that quantify the convergence properties of biased SGD under various assumptions on the loss functions.more » « less
-
We consider a linear time-invariant system in discrete time where the state and input signals satisfy a set of integral quadratic constraints (IQCs). Analogous to the autonomous linear systems case, we define a new notion of spectral radius that exactly characterizes stability of this system. In particular, (i) when the spectral radius is less than one, we show that the system is asymptotically stable for all trajectories that satisfy the IQCs, and (ii) when the spectral radius is equal to one, we construct an unstable trajectory that satisfies the IQCs. Furthermore, we connect our new definition of the spectral radius to the existing literature on IQCs.more » « less
-
We consider the distributed optimization problem in which a network of agents aims to minimize the average of local functions. To solve this problem, several algorithms have recently been proposed where agents perform various combinations of communication with neighbors, local gradient computations, and updates to local state variables. In this paper, we present a canonical form that characterizes any first-order distributed algorithm that can be implemented using a single round of communication and gradient computation per iteration, and where each agent stores up to two state variables. The canonical form features a minimal set of parameters that are both unique and expressive enough to capture any distributed algorithm in this class. The generic nature of our canonical form enables the systematic analysis and design of distributed optimization algorithms.more » « less
-
We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori & Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)).more » « less
-
Techniques for reducing the variance of gradient estimates used in stochastic programming algorithms for convex finite-sum problems have received a great deal of attention in recent years. By leveraging dissipativity theory from control, we provide a new perspective on two important variance-reduction algorithms: SVRG and its direct accelerated variant Katyusha. Our perspective provides a physically intuitive understanding of the behavior of SVRG-like methods via a principle of energy conservation. The tools discussed here allow us to automate the convergence analysis of SVRG-like methods by capturing their essential properties in small semidefinite programs amenable to standard analysis and computational techniques. Our approach recovers existing convergence results for SVRG and Katyusha and generalizes the theory to alternative parameter choices. We also discuss how our approach complements the linear coupling technique. Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems.more » « less
-
This work proposes an accelerated first-order algorithm we call the Robust Momentum Method for optimizing smooth strongly convex functions. The algorithm has a single scalar parameter that can be tuned to trade off robustness to gradient noise versus worst-case convergence rate. At one extreme, the algorithm is faster than Nesterov's Fast Gradient Method by a constant factor but more fragile to noise. At the other extreme, the algorithm reduces to the Gradient Method and is very robust to noise. The algorithm design technique is inspired by methods from classical control theory and the resulting algorithm has a simple analytical form. Algorithm performance is verified on a series of numerical simulations in both noise-free and relative gradient noise cases.more » « less
-
We present a unified framework for analyzing the convergence of distributed optimization algorithms by formulating a semidefinite program (SDP) which can be efficiently solved to bound the linear rate of convergence. Two different SDP formulations are considered. First, we formulate an SDP that depends explicitly on the gossip matrix of the network graph. This result provides bounds that depend explicitly on the graph topology, but the SDP dimension scales with the size of the graph. Second, we formulate an SDP that depends implicitly on the gossip matrix via its spectral gap. This result provides coarser bounds, but yields a small SDP that is independent of graph size. Our approach improves upon existing bounds for the algorithms we analyzed, and numerical simulations reveal that our bounds are likely tight. The efficient and automated nature of our analysis makes it a powerful tool for algorithm selection and tuning, and for the discovery of new algorithms as well.more » « less
-
In this paper, we adapt the control theoretic concept of dissipativity theory to provide a natural understanding of Nesterov’s accelerated method. Our theory ties rigorous convergence rate analysis to the physically intuitive notion of energy dissipation. Moreover, dissipativity allows one to efficiently construct Lyapunov functions (either numerically or analytically) by solving a small semidefinite program. Using novel supply rate functions, we show how to recover known rate bounds for Nesterov’s method and we generalize the approach to certify both linear and sublinear rates in a variety of settings. Finally, we link the continuous-time version of dissipativity to recent works on algorithm analysis that use discretizations of ordinary differential equations.more » « less
An official website of the United States government

Full Text Available