skip to main content


Search for: All records

Award ID contains: 1750162

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. We consider the distributed optimization problem, where a group of agents work together to optimize a common objective by communicating with neighboring agents and performing local computations. For a given algorithm, we use tools from robust control to systematically analyze the performance in the case where the communication network is time-varying. In particular, we assume only that the network is jointly connected over a finite time horizon (commonly referred to as B-connectivity), which does not require connectivity at each time instant. When applied to the distributed algorithm DIGing, our bounds are orders of magnitude tighter than those available in the literature. 
    more » « less
  2. Iterative first-order methods such as gradient descent and its variants are widely used for solving optimization and machine learning problems. There has been recent interest in analytic or numerically efficient methods for computing worst-case performance bounds for such algorithms, for example over the class of strongly convex loss functions. A popular approach is to assume the algorithm has a fixed size (fixed dimension, or memory) and that its structure is parameterized by one or two hyperparameters, for example a learning rate and a momentum parameter. Then, a Lyapunov function is sought to certify robust stability and subsequent optimization can be performed to find optimal hyperparameter tunings. In the present work, we instead fix the constraints that characterize the loss function and apply techniques from robust control synthesis to directly search over algorithms. This approach yields stronger results than those previously available, since the bounds produced hold over algorithms with an arbitrary, but finite, amount of memory rather than just holding for algorithms with a prescribed structure. 
    more » « less
  3. This work concerns the analysis and design of distributed first-order optimization algorithms over time-varying graphs. The goal of such algorithms is to optimize a global function that is the average of local functions using only local computations and communications. Several different algorithms have been proposed that achieve linear convergence to the global optimum when the local functions are strongly convex. We provide a unified analysis that yields the worst-case linear convergence rate as a function of the condition number of the local functions, the spectral gap of the graph, and the parameters of the algorithm. The framework requires solving a small semidefinite program whose size is fixed; it does not depend on the number of local functions or the dimension of their domain. The result is a computationally efficient method for distributed algorithm analysis that enables the rapid comparison, selection, and tuning of algorithms. Finally, we propose a new algorithm, which we call SVL, that is easily implementable and achieves a faster worst-case convergence rate than all other known algorithms. 
    more » « less
  4. We present a convergence rate analysis for biased stochastic gradient descent (SGD), where individual gradient updates are corrupted by computation errors. We develop stochastic quadratic constraints to formulate a small linear matrix inequality (LMI) whose feasible points lead to convergence bounds of biased SGD. Based on this LMI condition, we develop a sequential minimization approach to analyze the intricate trade-offs that couple stepsize selection, convergence rate, optimization accuracy, and robustness to gradient inaccuracy. We also provide feasible points for this LMI and obtain theoretical formulas that quantify the convergence properties of biased SGD under various assumptions on the loss functions. 
    more » « less
  5. We consider a linear time-invariant system in discrete time where the state and input signals satisfy a set of integral quadratic constraints (IQCs). Analogous to the autonomous linear systems case, we define a new notion of spectral radius that exactly characterizes stability of this system. In particular, (i) when the spectral radius is less than one, we show that the system is asymptotically stable for all trajectories that satisfy the IQCs, and (ii) when the spectral radius is equal to one, we construct an unstable trajectory that satisfies the IQCs. Furthermore, we connect our new definition of the spectral radius to the existing literature on IQCs. 
    more » « less
  6. Techniques for reducing the variance of gradient estimates used in stochastic programming algorithms for convex finite-sum problems have received a great deal of attention in recent years. By leveraging dissipativity theory from control, we provide a new perspective on two important variance-reduction algorithms: SVRG and its direct accelerated variant Katyusha. Our perspective provides a physically intuitive understanding of the behavior of SVRG-like methods via a principle of energy conservation. The tools discussed here allow us to automate the convergence analysis of SVRG-like methods by capturing their essential properties in small semidefinite programs amenable to standard analysis and computational techniques. Our approach recovers existing convergence results for SVRG and Katyusha and generalizes the theory to alternative parameter choices. We also discuss how our approach complements the linear coupling technique. Our combination of perspectives leads to a better understanding of accelerated variance-reduced stochastic methods for finite-sum problems. 
    more » « less
  7. We present a novel way of generating Lyapunov functions for proving linear convergence rates of first-order optimization methods. Our approach provably obtains the fastest linear convergence rate that can be verified by a quadratic Lyapunov function (with given states), and only relies on solving a small-sized semidefinite program. Our approach combines the advantages of performance estimation problems (PEP, due to Drori & Teboulle (2014)) and integral quadratic constraints (IQC, due to Lessard et al. (2016)), and relies on convex interpolation (due to Taylor et al. (2017c;b)). 
    more » « less
  8. This work proposes an accelerated first-order algorithm we call the Robust Momentum Method for optimizing smooth strongly convex functions. The algorithm has a single scalar parameter that can be tuned to trade off robustness to gradient noise versus worst-case convergence rate. At one extreme, the algorithm is faster than Nesterov's Fast Gradient Method by a constant factor but more fragile to noise. At the other extreme, the algorithm reduces to the Gradient Method and is very robust to noise. The algorithm design technique is inspired by methods from classical control theory and the resulting algorithm has a simple analytical form. Algorithm performance is verified on a series of numerical simulations in both noise-free and relative gradient noise cases. 
    more » « less