Title: Distributed Optimization over Time-Varying Networks: Imperfect Information with Feedback is as Good as Perfect Information
The convergence of an error-feedback algorithm is studied for the decentralized stochastic gradient descent (DSGD) algorithm with compressed information sharing over time-varying graphs. It is shown that for both strongly-convex and convex cost functions, despite imperfect information sharing, the convergence rates match those with perfect information sharing. To do so, we show that for strongly-convex loss functions, with a proper choice of step-size, the state of each node converges to the global optimizer at the rate of O(T^{−1}). Similarly, for general convex cost functions, with a proper choice of step-size, we show that the value of the loss function at a temporal average of each node's estimates converges to the optimal value at the rate of O(T^{−1/2+ϵ}) for any ϵ > 0.
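The abstract does not reproduce the algorithm itself; as a rough illustration of the ingredients it names, here is a minimal NumPy sketch of decentralized SGD with compressed, error-compensated communication over a time-varying gossip graph (in the spirit of CHOCO-SGD). The quadratic objectives, top-1 compressor, alternating matchings, and step sizes are all illustrative assumptions, not the paper's exact error-feedback scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, T = 5, 3, 3000
targets = rng.normal(size=(n, d))     # node i privately holds f_i(x) = 0.5*||x - targets[i]||^2
x_star = targets.mean(axis=0)         # minimizer of the average cost

def top1(v):
    """Crude compressor: keep only the largest-magnitude coordinate."""
    out = np.zeros_like(v)
    k = np.argmax(np.abs(v))
    out[k] = v[k]
    return out

def matching(pairs):
    """Doubly stochastic gossip matrix that averages the given disjoint pairs."""
    W = np.eye(n)
    for i, j in pairs:
        W[i, i] = W[j, j] = 0.5
        W[i, j] = W[j, i] = 0.5
    return W

# Time-varying graph: alternate two matchings whose union is connected.
W_even = matching([(0, 1), (2, 3)])
W_odd = matching([(1, 2), (3, 4)])

x = np.zeros((n, d))       # local states
xhat = np.zeros((n, d))    # publicly known (compressed) copies; x - xhat is the residual error
gamma = 0.3                # consensus step size (illustrative)

for t in range(T):
    W = W_even if t % 2 == 0 else W_odd
    q = np.stack([top1(x[i] - xhat[i]) for i in range(n)])  # only compressed residuals are sent
    xhat += q                                               # all nodes update the public copies
    x += gamma * ((W - np.eye(n)) @ xhat)                   # gossip on public copies only
    x -= (1.0 / (t + 20)) * (x - targets)                   # local gradient step, diminishing step size

print("max distance to optimizer:", np.abs(x - x_star).max())
```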
Award ID(s):
1749981
NSF-PAR ID:
10427582
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2022 American Control Conference (ACC)
Page Range / eLocation ID:
2791 to 2796
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study strongly convex distributed optimization problems in which a set of agents collaboratively solve a separable optimization problem. In this article, we propose and study a two-time-scale decentralized gradient descent algorithm for a broad class of lossy information-sharing schemes over time-varying graphs. One time-scale fades out the (lossy) incoming information from neighboring agents, while the other regulates the gradients of the local loss functions. We show that, assuming a proper choice of step-size sequences, certain connectivity conditions, and bounded gradients along the trajectory of the dynamics, the agents' estimates converge to the optimal solution at the rate of O(T^{−1/2}). We also provide novel tools to study distributed optimization with diminishing averaging weights over time-varying graphs.
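    As a toy illustration of the two-time-scale idea, the sketch below fades noisy neighbor information with one diminishing sequence while a faster-decaying sequence scales the gradients. It is an assumption-laden caricature, not the paper's algorithm: the quadratic costs, the static averaging matrix, the additive noise model, and the exponents are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, T = 4, 2, 20000
targets = rng.normal(size=(n, d))       # f_i(x) = 0.5*||x - targets[i]||^2
W = np.full((n, n), 1.0 / n)            # static averaging here; the paper allows time-varying graphs
x = rng.normal(size=(n, d))

for t in range(1, T + 1):
    beta = t ** -0.6                    # consensus time scale: fades out incoming information
    alpha = t ** -0.9                   # gradient time scale: decays faster than beta
    shared = x + 0.2 * rng.normal(size=x.shape)   # lossy copies received from neighbors
    x = (1 - beta) * x + beta * (W @ shared) - alpha * (x - targets)

print("spread across nodes:", np.ptp(x, axis=0).max())
print("distance to optimizer:", np.abs(x.mean(axis=0) - targets.mean(axis=0)).max())
```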
  2. This paper studies multi-agent (convex and nonconvex) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from their neighbors. Delays need not be known to the agents or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the sum of agents' gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved when nonconvex problems and/or diminishing, uncoordinated step-sizes are employed. To the best of our knowledge, this is the first distributed algorithm with a provable geometric convergence rate in such a general asynchronous setting.
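    The tracking mechanism at the heart of such schemes fits in a few lines; below is the standard synchronous gradient-tracking recursion (DIGing-style) on quadratic costs, with the asynchrony and delays that are this paper's actual contribution deliberately left out. The problem data, weight matrix, and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 4, 3
A = rng.normal(size=(n, d, d))
Q = np.einsum('nij,nkj->nik', A, A) + np.eye(d)[None]   # f_i(x) = 0.5*x^T Q_i x - b_i^T x, Q_i > 0
b = rng.normal(size=(n, d))
x_star = np.linalg.solve(Q.sum(0), b.sum(0))            # minimizer of the summed cost

def grad(x):
    return np.einsum('nij,nj->ni', Q, x) - b

W = np.full((n, n), 1.0 / n)    # synchronous complete-graph averaging (the paper handles digraphs)
x = np.zeros((n, d))
y = grad(x)                     # tracker initialized at the local gradients
eta = 0.02

for t in range(1000):
    x_new = W @ x - eta * y
    y = W @ y + grad(x_new) - grad(x)   # y_i tracks the average of all local gradients
    x = x_new

print("distance to optimizer:", np.abs(x - x_star).max())
```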
  3. In this paper, we study communication-efficient decentralized training of large-scale machine learning models over a network. We propose and analyze SQuARM-SGD, a decentralized training algorithm, employing momentum and compressed communication between nodes regulated by a locally computable triggering rule. In SQuARM-SGD, each node performs a fixed number of local SGD (stochastic gradient descent) steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors only when there is a significant change in its model parameters since the last time communication occurred. We provide convergence guarantees for our algorithm for strongly-convex and non-convex smooth objectives. We believe that ours is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that SQuARM-SGD converges at rate O(1/nT) for strongly-convex objectives, while for non-convex objectives it converges at rate O(1/√(nT)), thus matching the convergence rate of vanilla distributed SGD in both settings. We corroborate our theoretical understanding with experiments and compare the performance of our algorithm with the state-of-the-art, showing that without sacrificing much on accuracy, SQuARM-SGD converges at a similar rate while saving significantly in total communicated bits.
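    To make the moving parts concrete, here is a simplified, synchronous caricature of the loop structure described above: local Nesterov-momentum SGD steps, then an event-triggered broadcast of sparsified-and-quantized model differences, then gossip on the communicated copies. The objectives, compressor, trigger threshold, and all constants are invented for illustration, and the guarantees above do not apply to this toy.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d = 4, 10
targets = rng.normal(size=(n, d))            # f_i(x) = 0.5*||x - targets[i]||^2
x = np.zeros((n, d))
v = np.zeros((n, d))                         # Nesterov momentum buffers
xhat = np.zeros((n, d))                      # copies last communicated to the neighbors
W = np.full((n, n), 1.0 / n)
eta, mu, H, thresh = 0.05, 0.9, 5, 0.05      # step size, momentum, local steps, trigger level

def compress(u, k=3):
    """Top-k sparsification followed by a crude scaled-sign quantization."""
    out = np.zeros_like(u)
    idx = np.argsort(np.abs(u))[-k:]
    out[idx] = np.sign(u[idx]) * np.abs(u[idx]).mean()
    return out

for rnd in range(300):
    for _ in range(H):                        # H local SGD steps with Nesterov momentum
        g = (x + mu * v) - targets + 0.1 * rng.normal(size=x.shape)  # stochastic gradient at lookahead
        v = mu * v - eta * g
        x = x + v
    for i in range(n):                        # communicate only on a significant change
        if np.linalg.norm(x[i] - xhat[i]) > thresh:
            xhat[i] += compress(x[i] - xhat[i])
    x += 0.5 * (W @ xhat - xhat)              # consensus step on the communicated copies

print("spread across nodes:", np.ptp(x, axis=0).max())
```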
  4. The theory of integral quadratic constraints (IQCs) allows the certification of exponential convergence of interconnected systems containing nonlinear or uncertain elements. In this work, we adapt the IQC theory to study first-order methods for smooth and strongly-monotone games and show how to design tailored quadratic constraints to get tight upper bounds on convergence rates. Using this framework, we recover the existing bound for the gradient method (GD), derive sharper bounds for the proximal point method (PPM) and the optimistic gradient method (OG), and provide for the first time a global convergence rate for the negative momentum method (NM) with an iteration complexity of O(κ^{1.5}), which matches its known lower bound. In addition, for time-varying systems, we prove that the gradient method with optimal step size achieves the fastest provable worst-case convergence rate with quadratic Lyapunov functions. Finally, we further extend our analysis to stochastic games and study the impact of multiplicative noise on different algorithms. We show that it is impossible for an algorithm with one step of memory to achieve acceleration if it only queries the gradient once per batch (in contrast with the stochastic strongly-convex optimization setting, where such acceleration has been demonstrated). However, we exhibit an algorithm which achieves acceleration with two gradient queries per batch.
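    Of the methods analyzed above, the negative momentum update is the simplest to state; the sketch below runs it on a small strongly-monotone quadratic game. The game, step size, and momentum value are illustrative guesses rather than the tuned parameters the IQC analysis would produce.

```python
import numpy as np

rng = np.random.default_rng(4)
d, m = 3, 0.5
B = rng.normal(size=(d, d))
B /= np.linalg.norm(B, 2)     # normalize the coupling so the constants below are safe

def F(z):
    """Operator of f(x, y) = m/2*||x||^2 + x^T B y - m/2*||y||^2; strongly monotone, root at z = 0."""
    x, y = z[:d], z[d:]
    return np.concatenate([m * x + B @ y, m * y - B.T @ x])

eta, beta = 0.1, -0.3         # note beta < 0: negative momentum
z = rng.normal(size=2 * d)
z_prev = z.copy()

for t in range(2000):
    z, z_prev = z - eta * F(z) + beta * (z - z_prev), z

print("distance to equilibrium:", np.linalg.norm(z))
```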
  5. (https://arxiv.org/abs/2010.13724) We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of O(1/√T) with respect to the gap function in smooth monotone games. This result addresses a question of Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such as OG) can be applied to achieve improved guarantees in the multi-agent learning setting. The proof of our upper bound uses a new technique centered around an adaptive choice of potential function at each iteration. We also show that the O(1/√T) rate is tight for all p-SCLI algorithms, which includes OG as a special case. As a byproduct of our lower-bound analysis, we additionally present a proof of a conjecture of Arjevani et al. (2015) which is more direct than previous approaches.
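    The optimistic gradient update analyzed above is simple enough to sketch directly; the toy below contrasts its last iterate with plain simultaneous gradient descent-ascent (GDA) on a bilinear game, the canonical smooth monotone (but not strongly monotone) example. The game matrix and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d = 3
B, _ = np.linalg.qr(rng.normal(size=(d, d)))   # random orthogonal coupling: every singular value is 1

def F(z):
    """Operator of the bilinear game min_x max_y x^T B y: monotone but not strongly monotone."""
    x, y = z[:d], z[d:]
    return np.concatenate([B @ y, -B.T @ x])

eta = 0.1
z = rng.normal(size=2 * d)
z_prev = z.copy()
w = z.copy()                  # plain simultaneous GDA, for contrast

for t in range(3000):
    # Optimistic gradient: z_{t+1} = z_t - 2*eta*F(z_t) + eta*F(z_{t-1})
    z, z_prev = z - 2 * eta * F(z) + eta * F(z_prev), z
    w = w - eta * F(w)        # GDA spirals outward on bilinear games

print("OG  last-iterate distance to equilibrium:", np.linalg.norm(z))
print("GDA last-iterate distance to equilibrium:", np.linalg.norm(w))
```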