skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Distributed Optimization over Time-Varying Networks: Imperfect Information with Feedback is as Good as Perfect Information
The convergence of an error-feedback algorithm is studied for decentralized stochastic gradient descent (DSGD) algorithm with compressed information sharing over time-varying graphs. It is shown that for both strongly-convex and convex cost functions, despite of imperfect information sharing, the convergence rates match those with perfect information sharing. To do so, we show that for strongly-convex loss functions, with a proper choice of a step-size, the state of each node converges to the global optimizer at the rate of O(T^{−1}). Similarly, for general convex cost functions, with a proper choice of step-size, we show that the value of loss function at a temporal average of each node’s estimates converges to the optimal value at the rate of O(T^{−1/2+ϵ }) for any ϵ > 0.  more » « less
Award ID(s):
1749981
PAR ID:
10427582
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
2022 American Control Conference (ACC)
Page Range / eLocation ID:
2791 to 2796
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study strongly convex distributed optimization problems where a set of agents are interested in solving a separable optimization problem collaboratively. In this article, we propose and study a two-time-scale decentralized gradient descent algorithm for a broad class of lossy sharing of information over time-varying graphs. One time-scale fades out the (lossy) incoming information from neighboring agents, and one time-scale regulates the local loss functions' gradients. We show that assuming a proper choice of step-size sequences, certain connectivity conditions, and bounded gradients along the trajectory of the dynamics, the agents' estimates converge to the optimal solution with the rate of O(T^{−1/2}) . We also provide novel tools to study distributed optimization with diminishing averaging weights over time-varying graphs. 
    more » « less
  2. This papers studies multi-agent (convex and nonconvex) optimization over static digraphs. We propose a general distributed asynchronous algorithmic framework whereby i) agents can update their local variables as well as communicate with their neighbors at any time, without any form of coordination; and ii) they can perform their local computations using (possibly) delayed, out-of-sync information from their neighbors. Delays need not be known to the agents or obey any specific profile, and can also be time-varying (but bounded). The algorithm builds on a tracking mechanism that is robust against asynchrony (in the above sense), whose goal is to estimate locally the sum of agents’ gradients. When applied to strongly convex functions, we prove that it converges at an R-linear (geometric) rate as long as the step-size is sufficiently small. A sublinear convergence rate is proved, when nonconvex problems and/or diminishing, uncoordinated step-sizes are employed. To the best of our knowledge, this is the first distributed algorithm with provable geometric convergence rate in such a general asynchonous setting. 
    more » « less
  3. null (Ed.)
    In this paper, we study communication-efficient decentralized training of large-scale machine learning models over a network. We propose and analyze SQuARM-SGD, a decentralized training algorithm, employing momentum and compressed communication between nodes regulated by a locally computable triggering rule. In SQuARM-SGD, each node performs a fixed number of local SGD (stochastic gradient descent) steps using Nesterov's momentum and then sends sparisified and quantized updates to its neighbors only when there is a significant change in its model parameters since the last time communication occurred. We provide convergence guarantees of our algorithm for strongly-convex and non-convex smooth objectives. We believe that ours is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that SQuARM-SGD converges at rate O(1/nT) for strongly-convex objectives, while for non-convex objectives it converges at rate O(1/√nT), thus matching the convergence rate of \emphvanilla distributed SGD in both these settings. We corroborate our theoretical understanding with experiments and compare the performance of our algorithm with the state-of-the-art, showing that without sacrificing much on the accuracy, SQuARM-SGD converges at a similar rate while saving significantly in total communicated bits. 
    more » « less
  4. Stochastic second-order methods are known to achieve fast local convergence in strongly convex optimization by relying on noisy Hessian estimates to precondition the gradient. Yet, most of these methods achieve superlinear convergence only when the stochastic Hessian noise diminishes, requiring an increase in the per-iteration cost as time progresses. Recent work in \cite{na2022hessian} addressed this issue via a Hessian averaging scheme that achieves a superlinear convergence rate without increasing the per-iteration cost. However, the considered method exhibits a slow global convergence rate, requiring up to ~O(κ^2) iterations to reach the superlinear rate of ~O((1/t)^{t/2}), where κ is the problem's condition number. In this paper, we propose a novel stochastic Newton proximal extragradient method that significantly improves these bounds, achieving a faster global linear rate and reaching the same fast superlinear rate in ~O(κ) iterations. We achieve this by developing a novel extension of the Hybrid Proximal Extragradient (HPE) framework, which simultaneously achieves fast global and local convergence rates for strongly convex functions with access to a noisy Hessian oracle. 
    more » « less
  5. Stochastic second-order methods accelerate local convergence in strongly convex optimization by using noisy Hessian estimates to precondition gradients. However, they typically achieve superlinear convergence only when Hessian noise diminishes, which increases per-iteration costs. Prior work [arXiv:2204.09266] introduced a Hessian averaging scheme that maintains low per-iteration cost while achieving superlinear convergence, but with slow global convergence, requiring 𝑂 ~ ( 𝜅 2 ) O ~ (κ 2 ) iterations to reach the superlinear rate of 𝑂 ~ ( ( 1 / 𝑡 ) 𝑡 / 2 ) O ~ ((1/t) t/2 ), where 𝜅 κ is the condition number. This paper proposes a stochastic Newton proximal extragradient method that improves these bounds, delivering faster global linear convergence and achieving the same fast superlinear rate in only 𝑂 ~ ( 𝜅 ) O ~ (κ) iterations. The method extends the Hybrid Proximal Extragradient (HPE) framework, yielding improved global and local convergence guarantees for strongly convex functions with access to a noisy Hessian oracle. 
    more » « less