Search for: All records

Award ID contains: 1850029


Total Resources: 7

Resource Type
Conference Paper: 5
Conference Proceeding: 0
Dataset: 0
Journal Article: 2
Workshop Report: 0

Availability
Full Text / Resource Available: 7
Citation Only: 0



Slow and Stale Gradients Can Win the Race

https://doi.org/10.1109/JSAIT.2021.3103770

Dutta, Sanghamitra ; Wang, Jianyu ; Joshi, Gauri ( September 2021 , IEEE Journal on Selected Areas in Information Theory)

Adaptive Quantization of Model Updates for Communication-Efficient Federated Learning

https://doi.org/10.1109/ICASSP39728.2021.9413697

Jhunjhunwala, Divyansh ; Gadhikar, Advait ; Joshi, Gauri ; Eldar, Yonina C. ( June 2021 , IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))

Communication of model updates between client nodes and the central aggregating server is a major bottleneck in federated learning, especially in bandwidth-limited settings and for high-dimensional models. Gradient quantization is an effective way of reducing the number of bits required to communicate each model update, albeit at the cost of a higher error floor due to the higher variance of the quantized stochastic gradients. In this work, we propose an adaptive quantization strategy called AdaQuantFL that aims to achieve communication efficiency as well as a low error floor by changing the number of quantization levels during the course of training. Experiments on training deep neural networks show that our method can converge using far fewer communicated bits than setups with a fixed number of quantization levels, with little or no impact on training and test accuracy.
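The quantize-then-adapt idea above can be sketched in a few lines. The sketch assumes a QSGD-style unbiased stochastic quantizer with s uniform levels per coordinate, and a hypothetical schedule that raises s in proportion to the square root of the ratio between the initial loss and the current loss. Both are illustrative assumptions, not necessarily the exact quantizer or rule used in AdaQuantFL; the function names and toy values are made up.

```python
import math
import random
from typing import List


def stochastic_quantize(v: List[float], s: int, rng: random.Random) -> List[float]:
    """Unbiased stochastic quantization of v onto s uniform levels (QSGD-style sketch)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        r = abs(x) / norm * s  # position of |x| on the grid 0, 1, ..., s
        lower = math.floor(r)
        # Round up with probability equal to the fractional part, so E[level] = r.
        level = lower + (1 if rng.random() < r - lower else 0)
        out.append(math.copysign(norm * level / s, x))
    return out


def adaptive_levels(s0: int, loss0: float, loss_now: float) -> int:
    """Hypothetical schedule: use more quantization levels as the training loss decreases."""
    return max(1, math.ceil(s0 * math.sqrt(loss0 / loss_now)))


if __name__ == "__main__":
    rng = random.Random(0)
    update = [0.3, -1.2, 0.05, 0.8]  # toy model update (made-up values)
    for loss in (2.0, 1.0, 0.5):  # toy loss trajectory
        s = adaptive_levels(s0=2, loss0=2.0, loss_now=loss)
        print(s, stochastic_quantize(update, s, rng))
```

Each quantized coordinate can be sent as a sign bit plus a level index in {0, ..., s} (together with one shared norm), so raising s late in training spends extra bits to lower the quantization variance, and with it the error floor.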
Cooperative SGD: A Unified Framework for the Design and Analysis of Local-Update SGD Algorithms

Wang, Jianyu ; Joshi, Gauri ( January 2021 , Journal of Machine Learning Research)

When training machine learning models using stochastic gradient descent (SGD) with a large number of nodes or massive numbers of edge devices, the communication cost of synchronizing gradients at every iteration is a key bottleneck that limits the scalability of the system and hinders the benefit of parallel computation. Local-update SGD algorithms, where worker nodes perform local iterations of SGD and periodically synchronize their local models, can effectively reduce the communication frequency and the resulting communication delay. In this paper, we propose a unified framework, named Cooperative SGD, that subsumes a variety of local-update SGD algorithms (such as local SGD, elastic averaging SGD, and decentralized parallel SGD) and provides a unified convergence analysis. Notably, special cases of the unified convergence analysis provided by the Cooperative SGD framework yield 1) the first convergence analysis of elastic averaging SGD for general non-convex objectives, and 2) improvements upon previous analyses of local SGD and decentralized parallel SGD. Moreover, we design new algorithms, such as elastic averaging SGD with overlapped computation and communication, and decentralized periodic averaging, which are shown to reach the same training loss at least 4x faster than the baseline.
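Local-update SGD fits a simple matrix view: after each local step, the stacked worker models are multiplied by a mixing matrix W_k. The sketch below assumes that form, with W_k equal to the identity between synchronizations and to the all-(1/m) averaging matrix every tau steps, which recovers periodic averaging (local SGD). Swapping in a sparse doubly stochastic W_k would model decentralized SGD instead. The scalar quadratic workers, the constants, and all function names are hypothetical.

```python
from typing import Callable, List

Matrix = List[List[float]]


def mix(models: List[float], W: Matrix) -> List[float]:
    """Apply a mixing matrix: new model j = sum_i models[i] * W[i][j]."""
    m = len(models)
    return [sum(models[i] * W[i][j] for i in range(m)) for j in range(m)]


def periodic_averaging(m: int, tau: int) -> Callable[[int], Matrix]:
    """Local SGD as a special case: identity mixing, except full averaging every tau steps."""
    identity = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    average = [[1.0 / m] * m for _ in range(m)]
    return lambda k: average if (k + 1) % tau == 0 else identity


def cooperative_sgd(x0: float, grads: List[Callable[[float], float]], lr: float,
                    steps: int, mixing_at: Callable[[int], Matrix]) -> List[float]:
    models = [x0] * len(grads)
    for k in range(steps):
        # Local gradient step on every worker (scalar models for readability).
        models = [x - lr * g(x) for x, g in zip(models, grads)]
        # Communication step: the schedule decides which mixing matrix to apply.
        models = mix(models, mixing_at(k))
    return models


if __name__ == "__main__":
    centers = [1.0, 2.0, 6.0]  # worker i minimizes 0.5 * (x - c_i)**2
    grads = [lambda x, c=c: x - c for c in centers]
    print(cooperative_sgd(0.0, grads, lr=0.1, steps=200, mixing_at=periodic_averaging(3, 4)))
```

With these toy objectives the global minimizer is the mean of the centers, 3.0, and after the final averaging step every worker holds approximately that value.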
Tackling the Objective Inconsistency Problem in Heterogeneous Federated Optimization

Wang, Jianyu ; Liu, Qinghua ; Liang, Hao ; Joshi, Gauri ; Poor, H. Vincent ( December 2020 , Advances in Neural Information Processing Systems)
In federated learning, heterogeneity in the clients' local datasets and computation speeds results in large variations in the number of local updates performed by each client in each communication round. Naive weighted aggregation of such models causes objective inconsistency, that is, the global model converges to a stationary point of a mismatched objective function which can be arbitrarily different from the true objective. This paper provides a general framework to analyze the convergence of heterogeneous federated optimization algorithms. It subsumes previously proposed methods such as FedAvg and FedProx and provides the first principled understanding of the solution bias and the convergence slowdown due to objective inconsistency. Using insights from this analysis, we propose FedNova, a normalized averaging method that eliminates objective inconsistency while preserving fast error convergence.
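The difference between naive and normalized averaging shows up even on one-dimensional quadratics. The sketch below assumes two hypothetical clients with f_i(x) = 0.5 (x - c_i)^2, equal weights p_i = 0.5, plain local SGD, and 2 versus 20 local steps per round. The normalized update divides each client's change by its own step count and rescales by tau_eff = sum_i p_i tau_i, which is one choice of effective step count assumed here rather than the only option. The true objective 0.5 f_1 + 0.5 f_2 is minimized at x = 5.

```python
from typing import List


def local_sgd(x: float, center: float, lr: float, steps: int) -> float:
    """Run `steps` gradient steps on the toy objective 0.5 * (x - center)**2."""
    for _ in range(steps):
        x -= lr * (x - center)
    return x


def fl_round(x: float, centers: List[float], taus: List[int], weights: List[float],
             lr: float, normalized: bool) -> float:
    # Model change produced by each client during this round.
    deltas = [x - local_sgd(x, c, lr, t) for c, t in zip(centers, taus)]
    if not normalized:
        # FedAvg: weighted average of client models, i.e. of their raw changes.
        return x - sum(p * d for p, d in zip(weights, deltas))
    # Normalized (FedNova-style): divide each change by its number of local steps ...
    directions = [d / (lr * t) for d, t in zip(deltas, taus)]
    # ... then rescale by the effective step count tau_eff = sum_i p_i * tau_i.
    tau_eff = sum(p * t for p, t in zip(weights, taus))
    return x - tau_eff * lr * sum(p * g for p, g in zip(weights, directions))


def train(normalized: bool, rounds: int = 2000) -> float:
    x = 0.0
    for _ in range(rounds):
        x = fl_round(x, centers=[0.0, 10.0], taus=[2, 20], weights=[0.5, 0.5],
                     lr=0.001, normalized=normalized)
    return x


if __name__ == "__main__":
    print("FedAvg :", train(normalized=False))
    print("FedNova:", train(normalized=True))
```

In this toy run, FedAvg settles near x = 9, pulled toward the client that takes more local steps, while the normalized update stays close to 5; the small remaining gap comes from the finite learning rate.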
Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD

https://doi.org/10.1109/ICASSP40776.2020.9053834

Wang, Jianyu ; Liang, Hao ; Joshi, Gauri ( May 2020 , IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP))
Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability, such as high communication delay or random node slowdown, greatly impedes the performance of distributed SGD algorithms, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap Local-SGD (and its momentum variant) that overlaps communication and computation so as to speed up the distributed training procedure. The approach also helps mitigate straggler effects. We achieve this by adding an anchor model on each node. After multiple local updates, locally trained models are pulled back towards the synchronized anchor model rather than communicating with others. Experimental results of training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.
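The anchor mechanism can be simulated sequentially. The sketch below assumes scalar quadratic workers, a pull strength alpha, and a one-round-stale anchor: the average launched at the end of round r is only consumed at the end of round r + 1, standing in for an all-reduce that runs in the background during the next local steps. It omits the momentum variant, and all names and constants are hypothetical.

```python
from typing import List, Tuple


def overlap_local_sgd(centers: List[float], lr: float, tau: int, alpha: float,
                      rounds: int) -> Tuple[List[float], float]:
    m = len(centers)
    models = [0.0] * m
    anchor = 0.0  # synchronized anchor model, initialized to the common starting point
    for _ in range(rounds):
        # 1) tau local SGD steps per worker on the toy objective 0.5 * (x - c_i)**2.
        for i, c in enumerate(centers):
            for _ in range(tau):
                models[i] -= lr * (models[i] - c)
        # 2) Pull each local model toward the anchor produced in the previous round.
        models = [x - alpha * (x - anchor) for x in models]
        # 3) Launch averaging of the new models. In a real system this runs in the
        #    background during the next tau local steps, so the result is only
        #    used as the anchor at the end of the following round.
        anchor = sum(models) / m
    return models, anchor


if __name__ == "__main__":
    print(overlap_local_sgd([1.0, 2.0, 6.0], lr=0.1, tau=4, alpha=0.5, rounds=200))
```

Because the anchor lags by one round, workers never block on communication. In this toy run the anchor still converges to the minimizer of the average objective, the mean of the centers (3.0), while the individual workers remain spread around it.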
Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms

Wang, Jianyu ; Joshi, Gauri ( June 2019 , ICML Workshop on Coding Theory for Machine Learning)

Communication-efficient SGD algorithms, which allow nodes to perform local updates and periodically synchronize local models, are highly effective in improving the speed and scalability of distributed SGD. However, a rigorous convergence analysis and a comparative study of different communication-reduction strategies remain largely open problems. This paper presents a unified framework called Cooperative SGD that subsumes existing communication-efficient SGD algorithms such as periodic averaging, elastic averaging, and decentralized SGD. By analyzing Cooperative SGD, we provide novel convergence guarantees for existing algorithms. Moreover, this framework enables us to design new communication-efficient SGD algorithms that strike the best balance between reducing communication overhead and achieving fast error convergence with a low error floor.
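Elastic averaging can also be written in a mixing-matrix form if the auxiliary center variable is treated as one extra entry that takes no gradient steps. The sketch below assumes that formulation, with worker models and the center exchanging a hypothetical elasticity alpha every tau steps (alpha * m must not exceed 1 so all weights stay non-negative). The toy objectives, constants, and names are illustrative only, not the paper's implementation.

```python
from typing import List

Matrix = List[List[float]]


def elastic_mixing(m: int, alpha: float) -> Matrix:
    """(m+1) x (m+1) symmetric mixing matrix; index m is the auxiliary center variable."""
    W = [[0.0] * (m + 1) for _ in range(m + 1)]
    for i in range(m):
        W[i][i] = 1.0 - alpha  # worker keeps (1 - alpha) of its own model
        W[i][m] = alpha        # center receives alpha from each worker
        W[m][i] = alpha        # each worker receives alpha from the center
    W[m][m] = 1.0 - m * alpha
    return W


def mix(values: List[float], W: Matrix) -> List[float]:
    """Apply a mixing matrix: new value j = sum_i values[i] * W[i][j]."""
    n = len(values)
    return [sum(values[i] * W[i][j] for i in range(n)) for j in range(n)]


def elastic_averaging_sgd(centers: List[float], lr: float, alpha: float,
                          tau: int, steps: int) -> List[float]:
    m = len(centers)
    W = elastic_mixing(m, alpha)
    values = [0.0] * (m + 1)  # worker models followed by the center variable
    for k in range(steps):
        # Gradient step on workers only (toy objective 0.5 * (x - c_i)**2);
        # the center variable has no data, so it is carried over unchanged.
        values = [x - lr * (x - c) for x, c in zip(values, centers)] + [values[m]]
        if (k + 1) % tau == 0:
            values = mix(values, W)
    return values


if __name__ == "__main__":
    print(elastic_averaging_sgd([1.0, 2.0, 6.0], lr=0.1, alpha=0.1, tau=2, steps=600))
```

Every row and column of this matrix sums to one, so an elastic step preserves the total of the worker models plus the center. With these toy objectives the center variable converges to the mean of the centers, 3.0.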
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-update SGD

Wang, Jianyu ; Joshi, Gauri ( April 2019 , Systems and Machine Learning (SysML) Conference)

Large-scale machine learning training, in particular distributed stochastic gradient descent, needs to be robust to inherent system variability such as node straggling and random communication delays. This work considers a distributed training framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. We analyze the true speed of error convergence with respect to wall-clock time (instead of the number of iterations) and how it is affected by the frequency of averaging. The main contribution is the design of ADACOMM, an adaptive communication strategy that starts with infrequent averaging to save communication delay and improve convergence speed, and then increases the communication frequency in order to achieve a low error floor. Rigorous experiments on training deep neural networks show that ADACOMM can reach the same final training loss as fully synchronous SGD in 3x less time.
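The error-runtime trade-off can be made concrete with a toy runtime model and a period schedule. The sketch below assumes that every iteration costs a fixed compute time and every averaging step adds a fixed communication delay, and it uses a hypothetical loss-based rule that shrinks the averaging period tau as the training loss falls. Both are illustrative stand-ins rather than the exact model and rule from the paper.

```python
import math


def adacomm_period(tau0: int, loss0: float, loss_now: float) -> int:
    """Hypothetical ADACOMM-style rule: average more often as the loss decreases."""
    return max(1, math.ceil(tau0 * math.sqrt(loss_now / loss0)))


def wall_clock(iterations: int, tau: int, compute_time: float, comm_delay: float) -> float:
    """Toy runtime model: each iteration costs compute_time; each averaging step adds comm_delay."""
    return iterations * compute_time + (iterations // tau) * comm_delay


if __name__ == "__main__":
    for loss in (2.0, 0.5, 0.3):
        print(loss, adacomm_period(tau0=20, loss0=2.0, loss_now=loss))
    print(wall_clock(100, 10, 1.0, 5.0), wall_clock(100, 1, 1.0, 5.0))
```

Starting from tau0 = 20 with an initial loss of 2.0, the rule keeps tau = 20 at first and halves it to 10 once the loss reaches 0.5. Under the runtime model with compute_time = 1 and comm_delay = 5, 100 iterations cost 150 time units with tau = 10 versus 600 with fully synchronous averaging (tau = 1).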