

Title: Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Distributed stochastic gradient descent (SGD) is essential for scaling machine learning algorithms to a large number of computing nodes. However, infrastructure variability, such as high communication delay or random node slowdown, greatly impedes the performance of distributed SGD, especially in wireless systems or sensor networks. In this paper, we propose an algorithmic approach named Overlap Local-SGD (and its momentum variant) that overlaps communication with computation so as to speed up the distributed training procedure; the approach also helps to mitigate straggler effects. We achieve this by adding an anchor model on each node: after multiple local updates, locally trained models are pulled back towards the synchronized anchor model rather than communicating with other nodes. Experimental results from training a deep neural network on the CIFAR-10 dataset demonstrate the effectiveness of Overlap Local-SGD. We also provide a convergence guarantee for the proposed algorithm under non-convex objective functions.
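The abstract describes the mechanism only at a high level, so the following is a minimal, single-process NumPy sketch of the anchor/pull-back idea: each worker runs several local SGD steps and is then pulled back toward a shared anchor model, which is refreshed by averaging. The toy quadratic loss, the pull-back coefficient `alpha`, and the serialized "communication" are illustrative assumptions; in the actual algorithm the anchor synchronization would run as a non-blocking all-reduce that overlaps with the next block of local steps.

```python
import numpy as np

# Toy quadratic objective per worker: f_i(x) = 0.5 * ||x - c_i||^2,
# an illustrative stand-in for a neural-network loss.
rng = np.random.default_rng(0)
dim, n_workers = 10, 4
centers = rng.normal(size=(n_workers, dim))

def grad(x, i):
    """Stochastic gradient for worker i (the noise models minibatch sampling)."""
    return (x - centers[i]) + 0.1 * rng.normal(size=dim)

lr, alpha = 0.1, 0.5           # step size and pull-back strength (assumed values)
local_steps, rounds = 5, 50    # local updates between synchronizations

anchor = np.zeros(dim)                   # globally synchronized anchor model
x = np.tile(anchor, (n_workers, 1))      # per-worker local models

for r in range(rounds):
    # Local computation: in the real algorithm these steps run while the
    # anchor is being all-reduced in the background (the "overlap").
    for i in range(n_workers):
        for _ in range(local_steps):
            x[i] -= lr * grad(x[i], i)
    # Pull each local model back toward the synchronized anchor instead of
    # blocking on a fresh all-to-all model average.
    x = (1 - alpha) * x + alpha * anchor
    # Anchor update: here a plain average; in practice this communication
    # overlaps with the next block of local steps.
    anchor = x.mean(axis=0)

print("distance of anchor to the optimum:", np.linalg.norm(anchor - centers.mean(axis=0)))
```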
Award ID(s):
1850029
NSF-PAR ID:
10217506
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Page Range / eLocation ID:
8871 to 8875
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we study communication-efficient decentralized training of large-scale machine learning models over a network. We propose and analyze SQuARM-SGD, a decentralized training algorithm that employs momentum and compressed communication between nodes, regulated by a locally computable triggering rule. In SQuARM-SGD, each node performs a fixed number of local SGD (stochastic gradient descent) steps using Nesterov's momentum and then sends sparsified and quantized updates to its neighbors only when there has been a significant change in its model parameters since the last communication. We provide convergence guarantees of our algorithm for strongly convex and non-convex smooth objectives. We believe that ours is the first theoretical analysis for compressed decentralized SGD with momentum updates. We show that SQuARM-SGD converges at rate O(1/nT) for strongly convex objectives, while for non-convex objectives it converges at rate O(1/√nT), thus matching the convergence rate of vanilla distributed SGD in both settings. We corroborate our theoretical understanding with experiments and compare the performance of our algorithm with the state of the art, showing that, without sacrificing much accuracy, SQuARM-SGD converges at a similar rate while saving significantly in total communicated bits.
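    As a rough illustration of the event-triggered, compressed communication with Nesterov momentum described above, here is a small NumPy simulation on a ring graph. The top-k/scaled-sign compressor, the trigger threshold, the mixing weight, and the toy quadratic objectives are assumptions made for illustration and do not reproduce the paper's exact update rule.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_nodes = 20, 4
targets = rng.normal(size=(n_nodes, dim))       # toy heterogeneous local objectives

def grad(x, i):
    return (x - targets[i]) + 0.1 * rng.normal(size=dim)

def compress(v, k=4):
    """Top-k sparsification followed by a crude scaled-sign quantization (illustrative)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = np.sign(v[idx]) * np.mean(np.abs(v[idx]))
    return out

lr, beta, local_steps, thresh = 0.05, 0.9, 5, 0.5   # assumed hyperparameters
x = np.zeros((n_nodes, dim))          # local models
m = np.zeros_like(x)                  # Nesterov momentum buffers
x_hat = np.zeros_like(x)              # copies of each model as last seen by neighbors
neighbors = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}  # ring graph

for _ in range(100):
    for i in range(n_nodes):
        for _ in range(local_steps):              # fixed number of local momentum-SGD steps
            g = grad(x[i] + beta * m[i], i)       # Nesterov look-ahead gradient
            m[i] = beta * m[i] - lr * g
            x[i] += m[i]
    for i in range(n_nodes):
        change = x[i] - x_hat[i]
        if np.linalg.norm(change) > thresh:       # event trigger: communicate only when the
            x_hat[i] += compress(change)          # model moved enough, and send it compressed
    for i in range(n_nodes):                      # gossip-style mixing with neighbors' copies
        x[i] += 0.3 * sum(x_hat[j] - x_hat[i] for j in neighbors[i])
```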
  2. Federated Learning (FL) has attracted increasing attention in recent years. A leading training algorithm in FL is local SGD, which updates the model parameters on each worker and averages model parameters across workers only once in a while. Although it uses fewer communication rounds than classical parallel SGD, local SGD still incurs a large communication overhead in each round for large machine learning models, such as deep neural networks. To address this issue, we propose a new communication-efficient distributed SGD method, which can significantly reduce the communication cost via an error-compensated double compression mechanism. Under the non-convex setting, our theoretical results show that our approach has better communication complexity than existing methods and enjoys the same linear speedup with respect to the number of workers as full-precision local SGD. Moreover, we propose a communication-efficient distributed SGD with momentum, which also has better communication complexity than existing methods and enjoys a linear speedup with respect to the number of workers. Finally, extensive experiments are conducted to verify the performance of the proposed methods.
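    To make the error-compensated double compression idea concrete, here is a simplified NumPy sketch from a parameter-server viewpoint: each worker compresses its local-SGD model update with an error-feedback buffer, and the server compresses the averaged update back with its own buffer. The top-k compressor, the toy objective, and the hyperparameters are illustrative assumptions rather than the authors' exact method.

```python
import numpy as np

rng = np.random.default_rng(2)
dim, n_workers = 20, 4
targets = rng.normal(size=(n_workers, dim))     # toy heterogeneous local objectives

def grad(x, i):
    return (x - targets[i]) + 0.1 * rng.normal(size=dim)

def compress(v, k=4):
    """Top-k compression (one possible choice of compressor)."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

lr, local_steps = 0.05, 5
x_global = np.zeros(dim)
e_worker = np.zeros((n_workers, dim))   # per-worker error-compensation buffers
e_server = np.zeros(dim)                # server-side error-compensation buffer

for _ in range(100):
    deltas = np.zeros((n_workers, dim))
    for i in range(n_workers):
        x = x_global.copy()
        for _ in range(local_steps):             # local SGD steps on worker i
            x -= lr * grad(x, i)
        update = (x - x_global) + e_worker[i]    # add back previously dropped mass
        deltas[i] = compress(update)             # first compression: worker -> server
        e_worker[i] = update - deltas[i]         # remember what compression dropped
    avg = deltas.mean(axis=0) + e_server
    broadcast = compress(avg)                    # second compression: server -> workers
    e_server = avg - broadcast
    x_global += broadcast                        # all workers apply the same compressed update
```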
  3. In this paper, we propose and analyze SPARQ-SGD, an event-triggered and compressed algorithm for decentralized training of large-scale machine learning models over a graph. Each node can locally compute a condition (event) that triggers a communication in which quantized and sparsified local model parameters are sent. In SPARQ-SGD, each node first takes a fixed number of local gradient steps and then checks whether the model parameters have changed significantly since its last update; it communicates further compressed model parameters only when there is a significant change, as specified by a (design) criterion. We prove that SPARQ-SGD converges as O(1/nT) and O(1/√nT) in the strongly convex and non-convex settings, respectively, demonstrating that aggressive compression, including event-triggered communication, model sparsification, and quantization, does not affect the overall convergence rate compared to uncompressed decentralized training, thereby theoretically yielding communication efficiency for 'free'. We evaluate SPARQ-SGD on real datasets to demonstrate significant savings in communicated bits over the state of the art.
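    SPARQ-SGD's core primitives, the locally computed trigger and the sparsify-then-quantize compressor, can be sketched as below. The specific top-k and scaled-sign operators and the norm-based threshold test are plausible instantiations chosen for illustration; the paper's (design) criterion and compressors may differ.

```python
import numpy as np

def sparsify_topk(v, k):
    """Keep only the k largest-magnitude coordinates of the update."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def quantize_sign(v):
    """Scaled sign quantization of the surviving coordinates."""
    nnz = max(np.count_nonzero(v), 1)
    return (np.abs(v).sum() / nnz) * np.sign(v)

def maybe_communicate(x_local, x_last_sent, threshold, k):
    """Event trigger: transmit a sparsified-then-quantized update only if the
    model has changed enough since the last transmission; otherwise neighbors
    keep using their stale copy and no bits are sent this round."""
    change = x_local - x_last_sent
    if np.linalg.norm(change) <= threshold:
        return None
    return quantize_sign(sparsify_topk(change, k))

# Example: after a block of local gradient steps, a node checks its trigger.
x_local = np.array([0.9, -0.1, 0.05, 0.6])
x_last_sent = np.zeros(4)
msg = maybe_communicate(x_local, x_last_sent, threshold=0.5, k=2)
print(msg)   # compressed message to neighbors, or None when the trigger does not fire
```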
  4. Large-scale machine learning training, in particular, distributed stochastic gradient descent, needs to be robust to inherent system variability such as node straggling and random communication delays. This work considers a distributed training framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. We analyze the true speed of error convergence with respect to wall-clock time (instead of the number of iterations) and analyze how it is affected by the frequency of averaging. The main contribution is the design of ADACOMM, an adaptive communication strategy that starts with infrequent averaging to save communication delay and improve convergence speed, and then increases the communication frequency in order to achieve a low error floor. Rigorous experiments on training deep neural networks show that ADACOMM can take 3x less time than fully synchronous SGD and still reach the same final training loss. 
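    A minimal way to picture ADACOMM's adaptive communication strategy is the toy simulation below, where the averaging period tau starts large (infrequent communication) and shrinks phase by phase (more frequent communication). The halving schedule, the pretend wall-clock cost model, and the quadratic objective are assumptions for illustration; the abstract does not specify the actual adaptation rule.

```python
import numpy as np

rng = np.random.default_rng(3)
dim, n_workers = 10, 4
targets = rng.normal(size=(n_workers, dim))     # toy heterogeneous local objectives

def grad(x, i):
    return (x - targets[i]) + 0.1 * rng.normal(size=dim)

def avg_loss(x):
    return 0.5 * np.mean([np.sum((x - t) ** 2) for t in targets])

lr = 0.05
step_time, comm_delay = 0.05, 1.0   # pretend costs (seconds) of one local step / one averaging
tau = 16                            # start with infrequent averaging to hide communication delay
x = np.zeros((n_workers, dim))
wall_clock = 0.0

for phase in range(6):
    for _ in range(20):                       # rounds within this phase
        for i in range(n_workers):
            for _ in range(tau):              # tau local steps between averaging points
                x[i] -= lr * grad(x[i], i)
        x[:] = x.mean(axis=0)                 # periodic full averaging (the communication)
        wall_clock += tau * step_time + comm_delay
    print(f"phase {phase}: tau={tau}, wall clock={wall_clock:.1f}s, loss={avg_loss(x[0]):.4f}")
    tau = max(1, tau // 2)                    # adapt: communicate more often to lower the error floor
```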
  5. Communication is a key bottleneck in federated learning, where a large number of edge devices collaboratively learn a model under the orchestration of a central server without sharing their own training data. While local SGD has been proposed to reduce the number of FL rounds and has become the algorithm of choice for FL, its total communication cost is still prohibitive when each device needs to communicate with the remote server repeatedly over bandwidth-limited networks. In light of both device-to-device (D2D) and device-to-server (D2S) cooperation opportunities in modern communication networks, this paper proposes a new federated optimization algorithm dubbed hybrid local SGD (HL-SGD) for FL settings in which devices are grouped into a set of disjoint clusters with high D2D communication bandwidth. HL-SGD subsumes previously proposed algorithms such as local SGD and gossip SGD and enables us to strike the best balance between model accuracy and runtime. We analyze the convergence of HL-SGD in the presence of heterogeneous data for general non-convex settings. We also perform extensive experiments and show that the use of hybrid model aggregation via D2D and D2S communications in HL-SGD can substantially speed up the training time of federated learning.
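    The hybrid D2D/D2S aggregation described above can be illustrated with the small NumPy simulation below: clusters average over their fast device-to-device links every round, while the global device-to-server average happens only occasionally. The cluster layout, the round counts, and the toy objective are assumptions; the actual HL-SGD schedules and weights these aggregations according to its analysis rather than this fixed pattern.

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 10
clusters = [[0, 1, 2], [3, 4], [5, 6, 7]]        # disjoint clusters with fast D2D links (assumed)
n_devices = sum(len(c) for c in clusters)
targets = rng.normal(size=(n_devices, dim))      # heterogeneous (non-IID) local objectives

def grad(x, i):
    return (x - targets[i]) + 0.1 * rng.normal(size=dim)

lr, local_steps = 0.05, 5
d2d_every, d2s_every = 1, 4                      # cheap D2D averaging happens more often than D2S
x = np.zeros((n_devices, dim))

for r in range(1, 101):
    for i in range(n_devices):                   # local SGD on every device
        for _ in range(local_steps):
            x[i] -= lr * grad(x[i], i)
    if r % d2d_every == 0:                       # intra-cluster averaging over D2D links
        for c in clusters:
            x[c] = x[c].mean(axis=0)
    if r % d2s_every == 0:                       # occasional global averaging via the server (D2S)
        x[:] = x.mean(axis=0)
```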