A Proactive Data-Parallel Framework for Machine Learning

Zhao, Guoyi; Zhou, Tian; Gao, Lixin

doi:10.1145/3492324.3494167

Data-parallel frameworks have become essential for training machine learning models. The classic Bulk Synchronous Parallel (BSP) model updates the model parameters at pre-defined synchronization barriers. However, when one worker computes significantly slower than the others, waiting for that worker wastes a large share of the cluster's computing resources. In this paper, we propose a novel proactive data-parallel (PDP) framework. PDP enables the parameter server to initiate the update of the model parameters, so an update can be performed at any time rather than only at pre-defined update points. PDP not only initiates the update but also determines when to update, and this global decision on update frequency accelerates training. We further propose asynchronous PDP to reduce the idle time caused by synchronizing parameter updates, and we theoretically prove its convergence. We implement a distributed PDP framework and evaluate it with several popular machine learning algorithms, including Multilayer Perceptron, Convolutional Neural Network, K-means, and Gaussian Mixture Model. Our evaluation shows that PDP can achieve up to 20X speedup over the BSP model and can scale to large clusters.
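The straggler problem described in the abstract can be illustrated with a toy calculation. The sketch below is not the authors' implementation: the function names, the worker speeds, and the fixed trigger interval are all hypothetical, and the abstract does not say how PDP actually chooses when to update. It only contrasts a barrier-based round, where fast workers sit idle until the slowest one finishes, with a server-initiated round, where every worker reports whatever it has processed when the server asks.

```python
def bsp_round(speeds, batch_size):
    """Toy BSP round: every worker processes batch_size samples,
    then waits at the barrier for the slowest worker.

    speeds: samples per second for each worker (hypothetical values).
    Returns (round_time, total_idle_time) in seconds.
    """
    times = [batch_size / s for s in speeds]
    round_time = max(times)                      # barrier releases when the slowest finishes
    idle = sum(round_time - t for t in times)    # time fast workers spend waiting
    return round_time, idle


def server_initiated_round(speeds, interval):
    """Toy server-initiated round: the server triggers an update after
    `interval` seconds and each worker reports the samples it has
    processed so far. No worker waits for another.

    Returns the number of samples contributed by each worker.
    """
    return [s * interval for s in speeds]


# Three fast workers and one straggler (assumed speeds, samples per second).
speeds = [100.0, 100.0, 100.0, 10.0]

round_time, idle = bsp_round(speeds, batch_size=1000)
contributions = server_initiated_round(speeds, interval=10.0)

print(f"BSP round time: {round_time} s, total idle time: {idle} s")
print(f"Server-initiated contributions after 10 s: {contributions}")
```

In this toy setting the barrier holds the three fast workers for 90 seconds each, while the server-initiated round collects unequal contributions with no waiting. How to weight those unequal contributions and when to trigger the update are the questions the paper addresses; this sketch does not model them.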

Award ID(s): 1908536

PAR ID: 10356563

Author(s) / Creator(s): Zhao, Guoyi; Zhou, Tian; Gao, Lixin

Date Published: 2021-12-06

Journal Name: IEEE/ACM 8th International Conference on Big Data Computing, Applications and Technologies (BDCAT)

Page Range / eLocation ID: 69 to 79

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0

Conference Paper: https://doi.org/10.1145/3492324.3494167
