ZenoPS: A Distributed Learning System Integrating Communication Efficiency and Security

Xie, Cong; Koyejo, Oluwasanmi; Gupta, Indranil

doi:10.3390/a15070233

Citation Details

ZenoPS: A Distributed Learning System Integrating Communication Efficiency and Security

Distributed machine learning is primarily motivated by the promise of increased computation power for accelerating training and mitigating privacy concerns. Unlike machine learning on a single device, distributed machine learning requires collaboration and communication among the devices. This creates several new challenges: (1) the heavy communication overhead can be a bottleneck that slows down the training, and (2) the unreliable communication and weaker control over the remote entities make the distributed system vulnerable to systematic failures and malicious attacks. This paper presents a variant of stochastic gradient descent (SGD) with improved communication efficiency and security in distributed environments. Our contributions include (1) a new technique called error reset to adapt both infrequent synchronization and message compression for communication reduction in both synchronous and asynchronous training, (2) new score-based approaches for validating the updates, and (3) integration with both error reset and score-based validation. The proposed system provides communication reduction, both synchronous and asynchronous training, Byzantine tolerance, and local privacy preservation. We evaluate our techniques both theoretically and empirically. more »

Award ID(s):: 1909577 2046795 1934986

PAR ID:: 10387023

Author(s) / Creator(s):: Xie, Cong; Koyejo, Oluwasanmi; Gupta, Indranil

Date Published:: 2022-07-01

Journal Name:: Algorithms

Volume:: 15

Issue:: 7

ISSN:: 1999-4893

Page Range / eLocation ID:: 233

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
https://doi.org/10.3390/a15070233

More Like this