Approximately Optimal Distributed Data Shuffling

Attia, Mohamed Adel; Tandon, Ravi

doi:10.1109/ISIT.2018.8437325

Data shuffling between distributed workers is one of the critical steps in implementing large-scale learning algorithms. The focus of this work is to understand the fundamental trade-off between the amount of storage and the communication overhead for distributed data shuffling. We first present an information-theoretic formulation of the data shuffling problem, accounting for the underlying problem parameters (i.e., the number of workers, K, the number of data points, N, and the available storage per node, S). Then, we derive an information-theoretic lower bound on the communication overhead for data shuffling as a function of these parameters. Next, we present a novel coded communication scheme and show that its communication overhead is within a multiplicative factor of at most 2 of the lower bound. Furthermore, we introduce an improved aligned coded shuffling scheme, which achieves the optimal storage vs. communication trade-off for K < 5 and further reduces the maximum multiplicative gap to 7/6 for K ≥ 5.
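
To make the roles of K, N, and S concrete, the following minimal sketch counts how many data points must be transmitted in one shuffle when every missing point is sent uncoded. It is a toy baseline only, not the coded or aligned coded scheme of the paper, and it does not compute the paper's lower bound. The function name `uncoded_shuffle_cost`, the random caching rule, and the assumptions that K divides N and that S is measured in data points are all illustrative choices that do not come from the source.

```python
import random


def uncoded_shuffle_cost(N, K, S, seed=0):
    """Count data points sent in one uncoded shuffle (toy baseline).

    Assumptions (hypothetical, for illustration): K divides N, each worker
    processes N // K points per epoch, and S is the per-worker storage
    measured in data points, with N // K <= S <= N.
    """
    assert N % K == 0 and N // K <= S <= N
    rng = random.Random(seed)
    batch = N // K
    points = list(range(N))

    # Current epoch: worker k holds its own batch plus S - batch extra
    # points cached uniformly at random from the rest of the data set.
    rng.shuffle(points)
    storage = []
    for k in range(K):
        own = points[k * batch:(k + 1) * batch]
        others = points[:k * batch] + points[(k + 1) * batch:]
        extra = rng.sample(others, S - batch)
        storage.append(set(own) | set(extra))

    # Next epoch: a fresh random assignment. Every newly assigned point
    # that a worker does not already store is sent to it uncoded.
    rng.shuffle(points)
    cost = 0
    for k in range(K):
        needed = points[k * batch:(k + 1) * batch]
        cost += sum(1 for p in needed if p not in storage[k])
    return cost
```

With S = N every worker already stores the whole data set, so the function returns 0; with S = N / K the workers cache nothing beyond their current batch, and the count can be as large as N. Coded schemes such as those described in the abstract aim to reduce the communication between these two extremes.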

Award ID(s): 1651492

PAR ID: 10084305

Author(s) / Creator(s): Attia, Mohamed Adel; Tandon, Ravi

Date Published: 2018-06-01

Journal Name: 2018 IEEE International Symposium on Information Theory (ISIT)

Page Range / eLocation ID: 721 to 725

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1109/ISIT.2018.8437325