Title: On the Tradeoff Between Computation and Communication Costs for Distributed Linearly Separable Computation
This paper studies the distributed linearly separable computation problem, which generalizes many existing distributed computing problems such as distributed gradient coding and distributed linear transform. A master asks N distributed workers to compute a linearly separable function of K datasets, namely a set of Kc linear combinations of K equal-length messages (each message is a function of one dataset). Each worker is assigned some datasets in an uncoded manner; it computes the corresponding messages and returns some function of them, such that the master can recover the task function with high probability from the answers of any Nr out of the N workers. The literature has considered only the specific cases where Kc = 1 or where the computation cost is minimum. In this paper, we treat the general case (i.e., general Kc and general computation cost) and aim to find the minimum communication cost. We first propose a novel converse bound on the communication cost under the constraint of the popular cyclic assignment (widely considered in the literature), which assigns the datasets to the workers in a cyclic way. Motivated by the observation that existing strategies for distributed computing fall short of this converse bound, we propose a novel distributed computing scheme for some system parameters. The proposed computing scheme is optimal for any assignment when Kc is large, and is optimal under the cyclic assignment when the numbers of workers and datasets are equal or Kc is small. In the remaining cases, it is order optimal within a factor of 2 under the cyclic assignment.
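To make the cyclic assignment concrete, here is a minimal Python sketch (the indexing convention and the example values N = 5 workers and M = 3 datasets per worker are illustrative assumptions, not parameters from the paper):

```python
# Minimal sketch of the cyclic assignment discussed in the abstract.
# Assumption: K = N datasets; worker n is assigned the M datasets
# n, n+1, ..., n+M-1 (mod N), so every dataset appears at exactly M
# workers and the answers of any Nr = N - M + 1 workers cover all of them.

def cyclic_assignment(N: int, M: int) -> list[list[int]]:
    """Dataset indices assigned to each of the N workers."""
    return [[(n + j) % N for j in range(M)] for n in range(N)]

if __name__ == "__main__":
    for worker, datasets in enumerate(cyclic_assignment(5, 3)):
        print(f"worker {worker}: datasets {datasets}")
```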
Award ID(s):
1817154 2045656 2007108
PAR ID:
10297826
Journal Name:
IEEE Transactions on Communications
ISSN:
0090-6778
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the optimal design of a heterogeneous coded elastic computing (CEC) network where machines have varying relative computation speeds. CEC, introduced by Yang et al., is a framework that mitigates the impact of elastic events, in which machines join and leave the network. A set of data is distributed among storage-constrained machines using a Maximum Distance Separable (MDS) code such that any subset of machines of a specific size can perform the desired computations. This design eliminates the need to redistribute the data after each elastic event. In this work, we develop a process for an arbitrary heterogeneous computing network to minimize the overall computation time by defining an optimal computation load, i.e., the number of computations assigned to each machine. We then present an algorithm that defines a specific computation assignment among the machines which makes use of the MDS code and meets the optimal computation load.
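As a minimal illustration of the MDS-coded storage that CEC builds on, the sketch below encodes L data blocks into N coded blocks with a real-valued Vandermonde generator, so the blocks of any L machines suffice to recover (and hence compute on) the data; the encoder and the parameters are illustrative assumptions, not the construction from the paper.

```python
import numpy as np

def mds_encode(blocks, N):
    """Encode the L data blocks (rows) into N coded blocks (one per machine)."""
    L = blocks.shape[0]
    nodes = np.arange(1.0, N + 1)             # distinct evaluation points
    G = np.vander(nodes, L, increasing=True)  # any L rows are invertible
    return G @ blocks, G

def recover(coded, G, survivors):
    """Recover the L data blocks from the coded blocks of any L machines."""
    return np.linalg.solve(G[survivors, :], coded[survivors, :])

rng = np.random.default_rng(0)
L, N = 3, 6
X = rng.standard_normal((L, 4))       # L data blocks, each of length 4
coded, G = mds_encode(X, N)
X_hat = recover(coded, G, [0, 2, 5])  # any L = 3 machines suffice
print(np.allclose(X, X_hat))          # True
```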
  2. The K-user Linear Computation Broadcast (LCBC) problem comprises d-dimensional data (over Fq) that is fully available to a central server, and K users who require various linear computations of the data and have prior knowledge of various linear functions of the data as side information. The optimal broadcast cost is the minimum number of q-ary symbols that the server must broadcast per computation instance for every user to retrieve its desired computation. The reciprocal of the optimal broadcast cost is called the capacity. The main contribution of this paper is the exact capacity characterization of the K = 3 user LCBC for all cases, i.e., for arbitrary finite fields Fq, arbitrary data dimension d, and arbitrary linear side information and demands at each user. A remarkable aspect of the converse (impossibility result) is that, unlike the 2-user LCBC whose capacity was determined previously, the entropic formulation (where the entropies of demands and side information are specified, but not their functional forms) is insufficient to obtain a tight converse for the 3-user LCBC; instead, the converse exploits functional submodularity. Notable aspects of achievability include the sufficiency of vector linear coding schemes, subspace decompositions that parallel those found previously by Yao Wang in degrees-of-freedom (DoF) studies of wireless broadcast networks, and efficiency tradeoffs that lead to a constrained waterfilling solution. Random coding arguments are invoked to resolve compatibility issues that arise because each user has a different view of the subspace decomposition, conditioned on its own side information.
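The coded-broadcast gain that LCBC quantifies can be seen in a toy two-user instance (a hypothetical example chosen for illustration, not the three-user scheme of the paper): when user 1 knows x1 and wants x2 while user 2 knows x2 and wants x1, a single broadcast symbol serves both users, halving the uncoded cost.

```python
# Toy LCBC-style instance over a small prime field F_q (values assumed).
q = 7
x1, x2 = 3, 5               # the server's data symbols

b = (x1 + x2) % q           # one broadcast symbol instead of two

# Each user subtracts its linear side information from the broadcast.
assert (b - x1) % q == x2   # user 1 (knows x1, wants x2)
assert (b - x2) % q == x1   # user 2 (knows x2, wants x1)
```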
  3. Distributed learning platforms for processing large-scale datasets are becoming increasingly prevalent. In typical distributed implementations, a centralized master node breaks the dataset into smaller batches for parallel processing across distributed workers, to achieve speed-up and efficiency. Several computational tasks are sequential in nature and involve multiple passes over the data. At each iteration over the data, it is common practice to randomly re-shuffle the data at the master node, assigning different batches to each worker to process. This random re-shuffling comes at the cost of extra communication overhead, since at each shuffle new data points need to be delivered to the distributed workers. In this paper, we focus on characterizing the information-theoretically optimal communication overhead for the distributed data shuffling problem. We propose a novel coded data delivery scheme for the case of no excess storage, where every worker can only store the assigned data batches under processing. Our scheme exploits a new type of coding opportunity, is applicable to any arbitrary shuffle, and works for any number of workers. We also present information-theoretic lower bounds on the minimum communication overhead for data shuffling, and show that the proposed scheme matches this lower bound for the worst-case communication overhead.
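The kind of coding opportunity exploited in coded shuffling can be illustrated with the textbook two-worker swap (a generic illustration, not the paper's scheme): when the shuffle hands each worker the batch the other one cached in the previous epoch, one XOR broadcast replaces two uncoded transmissions.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

batch_A = b"data-batch-A"   # cached at worker 1 from the previous epoch
batch_B = b"data-batch-B"   # cached at worker 2 from the previous epoch

coded = xor_bytes(batch_A, batch_B)           # one coded broadcast

assert xor_bytes(coded, batch_A) == batch_B   # worker 1 decodes its new batch
assert xor_bytes(coded, batch_B) == batch_A   # worker 2 decodes its new batch
```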
  4. We consider a scenario involving computations over a massive dataset stored distributedly across multiple workers, which is at the core of distributed learning algorithms. We propose Lagrange Coded Computing (LCC), a new framework to simultaneously provide (1) resiliency against stragglers that may prolong computations; (2) security against Byzantine (or malicious) workers that deliberately modify the computation for their benefit; and (3) (information-theoretic) privacy of the dataset amidst possible collusion of workers. LCC, which leverages the well-known Lagrange polynomial to create computation redundancy in a novel coded form across workers, can be applied to any computation scenario in which the function of interest is an arbitrary multivariate polynomial of the input dataset, hence covering many computations of interest in machine learning. LCC significantly generalizes prior works to go beyond linear computations. It also enables secure and private computing in distributed settings, improving the computation and communication efficiency of the state-of-the-art. Furthermore, we prove the optimality of LCC by showing that it achieves the optimal tradeoff between resiliency, security, and privacy, i.e., in terms of tolerating the maximum number of stragglers and adversaries, and providing data privacy against the maximum number of colluding workers. Finally, we show via experiments on Amazon EC2 that LCC speeds up the conventional uncoded implementation of distributed least-squares linear regression by up to 13.43×, and also achieves a 2.36×-12.65× speedup over the state-of-the-art straggler mitigation strategies. 
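The Lagrange encoding at the heart of LCC can be sketched for scalar datasets over a prime field; the field size, the evaluation points, and the choice f(x) = x^2 below are illustrative assumptions. Workers evaluate f on a Lagrange-coded point, and the master interpolates the composed polynomial f(u(z)), whose degree is deg(f)·(K − 1), from the answers of any deg(f)·(K − 1) + 1 workers.

```python
p = 257                                   # prime field F_p (assumed)

def lagrange_eval(xs, ys, z):
    """Evaluate the polynomial interpolating (xs, ys) at the point z, mod p."""
    total = 0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        num = den = 1
        for j, xj in enumerate(xs):
            if j != i:
                num = num * (z - xj) % p
                den = den * (xi - xj) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

f = lambda x: x * x % p                   # the polynomial computation
datasets = [10, 20, 30]                   # K = 3 scalar datasets
betas = [1, 2, 3]                         # encoding points: u(beta_i) = X_i
alphas = [4, 5, 6, 7, 8]                  # evaluation points, one per worker

# Encoding and computation: worker j stores u(alpha_j), returns f(u(alpha_j)).
answers = [f(lagrange_eval(betas, datasets, a)) for a in alphas]

# Decoding: f(u(z)) has degree 2*(3-1) = 4, so these 5 answers suffice;
# evaluating the interpolant at beta_i recovers f(X_i).
recovered = [lagrange_eval(alphas, answers, b) for b in betas]
assert recovered == [f(x) for x in datasets]
```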
  5. Linear computation broadcast (LCBC) refers to a setting with d-dimensional data stored at a central server, where K users, each with some prior linear side information, wish to compute various linear combinations of the data. For each computation instance, the data is represented as a d-dimensional vector with elements in a finite field F_(p^n), where p^n is a power of a prime p. The computation is to be performed many times, and the goal is to determine the minimum amount of information per computation instance that must be broadcast to satisfy all the users. The reciprocal of the optimal broadcast cost per computation instance is the capacity of LCBC. The capacity is known for up to K = 3 users. Since LCBC includes index coding as a special case, large-K settings of LCBC are at least as hard as the index coding problem; as such, the general LCBC problem is beyond our reach and we do not pursue it. Instead of the general setting (all cases), by focusing on the generic setting (almost all cases), this work shows that the generic capacity of the symmetric LCBC (where every user has m′ dimensions of side information and m dimensions of demand) for a large number of users (K ≥ d suffices) is Cg = 1/∆g, where ∆g = min{max{0, d − m′}, dm/(m + m′)} is the broadcast cost that is both achievable and unbeatable asymptotically almost surely for large n, among all LCBC instances with the given parameters p, K, d, m, m′. Relative to baseline schemes of random coding or separate transmissions, Cg shows an extremal gain by a factor of K as a function of the number of users, and by a factor of ≈ d/4 as a function of the data dimension, when optimized over the remaining parameters. For an arbitrary number of users, the generic capacity of the symmetric LCBC is characterized within a factor of 2.
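The broadcast-cost expression from this abstract is straightforward to evaluate numerically; the sketch below transcribes it directly (the parameter values are illustrative, and the function name is ours):

```python
def delta_g(d: int, m: int, m_prime: int) -> float:
    """Delta_g = min{ max{0, d - m'}, d*m / (m + m') } from the abstract."""
    return min(max(0, d - m_prime), d * m / (m + m_prime))

d, m, m_prime = 12, 2, 4           # illustrative dimensions
cost = delta_g(d, m, m_prime)      # optimal broadcast cost per instance
print(cost, 1 / cost)              # Delta_g = 4.0 and capacity C_g = 0.25
```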