skip to main content


Title: On the Tradeoff Between Computation and Communication Costs for Distributed Linearly Separable Computation
This paper studies the distributed linearly separable computation problem, which is a generalization of many existing distributed computing problems such as distributed gradient coding and distributed linear transform. A master asks N distributed workers to compute a linearly separable function of K datasets, which is a set of Kc linear combinations of K equal-length messages (each message is a function of one dataset). We assign some datasets to each worker in an uncoded manner, who then computes the corresponding messages and returns some function of these messages, such that from the answers of any Nr out of N workers the master can recover the task function with high probability. In the literature, the specific case where Kc = 1 or where the computation cost is minimum has been considered. In this paper, we focus on the general case (i.e., general Kc and general computation cost) and aim to find the minimum communication cost. We first propose a novel converse bound on the communication cost under the constraint of the popular cyclic assignment (widely considered in the literature), which assigns the datasets to the workers in a cyclic way. Motivated by the observation that existing strategies for distributed computing fall short of achieving the converse bound, we propose a novel distributed computing scheme for some system parameters. The proposed computing scheme is optimal for any assignment when Kc is large and is optimal under the cyclic assignment when the numbers of workers and datasets are equal or Kc is small. In addition, it is order optimal within a factor of 2 under the cyclic assignment for the remaining cases.  more » « less
Award ID(s):
1817154 2045656 2007108
NSF-PAR ID:
10297826
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Transactions on Communications
ISSN:
0090-6778
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Distributed learning platforms for processing large scale data-sets are becoming increasingly prevalent. In typical distributed implementations, a centralized master node breaks the data-set into smaller batches for parallel processing across distributed workers to achieve speed-up and efficiency. Several computational tasks are of sequential nature, and involve multiple passes over the data. At each iteration over the data, it is common practice to randomly re-shuffle the data at the master node, assigning different batches for each worker to process. This random re-shuffling operation comes at the cost of extra communication overhead, since at each shuffle, new data points need to be delivered to the distributed workers. In this paper, we focus on characterizing the information theoretically optimal communication overhead for the distributed data shuffling problem. We propose a novel coded data delivery scheme for the case of no excess storage, where every worker can only store the assigned data batches under processing. Our scheme exploits a new type of coding opportunity and is applicable to any arbitrary shuffle, and for any number of workers. We also present information theoretic lower bounds on the minimum communication overhead for data shuffling, and show that the proposed scheme matches this lower bound for the worst-case communication overhead. 
    more » « less
  2. null (Ed.)
    Maximal Independent Set (MIS) is one of the fundamental problems in distributed computing. The round (time) complexity of distributed MIS has traditionally focused on the worst-case time for all nodes to finish. The best-known (randomized) MIS algorithms take O(log n) worst-case rounds on general graphs (where n is the number of nodes). Breaking the O(log n) worst-case bound has been a longstanding open problem, while currently the best-known lower bound is [EQUATION] rounds. Motivated by the goal to reduce total energy consumption in energy-constrained networks such as sensor and ad hoc wireless networks, we take an alternative approach to measuring performance. We focus on minimizing the total (or equivalently, the average) time for all nodes to finish. It is not clear whether the currently best-known algorithms yield constant-round (or even o(log n)) node-averaged round complexity for MIS in general graphs. We posit the sleeping model, a generalization of the traditional model, that allows nodes to enter either "sleep" or "waking" states at any round. While waking state corresponds to the default state in the traditional model, in sleeping state a node is "offline", i.e., it does not send or receive messages (and messages sent to it are dropped as well) and does not incur any time, communication, or local computation cost. Hence, in this model, only rounds in which a node is awake are counted and we are interested in minimizing the average as well as the worst-case number of rounds a node spends in the awake state, besides the traditional worst-case round complexity (i.e., the rounds for all nodes to finish including both the awake and sleeping rounds). Our main result is that we show that MIS can be solved in (expected) O(1) rounds under node-averaged awake complexity measure in the sleeping model. In particular, we present a randomized distributed algorithm for MIS that has expected O(1)-rounds node-averaged awake complexity and, with high probability1 has O(log n)-rounds worst-case awake complexity and O(log3.41 n)-rounds worst-case complexity. Our work is a step towards understanding the node-averaged complexity of MIS both in the traditional and sleeping models, as well as designing energy-efficient distributed algorithms for energy-constrained networks. 
    more » « less
  3. We study the optimal design of a heterogeneous coded elastic computing (CEC) network where machines have varying relative computation speeds. CEC introduced by Yang et al. is a framework which mitigates the impact of elastic events, where machines join and leave the network. A set of data is distributed among storage constrained machines using a Maximum Distance Separable (MDS) code such that any subset of machines of a specific size can perform the desired computations. This design eliminates the need to re-distribute the data after each elastic event. In this work, we develop a process for an arbitrary heterogeneous computing network to minimize the overall computation time by defining an optimal computation load, or number of computations assigned to each machine. We then present an algorithm to define a specific computation assignment among the machines that makes use of the MDS code and meets the optimal computation load. 
    more » « less
  4. We study the problem of distributed task allocation by workers in an ant colony in a setting of limited capabilities and noisy environment feedback. We assume that each task has a demand that should be satisfied but not exceeded, i.e., there is an optimal number of ants that should be working on this task at a given time. The goal is to assign a near-optimal number of workers to each task in a distributed manner without explicit access to the value of the demand nor to the number of ants working on the task. We seek to answer the question of how the quality of task allocation depends on the accuracy of assessing by the ants whether too many (overload) or not enough (lack) ants are currently working on a given task. In our model, each ant receives a binary feedback that depends on the deficit, defined as the difference between the demand and the current number of workers in the task. The feedback is modeled as a random variable that takes values lack or overload with probability given by a sigmoid function of the deficit. The higher the overload or lack of workers for a task, the more likely it is that an ant receives the correct feedback from this task; the closer the deficit is to zero, the less reliable the feedback becomes. Each ant receives the feedback independently about one chosen task. We measure the performance of task allocation algorithms using the notion of inaccuracy, defined as the number of steps in which the deficit of some task is beyond certain threshold. We propose a simple, constant-memory, self-stabilizing, distributed algorithm that converges from any initial assignment to a near-optimal assignment under noisy feedback and keeps the deficit small for all tasks in almost every step. We also prove a lower bound for any constant-memory algorithm, which matches, up to a constant factor, the accuracy achieved by our algorithm. 
    more » « less
  5. null (Ed.)
    Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem , a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration . These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k , improving significantly over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results: PageRank: We show a lower bound of Ὼ(n/k 2 ) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k 2 ) rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ὼ(m/k 5/3 ) rounds. This result also implies the first non-trivial lower bound of Ὼ(n 1/3 ) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k 5/3 + n/k 4/3 ) rounds. 
    more » « less