skip to main content


Title: Byzantine-Tolerant Distributed Coordinate Descent
We study distributed coordinate descent (CD) in the master-worker architecture under adversarial attacks, where the data is partitioned (across the parameter space) and distributed among m worker nodes (t of which can be maliciously corrupt), which update some coordinates of their part of the parameter vector, in parallel and iteratively, using CD updates, with the help of the master. We propose a method based on data encoding and real error correction to combat the adversary. Our method can tolerate up to ⌈m-1/2⌉ corrupt nodes, which is information-theoretically optimal. Our design gives a trade-off between the resiliency t, the required redundancy, and the computation at master and worker nodes. For example, with constant overhead in the storage and computational complexity over that required by the plain distributed CD, we can tolerate up to m/3 corrupt nodes. We design a sparse encoding scheme, which yields low encoding complexity.  more » « less
Award ID(s):
1740047
NSF-PAR ID:
10185990
Author(s) / Creator(s):
;
Date Published:
Journal Name:
IEEE International Symposium on Information Theory
Page Range / eLocation ID:
2724 to 2728
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider distributed gradient computation, where both data and computation are distributed among m worker machines, t of which can be Byzantine adversaries, and a designated (master) node computes the model/parameter vector for generalized linear models, iteratively, using proximal gradient descent (PGD), of which gradient descent (GD) is a special case. The Byzantine adversaries can (collaboratively) deviate arbitrarily from their gradient computation. To solve this, we propose a method based on data encoding and (real) error correction to combat the adversarial behavior. We can tolerate up to t <= (m−1)/2 corrupt worker nodes, which is 2 information-theoretically optimal. Our method does not assume any probability distribution on the data. We develop a sparse encoding scheme which enables computationally efficient data encoding. We demonstrate a trade-off between the number of adversaries tolerated and the resource requirement (storage and computational complexity). As an example, our scheme incurs a constant overhead (storage and computational complexity) over that required by the distributed PGD algorithm, without adversaries, for t <= m . Our encoding works as efficiently in the streaming data setting as it does in the offline setting, in which all the data is available beforehand. 
    more » « less
  2. We study stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. Building upon the recent advances in algorithmic high-dimensional robust statistics, in each SGD iteration, master employs a non-trivial decoding to estimate the true gradient from the unbiased stochastic gradients received from workers, some of which may be corrupt. We provide convergence analyses for both strongly-convex and non-convex smooth objectives under standard SGD assumptions. We can control the approximation error of our solution in both these settings by the mini-batch size of stochastic gradients; and we can make the approximation error as small as we want, provided that workers use a sufficiently large mini-batch size. Our algorithm can tolerate less than 1/3 fraction of Byzantine workers. It can approximately find the optimal parameters in the strongly-convex setting exponentially fast, and reaches to an approximate stationary point in the non-convex setting with linear speed, i.e., with a rate of 1/T, thus, matching the convergence rates of vanilla SGD in the Byzantine-free setting. 
    more » « less
  3. Distributed Key Generation (DKG) is a technique to bootstrap threshold cryptosystems without a trusted third party and is a building block to decentralized protocols such as randomness beacons, threshold signatures, and general multiparty computation. Until recently, DKG protocols have assumed the synchronous model and thus are vulnerable when their underlying network assumptions do not hold. The recent advancements in asynchronous DKG protocols are insufficient as they either have poor efficiency or limited functionality, resulting in a lack of concrete implementations. In this paper, we present a simple and concretely efficient asynchronous DKG (ADKG) protocol. In a network of n nodes, our ADKG protocol can tolerate up to t < n/3 malicious nodes and have an expected O(κn^3) communication cost, where κ is the security parameter. Our ADKG protocol produces a field element as the secret and is thus compatible with off-the-shelf threshold cryptosystems. We implement our ADKG protocol and evaluate it using a network of up to 128 nodes in geographically distributed AWS instances. Our evaluation shows that our protocol takes as low as 3 and 9.5 seconds to terminate for 32 and 64 nodes, respectively. Also, each node sends only 0.7 Megabytes and 2.9 Megabytes of data during the two experiments, respectively. 
    more » « less
  4. null (Ed.)
    Motivated by the increasing need to understand the distributed algorithmic foundations of large-scale graph computations, we study some fundamental graph problems in a message-passing model for distributed computing where k ≥ 2 machines jointly perform computations on graphs with n nodes (typically, n >> k). The input graph is assumed to be initially randomly partitioned among the k machines, a common implementation in many real-world systems. Communication is point-to-point, and the goal is to minimize the number of communication rounds of the computation. Our main contribution is the General Lower Bound Theorem , a theorem that can be used to show non-trivial lower bounds on the round complexity of distributed large-scale data computations. This result is established via an information-theoretic approach that relates the round complexity to the minimal amount of information required by machines to solve the problem. Our approach is generic, and this theorem can be used in a “cookbook” fashion to show distributed lower bounds for several problems, including non-graph problems. We present two applications by showing (almost) tight lower bounds on the round complexity of two fundamental graph problems, namely, PageRank computation and triangle enumeration . These applications show that our approach can yield lower bounds for problems where the application of communication complexity techniques seems not obvious or gives weak bounds, including and especially under a stochastic partition of the input. We then present distributed algorithms for PageRank and triangle enumeration with a round complexity that (almost) matches the respective lower bounds; these algorithms exhibit a round complexity that scales superlinearly in k , improving significantly over previous results [Klauck et al., SODA 2015]. Specifically, we show the following results: PageRank: We show a lower bound of Ὼ(n/k 2 ) rounds and present a distributed algorithm that computes an approximation of the PageRank of all the nodes of a graph in Õ(n/k 2 ) rounds. Triangle enumeration: We show that there exist graphs with m edges where any distributed algorithm requires Ὼ(m/k 5/3 ) rounds. This result also implies the first non-trivial lower bound of Ὼ(n 1/3 ) rounds for the congested clique model, which is tight up to logarithmic factors. We then present a distributed algorithm that enumerates all the triangles of a graph in Õ(m/k 5/3 + n/k 4/3 ) rounds. 
    more » « less
  5. We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter-out corrupt gradients. In order to be able to apply their filtering procedure in our heterogeneous data setting where workers compute stochastic gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly-convex and non-convex objectives and show that our convergence rates match that of vanilla SGD in the Byzantine-free setting. In order to bound the heterogeneity, we assume that the gradients at different workers have bounded deviation from each other, and we also provide concrete bounds on this deviation in the statistical heterogeneous data model. 
    more » « less