We study distributed coordinate descent (CD) in the master-worker architecture under adversarial attacks, where the data is partitioned (across the parameter space) and distributed among m worker nodes (t of which can be maliciously corrupt), which update some coordinates of their part of the parameter vector, in parallel and iteratively, using CD updates, with the help of the master. We propose a method based on data encoding and real error correction to combat the adversary. Our method can tolerate up to ⌈m/2⌉ - 1 corrupt nodes, which is information-theoretically optimal. Our design gives a trade-off between the resiliency t, the required redundancy, and the computation at the master and worker nodes. For example, with constant overhead in the storage and computational complexity over that required by plain distributed CD, we can tolerate up to m/3 corrupt nodes. We design a sparse encoding scheme, which yields low encoding complexity.
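The abstract does not include pseudocode, so the following is a minimal, hypothetical sketch of the redundancy-plus-decoding idea it describes, using the simplest possible stand-ins: each coordinate block is replicated on 2t + 1 workers (a repetition "code", far less storage-efficient than the paper's sparse encoding), and the master "decodes" with a coordinate-wise median. The least-squares objective, the block structure, and the function names are my own illustration, not the paper's construction.

```python
# Toy stand-in for Byzantine-resilient distributed CD.  Assumptions (mine, not
# the paper's): objective (1/2)||Ax - b||^2, each coordinate block replicated
# on 2t+1 workers, corrupt workers return arbitrary vectors; the paper uses a
# sparse real-number code and real error correction instead of replication.
import numpy as np

def cd_block_update(A_blk, residual):
    """Exact coordinate descent over one block: for each column a_j, move x_j
    by -a_j^T r / ||a_j||^2, keeping a local copy of the residual r = Ax - b."""
    delta = np.zeros(A_blk.shape[1])
    r = residual.copy()
    for j in range(A_blk.shape[1]):
        a_j = A_blk[:, j]
        delta[j] = -(a_j @ r) / (a_j @ a_j)
        r += delta[j] * a_j
    return delta

def robust_round(A, b, x, blocks, replicas, corrupt, rng):
    """One round at the master: every replica of a block proposes an update;
    the coordinate-wise median over 2t+1 replicas filters t corrupt replies."""
    for blk_id, cols in enumerate(blocks):
        residual = A @ x - b
        proposals = []
        for w in replicas[blk_id]:
            if w in corrupt:  # Byzantine worker: arbitrary reply
                proposals.append(rng.normal(scale=100.0, size=len(cols)))
            else:
                proposals.append(cd_block_update(A[:, cols], residual))
        x[cols] += np.median(np.vstack(proposals), axis=0)
    return x

rng = np.random.default_rng(0)
m, t = 9, 1                                   # 9 workers, 1 corrupt, groups of 2t+1 = 3
A, x_star = rng.normal(size=(200, 12)), rng.normal(size=12)
b = A @ x_star
blocks = np.array_split(np.arange(12), m // (2 * t + 1))
replicas = {i: list(range(i * (2 * t + 1), (i + 1) * (2 * t + 1))) for i in range(len(blocks))}
x, corrupt = np.zeros(12), {4}
for _ in range(50):
    x = robust_round(A, b, x, blocks, replicas, corrupt, rng)
print("distance to x*:", float(np.linalg.norm(x - x_star)))   # should be near 0
```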
Data Encoding Methods for Byzantine-Resilient Distributed Optimization
We consider distributed gradient computation, where both data and computation are distributed among m worker machines, t of which can be Byzantine adversaries, and a designated (master) node computes the model/parameter vector for generalized linear models, iteratively, using proximal gradient descent (PGD), of which gradient descent (GD) is a special case. The Byzantine adversaries can (collaboratively) deviate arbitrarily from their gradient computation. To solve this, we propose a method based on data encoding and (real) error correction to combat the adversarial behavior. We can tolerate up to t <= (m-1)/2 corrupt worker nodes, which is information-theoretically optimal. Our method does not assume any probability distribution on the data. We develop a sparse encoding scheme which enables computationally efficient data encoding. We demonstrate a trade-off between the number of adversaries tolerated and the resource requirement (storage and computational complexity). As an example, our scheme incurs a constant overhead (storage and computational complexity) over that required by the distributed PGD algorithm, without adversaries, for t <= m/3. Our encoding works as efficiently in the streaming data setting as it does in the offline setting, in which all the data is available beforehand.
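To make the pipeline in the abstract concrete, here is a hedged sketch of the master's proximal-gradient loop with the simplest possible stand-in for the encoding/decoding step: each data shard is replicated on 2t + 1 workers and the master combines the returned shard gradients with a coordinate-wise median. The l1-regularized least-squares objective, the replication "code", and all names (`soft_threshold`, `byzantine_pgd`) are illustrative assumptions; the paper's sparse encoding and real error-correcting decoder are not reproduced here.

```python
# Minimal sketch of a Byzantine-tolerant PGD loop (toy stand-in, not the
# paper's scheme): shards replicated on 2t+1 workers, median "decoding".
import numpy as np

def soft_threshold(v, lam):
    """Prox operator of lam * ||.||_1; this is what makes the step *proximal*."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def worker_gradient(A_shard, b_shard, x):
    """Honest worker: gradient of (1/2n)||A x - b||^2 on its shard."""
    return A_shard.T @ (A_shard @ x - b_shard) / len(b_shard)

def byzantine_pgd(A, b, m=9, t=1, lam=0.01, eta=0.5, iters=300, corrupt=(4,), seed=0):
    rng = np.random.default_rng(seed)
    r = 2 * t + 1
    shards = np.array_split(np.arange(len(b)), m // r)   # shard j lives on workers j*r .. j*r+r-1
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        shard_grads = []
        for j, rows in enumerate(shards):
            replies = []
            for w in range(j * r, (j + 1) * r):
                if w in corrupt:                          # arbitrary (adversarial) reply
                    replies.append(rng.normal(scale=50.0, size=A.shape[1]))
                else:
                    replies.append(worker_gradient(A[rows], b[rows], x))
            # with >= t+1 identical honest replies out of 2t+1, the median is honest
            shard_grads.append(np.median(np.vstack(replies), axis=0))
        g = np.mean(shard_grads, axis=0)                  # full gradient from decoded shard gradients
        x = soft_threshold(x - eta * g, eta * lam)        # proximal (ISTA) step
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(600, 20))
x_true = np.zeros(20); x_true[:3] = [2.0, -1.0, 0.5]
b = A @ x_true + 0.01 * rng.normal(size=600)
print(np.round(byzantine_pgd(A, b)[:5], 2))   # first entries should be near 2, -1, 0.5, 0, 0
```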
 Award ID(s):
 1740047
 NSFPAR ID:
 10185989
 Date Published:
 Journal Name:
 IEEE International Symposium on Information Theory (ISIT)
 Page Range / eLocation ID:
 2719 to 2723
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this


We consider the distributed statistical learning problem in a high-dimensional adversarial scenario. At each iteration, $m$ worker machines compute stochastic gradients and send them to a master machine. However, an $\alpha$-fraction of the $m$ worker machines, called Byzantine machines, may act adversarially and send faulty gradients. To guard against faulty information sharing, we develop a distributed robust learning algorithm based on Nesterov's dual averaging. This algorithm is provably robust against Byzantine machines whenever $\alpha \in [0, 1/2)$. For smooth convex functions, we show that running the proposed algorithm for $T$ iterations achieves a statistical error bound $\tilde{O}\big(1/\sqrt{mT}+\alpha/\sqrt{T}\big)$. This result holds for a large class of normed spaces and it matches the known statistical error bound for Byzantine stochastic gradient descent in the Euclidean space setting. A key feature of the algorithm is that the dimension dependence of the bound scales with the dual norm of the gradient; in particular, for the probability simplex, we show that it depends logarithmically on the problem dimension $d$. Such a weak dependence on the dimension is desirable in high-dimensional statistical learning; it has been known to hold for classical mirror descent but appears to be new for the Byzantine gradient scenario.
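As a rough illustration of the entropic/dual-norm angle in this abstract, here is a hedged sketch of dual averaging over the probability simplex (entropic regularizer, so the "projection" is a softmax of the running gradient sum), with a coordinate-wise median of the $m$ workers' reports standing in for the paper's robust aggregation rule. The aggregator, the step-size schedule, and all names are my assumptions, not the cited algorithm.

```python
# Hedged sketch: dual averaging on the probability simplex with a robust
# (median) aggregate of worker gradients; attack model and step sizes are
# illustrative assumptions only.
import numpy as np

def robust_dual_averaging(grad_fn, d, m, alpha, T, eta=0.1, seed=0):
    rng = np.random.default_rng(seed)
    n_byz = int(alpha * m)                   # first n_byz workers act adversarially
    z = np.zeros(d)                          # running sum of aggregated gradients
    x = np.full(d, 1.0 / d)                  # start at the uniform distribution
    avg_x = np.zeros(d)
    for t in range(1, T + 1):
        reports = np.vstack([grad_fn(x, rng) for _ in range(m)])
        reports[:n_byz] = rng.normal(scale=10.0, size=(n_byz, d))   # faulty gradients
        z += np.median(reports, axis=0)      # robust aggregate of the m reports
        w = np.exp(-eta * z / np.sqrt(t))    # entropic dual-averaging update ...
        x = w / w.sum()                      # ... lands back on the simplex (softmax)
        avg_x += (x - avg_x) / t             # running average of the iterates
    return avg_x

# toy run: minimize <c, x> over the simplex; coordinate 0 has the smallest cost
c = np.linspace(0.0, 1.0, 50)
x_hat = robust_dual_averaging(lambda x, rng: c + 0.1 * rng.normal(size=50),
                              d=50, m=20, alpha=0.2, T=2000)
print("coordinate with the most mass:", int(np.argmax(x_hat)))   # expect 0
```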

We study distributed stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. We consider the heterogeneous data model, where different workers may have different local datasets, and we do not make any probabilistic assumptions on data generation. At the core of our algorithm, we use the polynomial-time outlier-filtering procedure for robust mean estimation proposed by Steinhardt et al. (ITCS 2018) to filter out corrupt gradients. In order to be able to apply their filtering procedure in our heterogeneous data setting where workers compute stochastic gradients, we derive a new matrix concentration result, which may be of independent interest. We provide convergence analyses for smooth strongly convex and nonconvex objectives and show that our convergence rates match that of vanilla SGD in the Byzantine-free setting. In order to bound the heterogeneity, we assume that the gradients at different workers have bounded deviation from each other, and we also provide concrete bounds on this deviation in the statistical heterogeneous data model.
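Since the core step in this abstract is an outlier-filtering robust mean estimator applied to the received gradients, here is a deliberately simplified spectral filter in that spirit: while the top direction of the empirical covariance carries too much variance, drop the point contributing most to it, then average what remains. The threshold, the removal rule, and the stopping condition are my own simplifications, not the cited procedure of Steinhardt et al. or the paper's guarantees.

```python
# Simplified spectral outlier filter (toy version; not the cited procedure).
import numpy as np

def filtered_mean(G, threshold):
    """G: (m, d) array of worker gradients.  Repeatedly drop the point with the
    largest squared projection onto the top covariance direction while the top
    variance exceeds `threshold`, then return the mean of what is left."""
    G = G.copy()
    while len(G) > 1:
        mu = G.mean(axis=0)
        centered = G - mu
        # top direction of the empirical covariance via SVD of the centered data
        _, s, vt = np.linalg.svd(centered, full_matrices=False)
        top_var = s[0] ** 2 / len(G)
        if top_var <= threshold:
            break
        scores = (centered @ vt[0]) ** 2          # each point's contribution
        G = np.delete(G, np.argmax(scores), axis=0)
    return G.mean(axis=0)

# toy check: 16 honest gradients near the all-ones vector, 4 adversarial ones
rng = np.random.default_rng(0)
honest = 1.0 + 0.1 * rng.normal(size=(16, 10))
byz = rng.normal(loc=-20.0, scale=1.0, size=(4, 10))
g_hat = filtered_mean(np.vstack([honest, byz]), threshold=0.1)
print(np.round(g_hat.mean(), 2))   # close to 1.0 once the outliers are filtered
```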


We study stochastic gradient descent (SGD) in the master-worker architecture under Byzantine attacks. Building upon the recent advances in algorithmic high-dimensional robust statistics, in each SGD iteration, the master employs a non-trivial decoding to estimate the true gradient from the unbiased stochastic gradients received from workers, some of which may be corrupt. We provide convergence analyses for both strongly convex and nonconvex smooth objectives under standard SGD assumptions. We can control the approximation error of our solution in both these settings by the minibatch size of the stochastic gradients, and we can make the approximation error as small as we want, provided that workers use a sufficiently large minibatch size. Our algorithm can tolerate less than a 1/3 fraction of Byzantine workers. It can approximately find the optimal parameters in the strongly convex setting exponentially fast, and reaches an approximate stationary point in the nonconvex setting with linear speed, i.e., with a rate of 1/T, thus matching the convergence rates of vanilla SGD in the Byzantine-free setting.
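One quantitative claim here, that the approximation error can be driven down by taking larger minibatches at the workers, is easy to illustrate with a toy experiment. The coordinate-wise median below is only a stand-in for the paper's robust decoding step, and the attack, dimensions, and fraction of corrupt workers (9 of 30, below 1/3) are my own choices.

```python
# Toy illustration: with a fixed fraction of Byzantine workers, the error of
# the decoded gradient shrinks as the workers' minibatch size grows.
import numpy as np

def decoded_gradient_error(batch_size, m=30, n_byz=9, d=50, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    true_grad = np.ones(d)
    errs = []
    for _ in range(trials):
        # honest worker i reports the average of `batch_size` unbiased samples
        honest = true_grad + rng.normal(size=(m - n_byz, d)) / np.sqrt(batch_size)
        byz = rng.normal(scale=10.0, size=(n_byz, d))          # arbitrary reports
        est = np.median(np.vstack([honest, byz]), axis=0)      # stand-in decoder
        errs.append(np.linalg.norm(est - true_grad))
    return float(np.mean(errs))

for B in (1, 16, 256):
    print(B, round(decoded_gradient_error(B), 3))   # the error decreases with B
```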