Distributed robust statistical learning: Byzantine mirror descent

Ding, D.; Wei, X.; Jovanovic, M. R.

We consider the distributed statistical learning problem in a high-dimensional adversarial scenario. At each iteration, $$m$$ worker machines compute stochastic gradients and send them to a master machine. However, an $$\alpha$$-fraction of the $$m$$ worker machines, called Byzantine machines, may act adversarially and send faulty gradients. To guard against faulty information sharing, we develop a distributed robust learning algorithm based on Nesterov's dual averaging. This algorithm is provably robust against Byzantine machines whenever $$\alpha\in[0, 1/2)$$. For smooth convex functions, we show that running the proposed algorithm for $$T$$ iterations achieves a statistical error bound of $$\tilde{O}\big(1/\sqrt{mT}+\alpha/\sqrt{T}\big)$$. This result holds for a large class of normed spaces, and it matches the known statistical error bound for Byzantine stochastic gradient in the Euclidean space setting. A key feature of the algorithm is that the dimension dependence of the bound scales with the dual norm of the gradient; in particular, for the probability simplex, we show that the bound depends logarithmically on the problem dimension $$d$$. Such a weak dependence on the dimension is desirable in high-dimensional statistical learning. It is known to hold for classical mirror descent, but it appears to be new for the Byzantine gradient scenario.
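To make the setting concrete, here is a minimal Python sketch of dual averaging on the probability simplex with a robust aggregation step at the master. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the aggregation rule, so the coordinate-wise median is swapped in as a stand-in, the step size `eta` is held constant for simplicity, and the names `robust_dual_averaging` and `noisy_linear_grad` are hypothetical. The Byzantine workers are simulated by replacing their gradients with large random vectors.

```python
import numpy as np

def noisy_linear_grad(c, noise=0.1):
    """Hypothetical honest worker: stochastic gradient of f(x) = <c, x>."""
    def grad_fn(x, rng):
        return c + noise * rng.standard_normal(c.shape[0])
    return grad_fn

def robust_dual_averaging(grad_fn, d, m, alpha, T, eta, rng):
    """Entropic dual averaging on the simplex with median-aggregated gradients.

    Assumptions (not from the paper): coordinate-wise median as the robust
    aggregator, constant step size eta, and simulated Byzantine workers.
    """
    n_byz = int(alpha * m)           # number of simulated Byzantine workers
    x = np.full(d, 1.0 / d)          # start at the uniform distribution
    z = np.zeros(d)                  # running sum of aggregated gradients
    x_avg = np.zeros(d)              # average of the iterates
    for _ in range(T):
        # Each of the m workers reports a stochastic gradient at x.
        grads = np.stack([grad_fn(x, rng) for _ in range(m)])
        # Byzantine workers overwrite their reports with faulty gradients.
        grads[:n_byz] = 100.0 * rng.standard_normal((n_byz, d))
        # Master aggregates robustly (stand-in rule: coordinate-wise median).
        g = np.median(grads, axis=0)
        z += g
        # Dual averaging step with the entropy regularizer: softmax of -eta * z.
        w = -eta * z
        w -= w.max()                 # shift for numerical stability
        x = np.exp(w)
        x /= x.sum()
        x_avg += x / T
    return x_avg

# Example run: the minimizer of <c, x> over the simplex puts all mass on index 1.
c = np.array([1.0, 0.0, 1.0, 1.0, 1.0])
x_hat = robust_dual_averaging(noisy_linear_grad(c), d=5, m=20, alpha=0.2,
                              T=200, eta=0.5, rng=np.random.default_rng(0))
```

The entropy regularizer is what keeps the iterates on the simplex without an explicit projection, which is the geometry in which the abstract reports a logarithmic dependence on $$d$$.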
