To improve the resilience of distributed training to worst-case, or Byzantine node failures, several recent approaches have replaced gradient averaging with robust aggregation methods. Such techniques can have high computational costs, often quadratic in the number of compute nodes, and only have limited robustness guarantees. Other methods have instead used redundancy to guarantee robustness, but can only tolerate limited number of Byzantine failures. In this work, we present DETOX, a Byzantine-resilient distributed training framework that combines algorithmic redundancy with robust aggregation. DETOX operates in two steps, a filtering step that uses limited redundancy to significantly reduce the effect of Byzantine nodes, and a hierarchical aggregation step that can be used in tandem with any state-of-the-art robust aggregation method. We show theoretically that this leads to a substantial increase in robustness, and has a per iteration runtime that can be nearly linear in the number of compute nodes. We provide extensive experiments over real distributed setups across a variety of large-scale machine learning tasks, showing that DETOX leads to orders of magnitude accuracy and speedup improvements over many state-of-the-art Byzantine-resilient approaches.
more »
« less
Asymptotic Network Independence and Step-Size for a Distributed Subgradient Method
We consider whether distributed subgradient methods can achieve a linear speedup over a centralized subgradient method. While it might be hoped that distributed network of n nodes that can compute n times more subgradients in parallel compared to a single node might, as a result, be n times faster, existing bounds for distributed optimization methods are often consistent with a slowdown rather than speedup compared to a single node. We show that a distributed subgradient method has this “linear speedup” property when using a class of square-summable-but-not-summable step-sizes which include 1/t^β when β ∈ (1/2,1); for such step-sizes, we show that after a transient period whose size depends on the spectral gap of the network, the method achieves a performance guarantee that does not depend on the network or the number of nodes. We also show that the same method can fail to have this “asymptotic network independence” property under the optimally decaying step-size 1/t^{1/2} and, as a consequence, can fail to provide a linear speedup compared to a single node with 1/t^{1/2} step-size.
more »
« less
- PAR ID:
- 10349575
- Date Published:
- Journal Name:
- Journal of machine learning research
- Volume:
- 23
- Issue:
- 69
- ISSN:
- 1532-4435
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This paper introduces DISPERSE, a distributed scalable architecture for delivery of content and services that provides resilience against node failure through location-independent storage and replication of content. Current content delivery networks (CDNs) have, at least to some degree, a centralized structure thus susceptible to a single point of failure. DISPERSE addresses this limitation by implementing a fully de-centralized structure. DISPERSE is a two-layer architecture: the first layer (front-end layer) exposes services (e.g., Web, SFTP) to clients; the second layer (back-end layer) provides reliable distributed storage of content and application state. Content in DISPERSE's back-end layer is stored and exchanged as Named Data Network (NDN) content objects. This allows DISPERSE to implement fine-grained, location-independent, fully decentralized content replication mechanisms. We validate the performance of DISPERSE under two node failure scenarios. In the first scenario, content can be stored in any DISPERSE node, and all nodes are equally likely to fail. In this scenario, we use non-linear optimization techniques to determine the optimal number of content copies under availability and latency constraints. In the second scenario, different nodes fail with different probabilities, and content is stored in nodes according to its value, node failure probability, and resource availability. This scenario is addressed as an instance of the minimum cost flow problem. Our results show that DISPERSE reduces the failure of content retrieval by five orders of magnitude compared to common CDN implementations, without significantly increasing content retrieval delay. Further, numerical results show that DISPERSE improves content availability by a factor of 1.3x-2.3x when deploying the minimum cost flow algorithm.more » « less
-
Abstract We present analytical results for the distribution of first return (FR) times of non-backtracking random walks (NBWs) on undirected configuration model networks consisting of $$N$$ nodes with degree distribution $P(k)$. We focus on the case in which the network consists of a single connected component. Starting from a random initial node $$i$$ at time $t=0$, an NBW hops into a random neighbor of $$i$$ at time $t=1$ and at each subsequent step it continues to hop into a random neighbor of its current node, excluding the previous node. We calculate the tail distribution $$P ( T_{\rm FR} > t )$$ of first return times from a random initial node to itself. It is found that $$P ( T_{\rm FR} > t )$$ is given by a discrete Laplace transform of the degree distribution $P(k)$. This result exemplifies the relation between structural properties of a network, captured by the degree distribution, and properties of dynamical processes taking place on the network. Using the tail-sum formula, we calculate the mean first return time $${\mathbb E}[ T_{\rm FR} ]$$. Surprisingly, $${\mathbb E}[ T_{\rm FR} ]$$ coincides with the result obtained from Kac's lemma that applies to simple random walks (RWs). We also calculate the variance $${\rm Var}(T_{\rm FR})$$, which accounts for the variability of first return times between different NBW trajectories. We apply this formalism to Erd{\H o}s-R\'enyi networks, random regular graphs and configuration model networks with exponential and power-law degree distributions and obtain closed-form expressions for $$P( T_{\rm FR} > t )$$ as well as its mean and variance. These results provide useful insight on the advantages of NBWs over simple RWs in network exploration, sampling and search processes.more » « less
-
We consider the problem of finding the maximally influential node in random networks where each node influences every other node with constant yet unknown probability. We develop an online algorithm that learns the relative influences of the nodes. It relaxes the assumption in the existing literature that a central observer can monitor the influence spread globally. The proposed algorithm delegates the online updates to the nodes on the network; hence requires only local observations at the nodes. We show that using an explore-then-commit learning strategy, the cumulative regret accumulated by the algorithm over horizon T approaches O(T2/3) for a network with a large number of nodes. Additionally, we show that, for fixed T, the worst case-regret grows linearly with the number n of nodes in the graph. Numerical experiments illustrate this linear dependence for Chung-Lu models. The experiments also demonstrate that ε-greedy learning strategies can achieve similar performance to the explore-then-commit strategy on Chung-Lu models.more » « less
-
Distributed transmit beamforming (DTBF) can allow a swarm of unmanned aerial vehicles (UAVs) to send a common message to a distant target. DTBF among N nodes can provide N 2 times the received power compared to a single node and can reduce interference by confining the signal in a certain direction. However, DTBF requires time, frequency, and phase synchronization. Here, we focus on the issue of phase incoherence at the distributed transmit nodes from two sources—different local oscillators (LOs) and hovering position movement—and how to counteract their impact at the receiver via local decisions, namely, rotation. To investigate how the UAV body and its rotation can affect phase coherency, we conduct controlled in-field experiments where we control the phase offset at two distributed antennas and measure the received signal level at four antenna positions on a drone for various rotation angles. We show that significant improvements can be achieved at the receiver through rotation. We also show that there exists an optimal combination of UAV rotation angle and antenna position on the drone to mitigate the effects of phase incoherence among the distributed transmitters. Finally, we demonstrate an interesting trade-off where, due to the heterogeneous nature of the UAV body, rotation angles that yield maximum beamforming gains might not result in the best average (or minimum) beamformed signal level across all possible phase errors at the distributed transmitters.more » « less
An official website of the United States government

