We consider the problem of multiagent optimization wherein an unknown subset of agents suffer Byzantine faults and thus behave adversarially. We assume that each agent i has a local cost function fi , and the overarching goal of the good agents is to collaboratively minimize a global objective that properly aggregates these local cost functions. To the best of our knowledge, we are among the first to study Byzantine-resilient optimization where no central coordinating agent exists, and we are the first to characterize the structures of the convex coefficients of the achievable global objectives. Dealing with Byzantine faults is very challenging. For example, in contrast to fault-free networks, reaching Byzantine-resilient agreement even in the simplest setting is far from trivial. We take a step toward solving the proposed Byzantine-resilient multiagent optimization problem by focusing on scalar local cost functions. Our results might provide useful insights for the general local cost functions.
Finite-time Guarantees for Byzantine-resilient Distributed State Estimation with Noisy Measurements
This article considers resilient cooperative state estimation in unreliable multiagent networks. A network of agents aim to collaboratively estimate the value of an unknown vector parameter, while an unknown subset of agents suffer Byzantine faults. We refer to the faulty agents as Byzantine agents. Byzantine agents malfunction arbitrarily and may send out highly unstructured messages to other agents in the network. As opposed to fault-free networks, reaching agreement in the presence of Byzantine agents is far from trivial. In this article, we propose a computationally efficient algorithm that is provably robust to Byzantine agents. At each iteration of the algorithm, a good agent performs a gradient descent update based on noisy local measurements, exchanges its update with other agents in its neighborhood, and robustly aggregates the received messages using coordinate-wise trimmed means. Under mild technical assumptions, we establish that good agents learn the true parameter asymptotically in almost sure sense. We further complement our analysis by proving (high probability) finite-time convergence rate, encapsulating network characteristics.
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- IEEE transactions on automatic control
- Sponsoring Org:
- National Science Foundation
More Like this
This paper considers the Byzantine consensus problem for nodes with binary inputs. The nodes are interconnected by a network represented as an undirected graph, and the system is assumed to be synchronous. Under the classical point-to-point communication model, it is well-known that the following two conditions are both necessary and sufficient to achieve Byzantine consensus among n nodes in the presence of up to ƒ Byzantine faulty nodes: n & 3 #8805; 3 ≥ ƒ+ 1 and vertex connectivity at least 2 ƒ + 1. In the classical point-to-point communication model, it is possible for a faulty node to equivocate, i.e., transmit conflicting information to different neighbors. Such equivocation is possible because messages sent by a node to one of its neighbors are not overheard by other neighbors. This paper considers the local broadcast model. In contrast to the point-to-point communication model, in the local broadcast model, messages sent by a node are received identically by all of its neighbors. Thus, under the local broadcast model, attempts by a node to send conflicting information can be detected by its neighbors. Under this model, we show that the following two conditions are both necessary and sufficient for Byzantine consensus: vertex connectivitymore »
Large-scale distributed deep learning training has enabled developments of more complex deep neural network models to learn from larger datasets for sophisticated tasks. In particular, distributed stochastic gradient descent intensively invokes all-reduce operations for gradient update, which dominates communication time during iterative training epochs. In this work, we identify the inefficiency in widely used all-reduce algorithms, and the opportunity of algorithm-architecture co-design. We propose MultiTree all-reduce algorithm with topology and resource utilization awareness for efficient and scalable all-reduce operations, which is applicable to different interconnect topologies. Moreover, we co-design the network interface to schedule and coordinate the all-reduce messages for contention-free communications, working in synergy with the algorithm. The flow control is also simplified to exploit the bulk data transfer of big gradient exchange. We evaluate the co-design using different all-reduce data sizes for synthetic study, demonstrating its effectiveness on various interconnection network topologies, in addition to state-of-the-art deep neural networks for real workload experiments. The results show that MultiTree achieves 2.3× and 1.56× communication speedup, as well as up to 81% and 30% training time reduction compared to ring all-reduce and state-of-the-art approaches, respectively.
The widely-studied radio network model [Chlamtac and Kutten, 1985] is a graph-based description that captures the inherent impact of collisions in wireless communication. In this model, the strong assumption is made that node v receives a message from a neighbor if and only if exactly one of its neighbors broadcasts. We relax this assumption by introducing a new noisy radio network model in which random faults occur at senders or receivers. Specifically, for a constant noise parameter p ∈ [0,1), either every sender has probability p of transmitting noise or every receiver of a single transmission in its neighborhood has probability p of receiving noise. We first study single-message broadcast algorithms in noisy radio networks and show that the Decay algorithm [Bar-Yehuda et al., 1992] remains robust in the noisy model while the diameter-linear algorithm of Gasieniec et al., 2007 does not. We give a modified version of the algorithm of Gasieniec et al., 2007 that is robust to sender and receiver faults, and extend both this modified algorithm and the Decay algorithm to robust multi-message broadcast algorithms, broadcasting Ω(1/log n log log n) and Ω(1/log n) messages per round, respectively. We next investigate the extent to which (network) codingmore »
This paper considers a planar multi-agent coordination problem. Unlike other related works, we explicitly consider a globally shared wireless communication channel where individual agents must choose both a frequency and power to transmit their messages at. This problem is motivated by the pressing need for algorithms that are able to efficiently and reliably operate on overcrowded wireless networks or otherwise poor-performing RF environments. We develop a self-triggered coordination algorithm that guarantees convergence to the desired set of states with probability 1. The algorithm is developed by using ideas from event/self-triggered coordination and allows agents to autonomously decide for themselves when to broadcast information, at which frequency and power, and how to move based on information received from other agents in the network. Simulations illustrate our results.