We consider the problem of multiagent optimization wherein an unknown subset of agents suffer Byzantine faults and thus behave adversarially. We assume that each agent i has a local cost function fi , and the overarching goal of the good agents is to collaboratively minimize a global objective that properly aggregates these local cost functions. To the best of our knowledge, we are among the first to study Byzantineresilient optimization where no central coordinating agent exists, and we are the first to characterize the structures of the convex coefficients of the achievable global objectives. Dealing with Byzantine faults is very challenging. For example, in contrast to faultfree networks, reaching Byzantineresilient agreement even in the simplest setting is far from trivial. We take a step toward solving the proposed Byzantineresilient multiagent optimization problem by focusing on scalar local cost functions. Our results might provide useful insights for the general local cost functions.
Finitetime Guarantees for Byzantineresilient Distributed State Estimation with Noisy Measurements
This article considers resilient cooperative state estimation in unreliable multiagent networks. A network of agents aim to collaboratively estimate the value of an unknown vector parameter, while an unknown subset of agents suffer Byzantine faults. We refer to the faulty agents as Byzantine agents. Byzantine agents malfunction arbitrarily and may send out highly unstructured messages to other agents in the network. As opposed to faultfree networks, reaching agreement in the presence of Byzantine agents is far from trivial. In this article, we propose a computationally efficient algorithm that is provably robust to Byzantine agents. At each iteration of the algorithm, a good agent performs a gradient descent update based on noisy local measurements, exchanges its update with other agents in its neighborhood, and robustly aggregates the received messages using coordinatewise trimmed means. Under mild technical assumptions, we establish that good agents learn the true parameter asymptotically in almost sure sense. We further complement our analysis by proving (high probability) finitetime convergence rate, encapsulating network characteristics.
 Award ID(s):
 2003830
 Publication Date:
 NSFPAR ID:
 10250253
 Journal Name:
 IEEE transactions on automatic control
 ISSN:
 23343303
 Sponsoring Org:
 National Science Foundation
More Like this


This paper considers the Byzantine consensus problem for nodes with binary inputs. The nodes are interconnected by a network represented as an undirected graph, and the system is assumed to be synchronous. Under the classical pointtopoint communication model, it is wellknown that the following two conditions are both necessary and sufficient to achieve Byzantine consensus among n nodes in the presence of up to ƒ Byzantine faulty nodes: n & 3 #8805; 3 ≥ ƒ+ 1 and vertex connectivity at least 2 ƒ + 1. In the classical pointtopoint communication model, it is possible for a faulty node to equivocate, i.e., transmit conflicting information to different neighbors. Such equivocation is possible because messages sent by a node to one of its neighbors are not overheard by other neighbors. This paper considers the local broadcast model. In contrast to the pointtopoint communication model, in the local broadcast model, messages sent by a node are received identically by all of its neighbors. Thus, under the local broadcast model, attempts by a node to send conflicting information can be detected by its neighbors. Under this model, we show that the following two conditions are both necessary and sufficient for Byzantine consensus: vertex connectivitymore »

Largescale distributed deep learning training has enabled developments of more complex deep neural network models to learn from larger datasets for sophisticated tasks. In particular, distributed stochastic gradient descent intensively invokes allreduce operations for gradient update, which dominates communication time during iterative training epochs. In this work, we identify the inefficiency in widely used allreduce algorithms, and the opportunity of algorithmarchitecture codesign. We propose MultiTree allreduce algorithm with topology and resource utilization awareness for efficient and scalable allreduce operations, which is applicable to different interconnect topologies. Moreover, we codesign the network interface to schedule and coordinate the allreduce messages for contentionfree communications, working in synergy with the algorithm. The flow control is also simplified to exploit the bulk data transfer of big gradient exchange. We evaluate the codesign using different allreduce data sizes for synthetic study, demonstrating its effectiveness on various interconnection network topologies, in addition to stateoftheart deep neural networks for real workload experiments. The results show that MultiTree achieves 2.3× and 1.56× communication speedup, as well as up to 81% and 30% training time reduction compared to ring allreduce and stateoftheart approaches, respectively.

The widelystudied radio network model [Chlamtac and Kutten, 1985] is a graphbased description that captures the inherent impact of collisions in wireless communication. In this model, the strong assumption is made that node v receives a message from a neighbor if and only if exactly one of its neighbors broadcasts. We relax this assumption by introducing a new noisy radio network model in which random faults occur at senders or receivers. Specifically, for a constant noise parameter p ∈ [0,1), either every sender has probability p of transmitting noise or every receiver of a single transmission in its neighborhood has probability p of receiving noise. We first study singlemessage broadcast algorithms in noisy radio networks and show that the Decay algorithm [BarYehuda et al., 1992] remains robust in the noisy model while the diameterlinear algorithm of Gasieniec et al., 2007 does not. We give a modified version of the algorithm of Gasieniec et al., 2007 that is robust to sender and receiver faults, and extend both this modified algorithm and the Decay algorithm to robust multimessage broadcast algorithms, broadcasting Ω(1/log n log log n) and Ω(1/log n) messages per round, respectively. We next investigate the extent to which (network) codingmore »

This paper considers a planar multiagent coordination problem. Unlike other related works, we explicitly consider a globally shared wireless communication channel where individual agents must choose both a frequency and power to transmit their messages at. This problem is motivated by the pressing need for algorithms that are able to efficiently and reliably operate on overcrowded wireless networks or otherwise poorperforming RF environments. We develop a selftriggered coordination algorithm that guarantees convergence to the desired set of states with probability 1. The algorithm is developed by using ideas from event/selftriggered coordination and allows agents to autonomously decide for themselves when to broadcast information, at which frequency and power, and how to move based on information received from other agents in the network. Simulations illustrate our results.