In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset. We study reward-poisoning attacks in this setting, where an exogenous attacker modifies the rewards in the dataset before the agents see it. The attacker wants to guide each agent into a nefarious target policy while minimizing the Lp norm of the reward modification. Unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. We show that the attack works on various MARL agents, including uncertainty-aware learners, and we exhibit linear programs that solve the attack problem efficiently. We also study the relationship between the structure of the datasets and the minimal attack cost. Our work paves the way for studying defense in offline MARL.
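To make the linear-programming formulation concrete, here is a minimal sketch for the degenerate single-state case: two agents with known reward matrices, a target joint action to be installed as a dominant strategy, and an L1 (p = 1) cost. The function name, the dominance margin, and the single-state restriction are illustrative assumptions, not the paper's construction, which handles full Markov games.

```python
# Sketch: poison rewards of a stateless two-agent game so that a target
# joint action becomes a dominant-strategy equilibrium, at minimal L1 cost.
import numpy as np
from scipy.optimize import linprog

def poison_rewards(R1, R2, target, margin=0.1):
    """R1, R2: (n1, n2) reward matrices for agents 1 and 2.
    target = (t1, t2): joint action to install as a dominant strategy."""
    n1, n2 = R1.shape
    t1, t2 = target
    m = n1 * n2
    # Decision vector: [r1' (m), r2' (m), u1 (m), u2 (m)] with u >= |r' - r|.
    c = np.concatenate([np.zeros(2 * m), np.ones(2 * m)])  # minimize sum(u)
    A_ub, b_ub = [], []
    idx = lambda a1, a2: a1 * n2 + a2  # flatten a joint action

    def abs_rows(offset, r_flat):
        # Encode u >= r' - r and u >= r - r' as two inequality rows per entry.
        for k in range(m):
            row = np.zeros(4 * m); row[offset + k] = 1; row[2 * m + offset + k] = -1
            A_ub.append(row); b_ub.append(r_flat[k])
            row = np.zeros(4 * m); row[offset + k] = -1; row[2 * m + offset + k] = -1
            A_ub.append(row); b_ub.append(-r_flat[k])

    abs_rows(0, R1.ravel()); abs_rows(m, R2.ravel())

    # Strict dominance for agent 1: r1'[t1, a2] >= r1'[a1, a2] + margin for all a2.
    for a2 in range(n2):
        for a1 in range(n1):
            if a1 == t1: continue
            row = np.zeros(4 * m)
            row[idx(a1, a2)] = 1; row[idx(t1, a2)] = -1
            A_ub.append(row); b_ub.append(-margin)
    # Strict dominance for agent 2: r2'[a1, t2] >= r2'[a1, a2] + margin for all a1.
    for a1 in range(n1):
        for a2 in range(n2):
            if a2 == t2: continue
            row = np.zeros(4 * m)
            row[m + idx(a1, a2)] = 1; row[m + idx(a1, t2)] = -1
            A_ub.append(row); b_ub.append(-margin)

    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(None, None)] * (2 * m) + [(0, None)] * (2 * m))
    return res.x[:m].reshape(n1, n2), res.x[m:2 * m].reshape(n1, n2)
```

The L1 objective is linearized with slack variables u >= |r' - r|, which is what keeps the whole attack a linear program.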
Robust Multiagent Reinforcement Learning for UAV Systems: Countering Byzantine Attacks
Multiple unmanned aerial vehicle (multi-UAV) systems have gained significant attention in applications such as aerial surveillance and search-and-rescue missions. With the recent development of state-of-the-art multiagent reinforcement learning (MARL) algorithms, it is possible to train multi-UAV systems in collaborative and competitive environments. However, the inherent vulnerabilities of multiagent systems pose significant privacy and security risks when deploying general, conventional MARL algorithms: the presence of even a single Byzantine adversary within the system can severely degrade the learning performance of the UAV agents. This work proposes a Byzantine-resilient MARL algorithm that combines geometric median consensus with a robust state-update model to mitigate, or even eliminate, the influence of Byzantine attacks. To validate its effectiveness and feasibility, the authors introduce a multi-UAV threat model, provide a robustness guarantee, and investigate key attack parameters across multiple UAV navigation scenarios. Experimental results show that average rewards under a Byzantine attack increased by up to 60% in the cooperative navigation scenario compared with conventional MARL techniques. The learning rewards of the baseline algorithms failed to converge during training under these attacks, while the proposed method converged to an optimal solution, demonstrating its viability and correctness.
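As a rough illustration of the consensus step, the sketch below computes a geometric median of agent updates via Weiszfeld's algorithm; the function and example values are our own assumptions, not the authors' implementation. The geometric median is robust to a minority of Byzantine outliers, unlike the coordinate-wise mean.

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld's algorithm: minimize sum_i ||x - points[i]||_2."""
    x = np.mean(points, axis=0)              # initialize at the mean
    for _ in range(iters):
        d = np.linalg.norm(points - x, axis=1)
        d = np.maximum(d, eps)               # avoid division by zero
        w = 1.0 / d
        x_new = (w[:, None] * points).sum(0) / w.sum()
        if np.linalg.norm(x_new - x) < eps:  # converged
            break
        x = x_new
    return x

# Example: one Byzantine agent sends an extreme update; the median ignores it.
updates = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [100.0, -100.0]])
print(geometric_median(updates))   # stays near (1, 1)
```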
- Award ID(s): 2447364
- PAR ID: 10582112
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: Information
- Volume: 14
- Issue: 11
- ISSN: 2078-2489
- Page Range / eLocation ID: 623
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Federated learning—multi-party, distributed learning in a decentralized environment—is more vulnerable to model poisoning attacks than centralized learning, because malicious clients can collude and send carefully tailored model updates that make the global model inaccurate. This motivated the development of Byzantine-resilient federated learning algorithms such as Krum, Bulyan, FABA, and FoolsGold. However, a recently developed untargeted model poisoning attack showed that all prior defenses can be bypassed. The attack uses the intuition that, simply by changing the sign of the gradient updates the optimizer is computing for a set of malicious clients, a model can be diverted from the optimum to increase the test error rate. In this work, we develop FLAIR, a defense against this directed deviation attack (DDA), a state-of-the-art model poisoning attack. FLAIR is based on our intuition that in federated learning, certain patterns of gradient flips are indicative of an attack. This intuition is remarkably stable across different learning algorithms, models, and datasets. FLAIR assigns reputation scores to the participating clients based on their behavior during the training phase and then takes a weighted contribution of the clients. We show that where the existing defense baselines of FABA [IJCAI '19], FoolsGold [Usenix '20], and FLTrust [NDSS '21] fail when 20-30% of the clients are malicious, FLAIR provides Byzantine robustness up to a malicious client percentage of 45%. We also show that FLAIR provides robustness against even a white-box version of DDA.
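The following is a schematic sketch of the sign-flip intuition; the scoring rule, reputation update, and softmax weighting are simplified assumptions of ours, not FLAIR's actual algorithm.

```python
import numpy as np

def flip_score(update, prev_global):
    """Fraction of coordinates whose sign disagrees with the last global update."""
    return np.mean(np.sign(update) != np.sign(prev_global))

def reputation_weighted_aggregate(updates, prev_global, reputations, lr=0.5):
    for i, u in enumerate(updates):
        # A high flip fraction is suspicious: push reputation down, else up.
        reputations[i] += lr * (0.5 - flip_score(u, prev_global))
    w = np.exp(reputations)
    w /= w.sum()                  # softmax over reputation scores
    aggregate = np.einsum('i,ij->j', w, np.asarray(updates))
    return aggregate, reputations
```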
Recent works have demonstrated the vulnerability of Deep Reinforcement Learning (DRL) algorithms to training-time, backdoor poisoning attacks. The objectives of these attacks are twofold: induce pre-determined, adversarial behavior in the agent upon observing a fixed trigger during deployment, while allowing the agent to solve its intended task during training. Prior attacks assume arbitrary control over the agent's rewards, inducing values far outside the environment's natural constraints. This results in brittle attacks that fail once proper reward constraints are enforced. Thus, in this work we propose a new class of backdoor attacks against DRL which are the first to achieve state-of-the-art performance under strict reward constraints. These "inception" attacks manipulate the agent's training data: inserting the trigger into prior observations and replacing high-return actions with those of the targeted adversarial behavior. We formally define these attacks and prove they achieve both adversarial objectives against arbitrary Markov Decision Processes (MDPs). Using this framework, we devise an online inception attack which achieves a 100% attack success rate on multiple environments under constrained rewards while minimally impacting the agent's task performance.
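A minimal sketch of the described data manipulation, where the trigger mask, trigger value, and reward bounds are hypothetical parameters rather than the paper's exact construction:

```python
import numpy as np

def poison_transition(obs, action, reward, trigger_mask, trigger_value,
                      target_action, reward_bounds):
    """Poison one stored transition: stamp the trigger and relabel the action."""
    lo, hi = reward_bounds
    obs = obs.copy()
    obs[trigger_mask] = trigger_value   # insert the trigger into the observation
    action = target_action              # replace with the adversarial behavior
    reward = np.clip(reward, lo, hi)    # stay inside the environment's constraints
    return obs, action, reward
```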
Unmanned aerial vehicle (UAV) technology is a rapidly growing field with tremendous opportunities for research and applications. To achieve true autonomy for UAVs in the absence of remote control and external navigation aids such as global navigation satellite systems and radar, minimum-energy trajectory planning that accounts for obstacle avoidance and stability control is key. Although this can be formulated as a constrained optimization problem, the complicated nonlinear relationships between UAV trajectory and thrust control make it almost impossible to solve analytically. While deep reinforcement learning (DRL) is known for its ability to provide model-free optimization for complex systems through learning, its state space, actions, and reward functions must be designed carefully. This paper presents our vision of the different layers of autonomy in a UAV system, and our effort to both generate and track the trajectory using DRL. The experimental results show that, compared with conventional approaches, the learned trajectory needs 20% less control thrust and 18% less time to reach the target. Furthermore, using the control policy learned by DRL, the UAV achieves 58.14% less position error and 21.77% less system power.
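One plausible per-step reward shaping consistent with the stated objectives (tracking error, control thrust as an energy proxy, elapsed time); the weights and terms are illustrative assumptions, not the paper's exact design:

```python
import numpy as np

def step_reward(position, waypoint, thrust,
                w_pos=1.0, w_thrust=0.1, w_time=0.01):
    pos_err = np.linalg.norm(position - waypoint)   # trajectory-tracking error
    energy = np.sum(np.square(thrust))              # proxy for control effort
    return -(w_pos * pos_err + w_thrust * energy + w_time)  # constant time penalty
```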
We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a dataset in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside it. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the unique Nash set and the set of plausible games induced by the data are polytopes in the Q-function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.
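As a toy illustration of the unique Nash set, consider a stateless zero-sum game: a strict saddle point of the Q matrix is the unique Nash equilibrium, so membership can be checked with strict inequalities. This is our simplification; the Markov-game polytopes in the paper are richer than this sketch.

```python
import numpy as np

def is_unique_nash(Q, target):
    """Q[a1, a2]: payoff to the maximizing row player. target = (t1, t2)."""
    t1, t2 = target
    row_strict = all(Q[t1, t2] > Q[a1, t2]
                     for a1 in range(Q.shape[0]) if a1 != t1)
    col_strict = all(Q[t1, t2] < Q[t1, a2]
                     for a2 in range(Q.shape[1]) if a2 != t2)
    return row_strict and col_strict   # strict saddle point => unique Nash

Q = np.array([[0.0, 2.0],
              [-1.0, 1.0]])
print(is_unique_nash(Q, (0, 0)))   # True: (0, 0) is a strict saddle point
```

In this picture, the attack succeeds if and only if every plausible Q consistent with the poisoned data passes this test for the target policy.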