skip to main content

This content will become publicly available on January 6, 2025

Title: TIM-MARL: Information Sharing for Multi-Agent Reinforcement Learning in Smart Environments
Information sharing among agents to jointly solve problems is challenging for multi-agent reinforcement learning algorithms (MARL) in smart environments. In this paper, we present a novel information sharing approach for MARL, which introduces a Team Information Matrix (TIM) that integrates scenario-independent spatial and environmental information combined with the agent's local observations, augmenting both individual agent's performance and global awareness during the MARL learning. To evaluate this approach, we conducted experiments on three multi-agent scenarios of varying difficulty levels implemented in Unity ML-Agents Toolkit. Experimental results show that the agents utilizing our TIM-Shared variation outperformed those using decentralized MARL and achieved comparable performance to agents employing centralized MARL.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Date Published:
Page Range / eLocation ID:
1044 to 1045
Subject(s) / Keyword(s):
Navigation Information sharing Collaboration Reinforcement learning Task analysis Robots
Medium: X
Las Vegas, NV, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi-Agent Reinforcement Learning (MARL) is a key technology in artificial intelligence applications such as robotics, surveillance, energy systems, etc. Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is a state-of-the-art MARL algorithm that has been widely adopted and considered a popular baseline for novel MARL algorithms. However, existing implementations of MADDPG on CPU and CPU-GPU platforms do not exploit fine-grained parallelism between cooperative agents and handle inter-agent communication sequentially, leading to sub-optimal throughput performance in MADDPG training. In this work, we develop the first high-throughput MADDPG accelerator on a CPU-FPGA heterogeneous platform. Specifically, we develop dedicated hardware modules that enable parallel training of each agent's internal Deep Neural Networks (DNNs) and support low-latency inter-agent communication using an on-chip agent interconnection network. Our experimental results show that the speed performance of agent neural network training improves by a factor of 3.6×−24.3× and 1.5×−29.5× compared with state-of-the-art CPU and CPU-GPU implementations. Our design achieves up to a 1.99× and 1.93× improvement in overall system throughput compared with CPU and CPU-GPU implementations, respectively. 
    more » « less
  2. In cooperative multi-agent reinforcement learning (Co-MARL), a team of agents must jointly optimize the team's longterm rewards to learn a designated task. Optimizing rewards as a team often requires inter-agent communication and data sharing, leading to potential privacy implications. We assume privacy considerations prohibit the agents from sharing their environment interaction data. Accordingly, we propose Privacy-Engineered Value Decomposition Networks (PE-VDN), a Co-MARL algorithm that models multi-agent coordination while provably safeguarding the confidentiality of the agents' environment interaction data. We integrate three privacy-engineering techniques to redesign the data flows of the VDN algorithm-an existing Co-MARL algorithm that consolidates the agents' environment interaction data to train a central controller that models multi-agent coordination-and develop PE-VDN. In the first technique, we design a distributed computation scheme that eliminates Vanilla VDN's dependency on sharing environment interaction data. Then, we utilize a privacy-preserving multi-party computation protocol to guar-antee that the data flows of the distributed computation scheme do not pose new privacy risks. Finally, we enforce differential privacy to preempt inference threats against the agents' training data-past environment interactions-when they take actions based on their neural network predictions. We implement PE-VDN in StarCraft Multi-Agent Competition (SMAC) and show that it achieves 80% of Vanilla VDN's win rate while maintaining differential privacy levels that provide meaningful privacy guarantees. The results demonstrate that PE-VDN can safeguard the confidentiality of agents' environment interaction data without sacrificing multi-agent coordination. 
    more » « less
  3. Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents’ policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on 
    more » « less
  4. Learning safe solutions is an important but challenging problem in multi-agent reinforcement learning (MARL). Shielded reinforcement learning is one approach for preventing agents from choosing unsafe actions. Current shielded reinforcement learning methods for MARL make strong assumptions about communication and full observability. In this work, we extend the formalization of the shielded reinforcement learning problem to a decentralized multi-agent setting. We then present an algorithm for decomposition of a centralized shield, allowing shields to be used in such decentralized, communication-free environments. Our results show that agents equipped with decentralized shields perform comparably to agents with centralized shields in several tasks, allowing shielding to be used in environments with decentralized training and execution for the first time. 
    more » « less
  5. In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset. We study reward-poisoning attacks in this setting where an exogenous attacker modifies the rewards in the dataset before the agents see the dataset. The attacker wants to guide each agent into a nefarious target policy while minimizing the Lp norm of the reward modification. Unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. We show that the attack works on various MARL agents including uncertainty-aware learners, and we exhibit linear programs to efficiently solve the attack problem. We also study the relationship between the structure of the datasets and the minimal attack cost. Our work paves the way for studying defense in offline MARL. 
    more » « less