skip to main content

Title: Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles
With the development of sensing and communica- tion technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodolo- gies are integrated into the control process of physical systems and demonstrate prominent performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to mathematically characterize the improvement of the performance of CAVs with commu- nication and cooperation capability. When each individual autonomous vehicle is originally self-interest, we can not assume that all agents would cooperate naturally during the training process. In this work, we propose to reallocate the system’s total reward efficiently to motivate stable cooperation among autonomous vehicles. We formally define and quantify how to reallocate the system’s total reward to each agent under the proposed transferable utility game, such that communication- based cooperation among multi-agents increases the system’s total reward. We prove that Shapley value-based reward reallocation of MARL locates in the core if the transferable utility game is a convex game. Hence, the cooperation is stable and efficient and the agents should stay in the coalition or the cooperating group. We then propose a cooperative policy learning algorithm with Shapley value reward reallocation. In experiments, compared with several literature algorithms, we show the improvement of the mean episode system reward of CAV systems using our proposed algorithm.  more » « less
Award ID(s):
2047354 1849246
Author(s) / Creator(s):
Date Published:
Journal Name:
2022 IEEE International Conference on Robotics and Automation
Page Range / eLocation ID:
8765 - 8771
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents’ policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of robust agent policy for finite state and finite action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is public on 
    more » « less
  2. Multi-agent autonomous racing is a challenging problem for autonomous vehicles due to the split-second, and complex decisions that vehicles must continuously make during a race. The presence of other agents on the track requires continuous monitoring of the ego vehicle’s surroundings, and necessitates predicting the behavior of other vehicles so the ego can quickly react to a changing environment with informed decisions. In our previous work we have developed the DeepRacing AI framework for autonomous formula one racing. Our DeepRacing framework was the first implementation to use the highly photorealisitc Formula One game as a simulation testbed for autonomous racing. We have successfully demonstrated single agent high speed autonomous racing using Bezier curve trajectories. In this paper, we extend the capabilities of the DeepRacing framework towards multi-agent autonomous racing. To do so, we first develop and learn a virtual camera model from game data that the user can configure to emulate the presence of a camera sensor on the vehicle. Next we propose and train a deep recurrent neural network that can predict the future poses of opponent agents in the field of view of the virtual camera using vehicles position, velocity, and heading data with respect to the ego vehicle racecar. We demonstrate early promising results for both these contributions in the game. These added features will extend the DeepRacing framework to become more suitable for multi-agent autonomous racing algorithm development 
    more » « less
  3. In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset. We study reward-poisoning attacks in this setting where an exogenous attacker modifies the rewards in the dataset before the agents see the dataset. The attacker wants to guide each agent into a nefarious target policy while minimizing the Lp norm of the reward modification. Unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. We show that the attack works on various MARL agents including uncertainty-aware learners, and we exhibit linear programs to efficiently solve the attack problem. We also study the relationship between the structure of the datasets and the minimal attack cost. Our work paves the way for studying defense in offline MARL. 
    more » « less
  4. Connected Autonomous Vehicles (CAVs) are expected to enable reliable and efficient transportation systems. Most motion planning algorithms for multi-agent systems are not completely safe because they implicitly assume that all vehicles/agents will execute the expected plan with a small error. This assumption, however, is hard to keep for CAVs since they may have to slow down (e.g., to yield to a jaywalker) or are forced to stop (e.g. break down), sometimes even without a notice. Responsibility-Sensitive Safety (RSS) defines a set of safety rules for each driving scenario to ensure that a vehicle will not cause an accident irrespective of other vehicles' behavior. RSS rules, however, are hard to evaluate for merge, intersection, and unstructured road scenarios. In addition, deadlock situations can happen that are not considered by the RSS. In this paper, we propose a generic version of RSS rules for CAVs that can be applied to any driving scenario. We integrate the proposed RSS rules with the CAV's motion planning algorithm to enable cooperative driving of CAVs. Our approach can also detect and resolve deadlocks in a decentralized manner. We have conducted experiments to verify that a CAV does not cause an accident no matter when other CAVs slow down or stop. We also showcase our deadlock detection and resolution mechanism. Finally, we compare the average velocity and fuel consumption of vehicles when they drive autonomously but not connected with the case that they are connected. 
    more » « less
  5. Andreas Krause, Emma Brunskill (Ed.)
    Executing actions in a correlated manner is a common strategy for human coordination that often leads to better cooperation, which is also potentially beneficial for cooperative multi-agent reinforcement learning (MARL). However, the recent success of MARL relies heavily on the convenient paradigm of purely decentralized execution, where there is no action correlation among agents for scalability considerations. In this work, we introduce a Bayesian network to inaugurate correlations between agents’ action selections in their joint policy. Theoretically, we establish a theoretical justification for why action dependencies are beneficial by deriving the multi-agent policy gradient formula under such a Bayesian network joint policy and proving its global convergence to Nash equilibria under tabular softmax policy parameterization in cooperative Markov games. Further, by equipping existing MARL algorithms with a recent method of differentiable directed acyclic graphs (DAGs), we develop practical algorithms to learn the context-aware Bayesian network policies in scenarios with partial observability and various difficulty. We also dynamically decrease the sparsity of the learned DAG throughout the training process, which leads to weakly or even purely independent policies for decentralized execution. Empirical results on a range of MARL benchmarks show the benefits of our approach. 
    more » « less