Title: Stable and Efficient Shapley Value-Based Reward Reallocation for Multi-Agent Reinforcement Learning of Autonomous Vehicles
With the development of sensing and communication technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodologies are integrated into the control process of physical systems and demonstrate prominent performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to mathematically characterize the performance improvement of CAVs with communication and cooperation capability. When each individual autonomous vehicle is originally self-interested, we cannot assume that all agents would cooperate naturally during the training process. In this work, we propose to reallocate the system's total reward efficiently to motivate stable cooperation among autonomous vehicles. We formally define and quantify how to reallocate the system's total reward to each agent under the proposed transferable utility game, such that communication-based cooperation among multiple agents increases the system's total reward. We prove that the Shapley value-based reward reallocation of MARL lies in the core if the transferable utility game is a convex game. Hence, the cooperation is stable and efficient, and the agents should stay in the coalition or the cooperating group. We then propose a cooperative policy learning algorithm with Shapley value reward reallocation. In experiments, compared with several algorithms from the literature, we show the improvement of the mean episode system reward of CAV systems using our proposed algorithm.
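The abstract describes reallocating the system's total reward via Shapley values in a transferable utility game. The sketch below computes exact Shapley values for a small hypothetical coalition game; the characteristic function, agent names, and numeric values are illustrative assumptions, not the paper's learned value function. The hypothetical game happens to be convex, so the resulting allocation lies in the core, consistent with the stability result stated above.

```python
# A minimal sketch of Shapley value-based reward reallocation for a small
# transferable utility (TU) game. `coalition_value` is a hypothetical stand-in
# for the system reward a coalition of CAVs would obtain; in the paper the
# value of a coalition comes from the MARL rollout, not a lookup table.
from itertools import combinations
from math import factorial

def shapley_values(agents, coalition_value):
    """Exact Shapley value of each agent by enumerating all coalitions."""
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for agent in agents:
        others = [a for a in agents if a != agent]
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                marginal = coalition_value(s | {agent}) - coalition_value(s)
                phi[agent] += weight * marginal
    return phi

# Hypothetical example: three CAVs whose joint reward is superadditive
# (cooperating pairs and the full coalition earn more than the sum of parts).
values = {
    frozenset(): 0.0,
    frozenset({"cav1"}): 1.0, frozenset({"cav2"}): 1.0, frozenset({"cav3"}): 2.0,
    frozenset({"cav1", "cav2"}): 3.0, frozenset({"cav1", "cav3"}): 4.0,
    frozenset({"cav2", "cav3"}): 4.0,
    frozenset({"cav1", "cav2", "cav3"}): 7.0,
}
phi = shapley_values(["cav1", "cav2", "cav3"], lambda s: values[frozenset(s)])
print(phi)  # allocations sum to the grand-coalition value of 7.0
```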
Award ID(s):
2047354 1849246
NSF-PAR ID:
10329063
Author(s) / Creator(s):
Date Published:
Journal Name:
2022 IEEE International Conference on Robotics and Automation
Page Range / eLocation ID:
8765 - 8771
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Connected Autonomous Vehicles (CAVs) are expected to enable reliable, efficient, and intelligent transportation systems. Most motion planning algorithms for multi-agent systems implicitly assume that all vehicles/agents will execute the expected plan with a small error and evaluate their safety constraints based on this assumption. This assumption, however, is hard to uphold for CAVs, since they may have to change their plan (e.g., to yield to another vehicle) or be forced to stop (e.g., a CAV may break down). While it is desirable that a CAV is never involved in an accident, it may be hit by other vehicles, and sometimes preventing the accident is impossible (e.g., getting hit from behind while waiting at a red light). Responsibility-Sensitive Safety (RSS) is a set of safety rules that shifts the objective from avoiding every accident to avoiding blame. Thus, instead of developing a CAV algorithm that avoids any accident, it ensures that the ego vehicle will not be blamed for any accident it is a part of. The original RSS rules, however, are hard to evaluate for merge, intersection, and unstructured road scenarios, and they do not prevent deadlock situations among vehicles. In this paper, we propose a new formulation of the RSS rules that can be applied to any driving scenario. We integrate the proposed RSS rules with the CAV's motion planning algorithm to enable cooperative driving of CAVs. We use Control Barrier Functions to enforce safety constraints and compute the energy-optimal trajectory for the ego CAV. Finally, to ensure liveness, our approach detects and resolves deadlocks in a decentralized manner. We have conducted different experiments to verify that the ego CAV does not cause an accident regardless of when other CAVs slow down or stop. We also showcase our deadlock detection and resolution mechanism using our simulator. Finally, we compare the average velocity and fuel consumption of vehicles when they drive autonomously with the case in which they are both autonomous and connected.
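The item above combines RSS rules with Control Barrier Functions. As a point of reference, the sketch below implements only the standard longitudinal RSS minimum-gap rule and a simple barrier-style check h(x) = gap − d_min, under assumed parameter values (reaction time, acceleration and braking bounds); it is not the paper's generalized formulation for merges and intersections.

```python
# A minimal sketch of the standard RSS longitudinal safe-distance rule, which
# could serve as the safety function h(x) >= 0 that a Control Barrier Function
# enforces. Parameter values below (reaction time rho, accel/brake bounds) are
# illustrative assumptions, not values from the paper.

def rss_min_gap(v_rear, v_front, rho=0.5, a_accel=3.0, b_min=4.0, b_max=8.0):
    """Minimum safe gap (m) between a rear and a front vehicle (longitudinal RSS).

    Assumes the rear vehicle may accelerate at a_accel for rho seconds before
    braking at b_min, while the front vehicle brakes at up to b_max.
    """
    v_rear_after = v_rear + rho * a_accel
    gap = (v_rear * rho
           + 0.5 * a_accel * rho ** 2
           + v_rear_after ** 2 / (2.0 * b_min)
           - v_front ** 2 / (2.0 * b_max))
    return max(gap, 0.0)

def barrier_value(actual_gap, v_rear, v_front):
    """h(x) = actual gap minus RSS minimum gap; h >= 0 means the rule is satisfied."""
    return actual_gap - rss_min_gap(v_rear, v_front)

# Example: rear CAV at 20 m/s, front vehicle at 15 m/s, current gap 30 m.
print(barrier_value(30.0, 20.0, 15.0))  # negative => the gap violates the RSS rule
```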
  2. Connected Autonomous Vehicles (CAVs) are expected to enable reliable and efficient transportation systems. Most motion planning algorithms for multi-agent systems are not completely safe because they implicitly assume that all vehicles/agents will execute the expected plan with a small error. This assumption, however, is hard to uphold for CAVs, since they may have to slow down (e.g., to yield to a jaywalker) or be forced to stop (e.g., break down), sometimes even without notice. Responsibility-Sensitive Safety (RSS) defines a set of safety rules for each driving scenario to ensure that a vehicle will not cause an accident irrespective of other vehicles' behavior. RSS rules, however, are hard to evaluate for merge, intersection, and unstructured road scenarios. In addition, deadlock situations can arise that are not considered by RSS. In this paper, we propose a generic version of the RSS rules for CAVs that can be applied to any driving scenario. We integrate the proposed RSS rules with the CAV's motion planning algorithm to enable cooperative driving of CAVs. Our approach can also detect and resolve deadlocks in a decentralized manner. We have conducted experiments to verify that a CAV does not cause an accident regardless of when other CAVs slow down or stop. We also showcase our deadlock detection and resolution mechanism. Finally, we compare the average velocity and fuel consumption of vehicles when they drive autonomously but are not connected with the case in which they are connected.
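One way to picture the deadlock handling mentioned above is cycle detection in a wait-for graph. The sketch below is a hypothetical, centralized illustration of that check only; the function name, inputs, and scenario are assumptions, and the paper's decentralized protocol is not reproduced.

```python
# A minimal sketch of deadlock detection as cycle search in a "wait-for" graph:
# each CAV that yields adds a directed edge to the vehicle it waits on; a cycle
# means a set of vehicles are all waiting on each other.

def find_deadlock(wait_for):
    """wait_for: dict mapping each CAV id to the CAV id it yields to (or None).
    Returns a list of CAV ids forming a waiting cycle, or None."""
    for start in wait_for:
        seen, node = [], start
        while node is not None and node not in seen:
            seen.append(node)
            node = wait_for.get(node)
        if node is not None and node in seen:
            return seen[seen.index(node):]  # the cycle itself
    return None

# Hypothetical merge scenario: A yields to B, B yields to C, C yields to A.
print(find_deadlock({"A": "B", "B": "C", "C": "A"}))  # ['A', 'B', 'C']
print(find_deadlock({"A": "B", "B": None}))           # None (no deadlock)
```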
  3. Various methods for Multi-Agent Reinforcement Learning (MARL) have been developed with the assumption that agents' policies are based on accurate state information. However, policies learned through Deep Reinforcement Learning (DRL) are susceptible to adversarial state perturbation attacks. In this work, we propose a State-Adversarial Markov Game (SAMG) and make the first attempt to investigate different solution concepts of MARL under state uncertainties. Our analysis shows that the commonly used solution concepts of optimal agent policy and robust Nash equilibrium do not always exist in SAMGs. To circumvent this difficulty, we consider a new solution concept called robust agent policy, where agents aim to maximize the worst-case expected state value. We prove the existence of a robust agent policy for finite-state, finite-action SAMGs. Additionally, we propose a Robust Multi-Agent Adversarial Actor-Critic (RMA3C) algorithm to learn robust policies for MARL agents under state uncertainties. Our experiments demonstrate that our algorithm outperforms existing methods when faced with state perturbations and greatly improves the robustness of MARL policies. Our code is publicly available at https://songyanghan.github.io/what_is_solution/.
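A robust agent policy maximizes the worst-case expected state value under bounded state perturbations. The sketch below only approximates that inner worst case by random sampling over an l-infinity ball; the value function, epsilon, and sampling scheme are illustrative assumptions, and RMA3C itself trains against adversarial perturbations rather than sampled ones.

```python
# A minimal sketch of evaluating a policy under worst-case bounded state
# perturbations: the adversary picks a perturbation within an l-infinity ball
# that minimizes the agent's value. The inner minimization is approximated by
# random sampling for illustration only.
import numpy as np

def worst_case_value(value_fn, state, epsilon=0.1, n_samples=256, rng=None):
    """Approximate min over ||delta||_inf <= epsilon of value_fn(state + delta)."""
    rng = rng or np.random.default_rng(0)
    worst = value_fn(state)
    for _ in range(n_samples):
        delta = rng.uniform(-epsilon, epsilon, size=state.shape)
        worst = min(worst, value_fn(state + delta))
    return worst

# Hypothetical value function and state, for illustration only.
value_fn = lambda s: float(-np.sum(s ** 2))
state = np.array([0.3, -0.2])
print(worst_case_value(value_fn, state))  # <= the unperturbed value of -0.13
```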
  4. With the advancement of modern robotics, autonomous agents are now capable of hosting sophisticated algorithms, which enables them to make intelligent decisions. But developing and testing such algorithms directly in real-world systems is tedious and may waste valuable resources. This is especially true for heterogeneous multi-agent systems in battlefield environments, where communication is critical in determining the system's behavior and usability. Because simulators of separate paradigms (co-simulation) are needed to simulate such scenarios before deployment, synchronization between those simulators is vital. Existing works aimed at resolving this issue fall short of addressing diversity among the deployed agents. In this work, we propose SynchroSim, an integrated co-simulation middleware to simulate a heterogeneous multi-robot system. We propose a velocity difference-driven adjustable window size approach to reduce packet loss probability. It takes into account the respective velocities of the deployed agents to calculate a suitable window size before transmitting data between them. We consider our algorithm simulator-agnostic, but for the implementation results we have used Gazebo as the physics simulator and NS-3 as the network simulator. We also design our algorithm considering the perception-action loop inside a closed communication channel, an essential factor in contested scenarios that require high fidelity in terms of data transmission. We validate our approach empirically at both the simulation and system level for both line-of-sight (LOS) and non-line-of-sight (NLOS) scenarios. Our approach achieves a noticeable reduction in packet loss probability (≈11%) and average packet delay (≈10%) compared to the fixed window size-based synchronization approach.
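The abstract does not give SynchroSim's exact window-size formula, so the sketch below is only a hypothetical monotone mapping from the agents' velocity difference to a synchronization window, illustrating the stated idea that faster relative motion warrants a smaller (more frequent) window; all parameter names and constants are assumptions.

```python
# A minimal sketch of a velocity-difference-driven adjustable window size:
# a larger speed difference between sender and receiver shrinks the
# synchronization window, bounded between a minimum and maximum. The mapping
# and constants are illustrative, not the published algorithm.

def sync_window(v_sender, v_receiver, w_min=0.01, w_max=0.1, k=0.05):
    """Window size (s) shrinks as the speed difference |v_sender - v_receiver| grows."""
    dv = abs(v_sender - v_receiver)
    return max(w_min, w_max / (1.0 + k * dv))

# Two slow ground robots get a large window; a ground robot and a fast aerial
# agent get a smaller one.
print(sync_window(1.0, 1.2))   # ~0.099 s
print(sync_window(1.0, 15.0))  # ~0.059 s
```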
  5. In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents with interactions and cooperation in StarCraft combat. A local actor-critic structure is established for each kind of agent with partially observable information received from the environment. Then, a global actor-critic structure is built to provide the local design an overall view of the combat based on limited centralized information, such as the health value. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design reduces the communication burden by sending only limited information to the global level during the learning process. Furthermore, reward functions are also designed for both the local and global structures based on the agents' attributes to further improve learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, i.e., StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
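As a structural illustration of the two-level design, the PyTorch sketch below pairs a local actor over an agent's partial observation with a small global head over limited centralized features (e.g., health values). The layer sizes, dimensions, and the additive combination of the two outputs are assumptions for illustration, not the paper's exact SCDDPG architecture.

```python
# A minimal sketch of a two-level actor: a local network over the agent's
# partial observation plus a global head over limited centralized information,
# combined into one action. Sizes and the additive combination are illustrative.
import torch
import torch.nn as nn

class TwoLevelActor(nn.Module):
    def __init__(self, obs_dim, global_dim, act_dim, hidden=64):
        super().__init__()
        self.local = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, act_dim))
        self.global_head = nn.Sequential(nn.Linear(global_dim, hidden), nn.ReLU(),
                                         nn.Linear(hidden, act_dim))

    def forward(self, local_obs, global_info):
        # local decision plus a globally informed adjustment
        return torch.tanh(self.local(local_obs) + self.global_head(global_info))

actor = TwoLevelActor(obs_dim=10, global_dim=4, act_dim=2)
action = actor(torch.randn(1, 10), torch.randn(1, 4))
print(action.shape)  # torch.Size([1, 2])
```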