This content will become publicly available on June 5, 2026

Title: Safe Multi-Agent Learning via Shielding in Decentralized Environments
Multi-Agent Reinforcement Learning can be used to learn solutions to a wide variety of tasks, but it provides few safety guarantees about the policies the agents learn. My research addresses the challenge of ensuring safety in communication-free multi-agent environments, using shielding as the primary tool. We introduce methods that completely prevent safety violations in domains for which a model is available, in both fully observable and partially observable environments. We also present ongoing research on maximizing safety in environments for which no model is available, using a centralized-training, decentralized-execution framework, and discuss future lines of research.
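To make the shielding idea concrete, here is a minimal sketch of a model-based shield for a single agent: it intercepts each proposed action and, using the known environment model, substitutes a safe fallback whenever the proposed action would reach an unsafe state. All names here (GridModel, State, shield, the "stay" fallback) are illustrative assumptions, not the thesis's implementation.

```python
# Minimal sketch of model-based shielding: the shield vetoes any action
# that the known model predicts would enter an unsafe state.
# GridModel, State, and the fallback action are hypothetical examples.

from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    x: int
    y: int

class GridModel:
    """Toy known model: a 5x5 grid with one unsafe cell."""
    UNSAFE = State(2, 2)

    def step(self, s: State, a: str) -> State:
        dx, dy = {"N": (0, 1), "S": (0, -1), "E": (1, 0),
                  "W": (-1, 0), "stay": (0, 0)}[a]
        return State(min(4, max(0, s.x + dx)), min(4, max(0, s.y + dy)))

    def unsafe(self, s: State) -> bool:
        return s == self.UNSAFE

def shield(model: GridModel, s: State, proposed: str, fallback: str = "stay") -> str:
    """Override the proposed action if the model predicts a violation."""
    if model.unsafe(model.step(s, proposed)):
        return fallback          # safe fallback; prevents the violation
    return proposed

model = GridModel()
s = State(2, 1)
print(shield(model, s, "N"))     # "N" would enter the unsafe cell -> "stay"
print(shield(model, s, "E"))     # safe -> "E"
```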
Award ID(s):
2319500
PAR ID:
10614502
Author(s) / Creator(s):
Publisher / Repository:
Doctoral Consortium of AAMAS '25: the 24th International Conference on Autonomous Agents and Multiagent Systems
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Shielding is an effective method for ensuring safety in multi-agent domains; however, its applicability has previously been limited to environments for which an approximate discrete model and a safety specification are known in advance. We present a method for learning shields in cooperative, fully observable multi-agent environments where neither a model nor a safety specification is provided, using architectural constraints to realize several important properties of a shield. We show through a series of experiments that our learned shielding method significantly reduces safety violations while largely preserving the ability of an underlying reinforcement learning agent to optimize for reward.
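As a rough illustration of this model-free setting, the sketch below trains a safety classifier g(s, a) from observed violations and masks high-risk actions at execution time. The linear feature model, the synthetic labeling rule, and the risk threshold tau are all illustrative assumptions; the paper's architectural constraints are not reproduced here.

```python
# Sketch of a *learned* shield: no model or safety spec is given, so a
# classifier is fit to observed violations and risky actions are masked.
# The feature map, hidden labeling rule, and threshold are illustrative.

import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 4

def features(state: np.ndarray, action: int) -> np.ndarray:
    one_hot = np.zeros(N_ACTIONS)
    one_hot[action] = 1.0
    return np.concatenate([state, one_hot])

# Toy dataset: (state, action) pairs labeled 1 if a violation followed.
states = rng.normal(size=(500, 3))
actions = rng.integers(0, N_ACTIONS, size=500)
labels = (states[:, 0] + (actions == 2) > 1.0).astype(float)  # hidden rule

X = np.stack([features(s, a) for s, a in zip(states, actions)])
w = np.zeros(X.shape[1])
for _ in range(2000):                        # logistic regression by GD
    p = 1.0 / (1.0 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - labels) / len(labels)

def shielded_action(state: np.ndarray, prefs: np.ndarray, tau: float = 0.2) -> int:
    """Mask actions with predicted violation risk > tau, then follow prefs."""
    risk = np.array([1.0 / (1.0 + np.exp(-features(state, a) @ w))
                     for a in range(N_ACTIONS)])
    masked = np.where(risk <= tau, prefs, -np.inf)
    if np.all(np.isinf(masked)):             # everything risky: least-risk fallback
        return int(np.argmin(risk))
    return int(np.argmax(masked))

s = np.array([0.5, 0.0, 0.0])
# should avoid risky action 2 and pick the next-best preference (action 3)
print(shielded_action(s, prefs=np.array([0.1, 0.2, 0.9, 0.3])))
```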
  2. Abstract: For simulation to be an effective tool for the development and testing of autonomous vehicles, the simulator must be able to produce realistic safety-critical scenarios with distribution-level accuracy. However, due to the high dimensionality of real-world driving environments and the rarity of long-tail safety-critical events, achieving statistical realism in simulation is a long-standing problem. In this paper, we develop NeuralNDE, a deep learning-based framework that learns multi-agent interaction behavior from vehicle trajectory data, and propose a conflict critic model and a safety mapping network that refine the generation of safety-critical events to follow real-world occurrence frequencies and patterns. The results show that NeuralNDE achieves both accurate safety-critical driving statistics (e.g., crash rate/type/severity and near-miss statistics) and accurate normal driving statistics (e.g., vehicle speed/distance/yielding-behavior distributions), as demonstrated in simulations of urban driving environments. To the best of our knowledge, this is the first time a simulation model has reproduced the real-world driving environment with statistical realism, particularly for safety-critical situations.
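The pipeline structure described above (behavior model, then conflict critic, then safety mapping) can be sketched in a toy one-lane car-following simulation. The random behavior model, the time-to-collision test, and the fixed braking refinement below are placeholder assumptions, not NeuralNDE's learned components.

```python
# Toy sketch of the NeuralNDE-style pipeline: propose actions, flag
# dangerous interactions, refine flagged actions. All components here
# are placeholders for the paper's learned models.

import numpy as np

rng = np.random.default_rng(1)
DT, TTC_MIN = 0.1, 2.0                       # step (s), min time-to-collision (s)

pos = np.array([0.0, 15.0, 35.0])            # 3 cars on one lane (m)
vel = np.array([12.0, 10.0, 11.0])           # m/s

def behavior_model(pos, vel):
    """Placeholder for the learned multi-agent model: random accel."""
    return rng.normal(0.0, 1.0, size=pos.shape)

def conflict_critic(pos, vel):
    """Flag followers whose time-to-collision with their leader < TTC_MIN."""
    gap = pos[1:] - pos[:-1]
    closing = vel[:-1] - vel[1:]             # >0 means the follower closes in
    ttc = np.where(closing > 0, gap / np.maximum(closing, 1e-6), np.inf)
    return np.concatenate([ttc < TTC_MIN, [False]])  # lead car never flagged

def safety_mapping(acc, flagged):
    """Refine flagged actions toward a feasible braking action."""
    return np.where(flagged, np.minimum(acc, -3.0), acc)

for _ in range(50):                           # 5 seconds of simulation
    acc = safety_mapping(behavior_model(pos, vel), conflict_critic(pos, vel))
    vel = np.maximum(vel + acc * DT, 0.0)
    pos = pos + vel * DT

print("final gaps (m):", np.round(pos[1:] - pos[:-1], 2))
```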
  3. Learning multi-agent system dynamics has been extensively studied for various real-world applications, such as molecular dynamics in biology, multi-body systems in physics, and particle dynamics in materials science. Most existing models are built to learn the dynamics of a single system, learning from observed historical data to predict future trajectories. In practice, however, we may observe multiple systems generated across different environments that differ in latent exogenous factors such as temperature and gravity. One simple solution is to learn a separate model per environment, but this fails to exploit the potential commonalities among the dynamics across environments and predicts poorly where per-environment data is sparse or limited. Here, we present GG-ODE (Generalized Graph Ordinary Differential Equations), a machine learning framework for learning continuous multi-agent system dynamics across environments. Our model learns system dynamics using neural ordinary differential equations (ODEs) parameterized by graph neural networks (GNNs) to capture the continuous interactions among agents. We achieve generalization by assuming that the dynamics across environments are governed by common physical laws that can be captured via a shared ODE function; the distinct latent exogenous factors learned for each environment are incorporated into the ODE function to account for their differences. To improve performance, we additionally design two regularization losses: (1) enforcing orthogonality between the learned initial states and exogenous factors via mutual information minimization, and (2) reducing the temporal variance of the learned exogenous factors within the same system via contrastive learning. Experiments on various physical simulations show that our model accurately predicts system dynamics, especially over long horizons, and generalizes well to new systems with few observations.
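A minimal sketch of the core GG-ODE structure follows: a single ODE function, parameterized by message passing over the interaction graph, is shared across environments, while a per-environment latent vector conditions the dynamics. The random weights, single-layer "MLPs", and Euler integrator below stand in for learned components and a proper ODE solver; training and the two regularizers are omitted.

```python
# Sketch of a shared, GNN-parameterized ODE function conditioned on a
# per-environment latent. Weights are random placeholders for learned ones.

import numpy as np

rng = np.random.default_rng(2)
D, Z = 4, 2                                   # state dim, latent dim
W_msg = rng.normal(0, 0.1, (2 * D, D))        # message net (one linear layer)
W_upd = rng.normal(0, 0.1, (D + D + Z, D))    # update net (one linear layer)

def shared_ode_func(x, adj, z_env):
    """dx/dt for all agents: aggregate neighbor messages, then update,
    conditioning on the environment latent z_env (same for all agents)."""
    n = x.shape[0]
    msgs = np.zeros_like(x)
    for i in range(n):
        for j in np.nonzero(adj[i])[0]:
            msgs[i] += np.tanh(np.concatenate([x[i], x[j]]) @ W_msg)
    z = np.broadcast_to(z_env, (n, Z))
    return np.tanh(np.concatenate([x, msgs, z], axis=1) @ W_upd)

def rollout(x0, adj, z_env, t_end=1.0, dt=0.01):
    """Euler integration of the neural ODE (a real solver would do better)."""
    x = x0.copy()
    for _ in range(int(t_end / dt)):
        x = x + dt * shared_ode_func(x, adj, z_env)
    return x

x0 = rng.normal(size=(5, D))                  # 5 agents
adj = (rng.random((5, 5)) < 0.4).astype(int)
np.fill_diagonal(adj, 0)
z_hot = np.array([1.0, 0.0])                  # e.g., high-temperature env
z_cold = np.array([0.0, 1.0])                 # e.g., low-temperature env
# same shared dynamics, different latent -> different trajectories:
print(np.round(rollout(x0, adj, z_hot) - rollout(x0, adj, z_cold), 3))
```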
  4. In the field of multi-agent autonomous transportation, such as automated payload delivery or highway on-ramp merging, agents routinely exchange knowledge to optimize their shared objective and adapt to environmental novelties through Cooperative Multi-Agent Reinforcement Learning (CMARL) algorithms. This knowledge exchange allows these systems to operate efficiently and adapt to dynamic environments. However, the cooperative learning process is susceptible to adversarial poisoning attacks, as highlighted by contemporary research. In particular, poisoning attacks in which malicious agents inject deceptive information camouflaged within the differential noise, a pivotal element of differential privacy (DP)-based CMARL algorithms, are formidable to identify and overcome. The consequences of not addressing this issue are far-reaching, potentially jeopardizing safety-critical operations and the integrity of data privacy in these applications. Existing research has strived to develop anomaly-detection-based defense models to counteract conventional poisoning methods. Nonetheless, the recurring need for model offloading and retraining with labeled anomalous data undermines their practicality, given the inherently dynamic nature of safety-critical autonomous transportation applications. Furthermore, such defenses must maintain data privacy, ensure high performance, and adapt to environmental changes. Motivated by these challenges, this article introduces a novel defense mechanism against stealthy adversarial poisoning attacks in the autonomous transportation domain, termed Reinforcing Autonomous Multi-agent Protection through Adversarial Resistance in Transportation (RAMPART). Leveraging a GAN model at each local node, RAMPART filters out malicious advice in an unsupervised manner while generating synthetic samples for each state-action pair to accommodate environmental uncertainties and eliminate the need for labeled training data. Our extensive experimental analysis, conducted in a private payload delivery network, a common application in the autonomous multi-agent transportation domain, demonstrates that RAMPART successfully defends against a DP-exploited poisoning attack with a 30% attack ratio, achieving an F1 score of 0.852 and an accuracy of 96.3% in heavy-traffic environments.
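To illustrate the unsupervised advice-filtering idea, the sketch below scores incoming peer advice against a model of the node's own experience and drops anomalous advice, with no labeled attack data. RAMPART uses a per-node GAN; as a stand-in, this sketch scores advice by Mahalanobis distance under a Gaussian fit to local advice vectors, which is a deliberate simplification, not RAMPART itself.

```python
# Sketch of unsupervised advice filtering at one node. A Gaussian fit to
# the node's own advice vectors stands in for RAMPART's per-node GAN.

import numpy as np

rng = np.random.default_rng(3)

# Local experience: honest advice vectors (e.g., per-action Q estimates).
honest = rng.normal(0.0, 1.0, size=(300, 4))
mu = honest.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(honest, rowvar=False) + 1e-6 * np.eye(4))

def anomaly_score(advice: np.ndarray) -> float:
    """Squared Mahalanobis distance from the node's own experience."""
    d = advice - mu
    return float(d @ cov_inv @ d)

def filter_advice(batch: np.ndarray, threshold: float = 16.0) -> np.ndarray:
    """Keep only advice whose anomaly score is below the threshold."""
    return batch[[anomaly_score(a) < threshold for a in batch]]

# 30% poisoned batch: attackers inject a large hidden bias (illustrative).
peer = rng.normal(0.0, 1.0, size=(7, 4))
poisoned = rng.normal(0.0, 1.0, size=(3, 4)) + np.array([6.0, 0.0, 0.0, -6.0])
batch = np.vstack([peer, poisoned])
kept = filter_advice(batch)
print(f"kept {len(kept)} of {len(batch)} advice vectors")  # drops the poisoned 3
```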