Decision-Theoretic Planning with Communication in Open Multiagent Systems
In open multiagent systems, the set of agents operating in the environment changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they may run out of suppressants and be temporarily unavailable to assist their peers. Because an agent's optimal action depends on the actions of others, each agent must not only predict the actions of its peers but, before that, reason about whether they are even present to perform an action. Addressing openness thus requires agents to model each other's presence, which can be enhanced through agents communicating about their presence in the environment. At the same time, communicative acts can also incur costs (e.g., consuming limited bandwidth), and thus an agent must trade off the benefits of enhanced coordination against the costs of communication. We present a new principled, decision-theoretic method for planning in open agent settings that balances this tradeoff, set in the context of the recent communicative interactive POMDP framework. Simulations of multiagent wildfire suppression problems demonstrate how communication can improve planning in open agent environments, as well as how agents trade off the benefits and costs of communication under different scenarios.
- Award ID(s): 1909513
- NSF-PAR ID: 10345048
- Date Published:
- Journal Name: Uncertainty in artificial intelligence
- ISSN: 1525-3384
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In open agent systems, the set of agents that are cooperating or competing changes over time and in ways that are nontrivial to predict. For example, if collaborative robots were tasked with fighting wildfires, they may run out of suppressants and be temporarily unavailable to assist their peers. We consider the problem of planning in these contexts with the additional challenges that the agents are unable to communicate with each other and that there are many of them. Because an agent's optimal action depends on the actions of others, each agent must not only predict the actions of its peers but, before that, reason about whether they are even present to perform an action. Addressing openness thus requires agents to model each other's presence, which becomes computationally intractable with high numbers of agents. We present a novel, principled, and scalable method in this context that enables an agent to reason about others' presence in its shared environment and their actions. Our method extrapolates models of a few peers to the overall behavior of the many-agent system, and combines it with a generalization of Monte Carlo tree search to perform individual agent reasoning in many-agent open environments. Theoretical analyses establish the number of agents to model in order to achieve acceptable worst-case bounds on extrapolation error, as well as regret bounds on the agent's utility from modeling only some neighbors. Simulations of multiagent wildfire suppression problems demonstrate our approach's efficacy compared with alternative baselines.
-
In this article, we propose a novel semicentralized deep deterministic policy gradient (SCDDPG) algorithm for cooperative multiagent games. Specifically, we design a two-level actor-critic structure to help the agents interact and cooperate in StarCraft combat. A local actor-critic structure is established for each kind of agent, using the partially observable information received from the environment. A global actor-critic structure is then built to provide the local design with an overall view of the combat based on limited centralized information, such as the health value. These two structures work together to generate the optimal control action for each agent and to achieve better cooperation in the games. Compared with fully centralized methods, this design can reduce the communication burden by sending only limited information to the global level during the learning process. Furthermore, the reward functions are designed for both the local and global structures based on the agents' attributes to further improve learning performance in the stochastic environment. The developed method has been demonstrated on several scenarios in a real-time strategy game, i.e., StarCraft. The simulation results show that the agents can effectively cooperate with their teammates and defeat the enemies in various StarCraft scenarios.
-
Captioning is a crucial and challenging task for video understanding. In videos that involve active agents such as humans, the agent's actions can bring about myriad changes in the scene. Observable changes, such as movements, manipulations, and transformations of the objects in the scene, are reflected in conventional video captioning. Unlike images, actions in videos are also inherently linked to social aspects such as intentions (why the action is taking place), effects (what changes due to the action), and attributes that describe the agent. Thus, for video understanding, such as when captioning videos or when answering questions about videos, one must have an understanding of these commonsense aspects. We present the first work on generating commonsense captions directly from videos, to describe latent aspects such as intentions, effects, and attributes. We present a new dataset, "Video-to-Commonsense (V2C)", that contains ~9k videos of human agents performing various actions, annotated with 3 types of commonsense descriptions. Additionally, we explore the use of open-ended video-based commonsense question answering (V2C-QA) as a way to enrich our captions. Both the generation task and the QA task can be used to enrich video captions.
-
Conventional machine learning (ML) and deep learning (DL) methods use large amounts of data to construct prediction models in a central fusion center for recognizing human activities. However, such model training incurs high communication costs and leads to privacy infringement. To address the issues of high communication overhead and privacy leakage, we employed a widely used distributed ML technique called Federated Learning (FL), which generates a global model for predicting human activities by combining the participating agents' local knowledge. The state-of-the-art FL model fails to maintain acceptable accuracy when there is a large number of unreliable agents that can inject false models, or resource-constrained agents that fail to perform an assigned computational task within a given time window. We developed an FL model for predicting human activities that monitors each agent's contribution towards model convergence and excludes unreliable and resource-constrained agents from training. We assign a score to each client when it joins the network, and the score is updated based on the agent's activities during training. We consider three mobile robots as FL clients that are heterogeneous in terms of their resources, such as processing capability, memory, bandwidth, battery life, and data volume. We consider heterogeneous mobile robots in order to understand the effects of a real-world FL setting in the presence of resource-constrained agents. We consider an agent unreliable if it repeatedly responds slowly or injects incorrect models during training. By disregarding the unreliable and weak agents, we carry out the local training of the FL process on selected agents. If a weak agent is nevertheless selected and starts to show straggler issues, we leverage an asynchronous FL mechanism that aggregates the local models whenever it receives a model update from an agent. Asynchronous FL eliminates the issue of waiting a long time to receive model updates from the weak agents. To this end, we simulate how the behavior of the agents can be tracked through a reward-punishment scheme and present the influence of unreliable and resource-constrained agents on the FL process. We found that FL performs slightly worse than centralized models when there are no unreliable or resource-constrained agents. However, as the number of malicious and straggler clients increases, our proposed model performs more effectively than the state-of-the-art FL and ML approaches, because it identifies and avoids those agents while recognizing human activities.