Effective coordination of design teams must account for the influence of costs incurred while searching for the best design solutions. This article introduces a cost-aware multi-agent system (MAS), a theoretical model that (1) explains how individuals in a team should search, assuming they are all rational, utility-maximizing decision-makers, and (2) studies the impact of cost on the search performance of both individual agents and the system. First, we develop a new multi-agent Bayesian optimization framework that accounts for information exchange among agents to support their decisions on where to sample in the search. Second, we employ a reinforcement learning approach based on the multi-agent deep deterministic policy gradient (MADDPG) to train the MAS to identify where agents cannot sample due to design constraints. Third, we propose a new cost-aware stopping criterion that lets each agent determine when the cost of further search outweighs the potential gains. Our results indicate that cost has a more significant impact on MAS communication in complex design problems than in simple ones. For example, when searching complex design spaces, some agents may see low performance gains early on and stop prematurely due to negative payoffs, even though they could have performed better in later stages of the search. Global-local communication therefore becomes more critical in such situations for the entire system to converge. The proposed model can serve as a benchmark for empirical studies to quantitatively gauge how humans would rationally make design decisions in a team.
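As a concrete illustration of the third contribution, the sketch below implements one plausible form of a cost-aware stopping rule: an agent stops once the best expected improvement predicted by its surrogate model no longer covers the cost of one more sample. The expected-improvement form and all names here are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of a cost-aware stopping rule, assuming the criterion
# compares the best predicted payoff gain against the per-sample cost.
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far):
    """Standard EI for maximization, given the surrogate's posterior mean/std."""
    sigma = np.maximum(sigma, 1e-12)          # guard against zero variance
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

def should_stop(mu, sigma, best_so_far, cost_per_sample):
    """Stop when no candidate's expected gain covers the cost of sampling it."""
    return expected_improvement(mu, sigma, best_so_far).max() < cost_per_sample
```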
The Impact of Agent Definitions and Interactions on Multiagent Learning for Coordination
The state-action space of an individual agent in a multiagent team fundamentally dictates how that individual interacts with the rest of the team. Thus, how an agent is defined in the context of its domain has a significant effect on team performance when learning to coordinate. In this work, we explore the trade-offs associated with these design choices: for example, a smaller team whose agents can each process and act on a wider scope of information about the world, versus a larger team in which each agent observes and acts within a more local region of the domain. We focus our study on a traffic management domain and highlight the trends in learning performance under different agent definitions.
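To make that design axis concrete, the sketch below (entirely illustrative; none of these names or numbers come from the paper) encodes the two extremes as configurations for a grid-style traffic domain:

```python
# A hypothetical encoding of the agent-definition trade-off: few agents with
# wide observation/action scope versus many agents with local scope.
from dataclasses import dataclass

@dataclass
class AgentDefinition:
    n_agents: int       # team size
    obs_radius: int     # how far each agent can see, in grid cells
    action_scope: int   # how many intersections a single agent controls

# Two ends of the trade-off for a traffic management domain:
few_global = AgentDefinition(n_agents=2, obs_radius=50, action_scope=25)
many_local = AgentDefinition(n_agents=50, obs_radius=2, action_scope=1)
```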
- Award ID(s): 1815886
- PAR ID: 10121099
- Date Published:
- Journal Name: AAMAS Conference proceedings
- ISSN: 2523-5699
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Multi-robot teams have been shown to be effective in accomplishing complex tasks that require tight coordination among team members. In homogeneous systems, recent work has demonstrated that "stepping stone" rewards are an effective way to provide agents with feedback on potentially valuable actions even when the agent-to-agent coupling requirements of an objective are not satisfied. In this work, we propose a new mechanism for inferring hypothetical partners in tightly coupled, heterogeneous systems, called Dirichlet-Multinomial Counterfactual Selection (DMCS). Using DMCS, we show that agents can learn to infer appropriate counterfactual partners and thereby receive more informative stepping stone rewards, testing in a modified multi-rover exploration problem. We also show that DMCS outperforms a random partner selection baseline by over 40%, and we demonstrate how domain knowledge can be used to induce a prior that guides the agent learning process. Finally, we show that DMCS maintains superior performance for up to 15 distinct rover types, while the performance of the baseline degrades rapidly.
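Under one reading of this abstract, DMCS amounts to keeping Dirichlet pseudo-counts over partner types, sampling a hypothetical partner from them, and reinforcing the types whose counterfactual produced an informative stepping stone reward. The sketch below is that reading only; the class and method names are not the authors' code.

```python
# A rough sketch of Dirichlet-multinomial counterfactual partner selection.
# All names and the update rule are assumptions for illustration.
import numpy as np

class DMCS:
    def __init__(self, n_partner_types, prior=None):
        # Domain knowledge can enter through a non-uniform prior over types.
        self.alpha = np.ones(n_partner_types) if prior is None else np.asarray(prior, float)

    def sample_partner(self, rng):
        probs = rng.dirichlet(self.alpha)           # draw a categorical over types
        return rng.choice(len(self.alpha), p=probs)

    def update(self, partner_type, reward_was_informative):
        if reward_was_informative:                  # reinforce useful counterfactuals
            self.alpha[partner_type] += 1.0

rng = np.random.default_rng(0)
selector = DMCS(n_partner_types=15)
partner = selector.sample_partner(rng)
```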
-
Information sharing among agents to jointly solve problems is challenging for multi-agent reinforcement learning (MARL) algorithms in smart environments. In this paper, we present a novel information sharing approach for MARL, which introduces a Team Information Matrix (TIM) that integrates scenario-independent spatial and environmental information with each agent's local observations, augmenting both individual agents' performance and global awareness during MARL learning. To evaluate this approach, we conducted experiments on three multi-agent scenarios of varying difficulty implemented in the Unity ML-Agents Toolkit. Experimental results show that agents utilizing our TIM-Shared variation outperformed those using decentralized MARL and achieved performance comparable to agents employing centralized MARL.
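As a rough illustration of the mechanism (the matrix layout and channel semantics below are assumptions, not the paper's), each agent writes what it observes into a shared spatial grid and concatenates that grid with its local observation before acting:

```python
# A minimal sketch of a Team Information Matrix: a shared, scenario-independent
# spatial grid augmenting each agent's local view. Layout and names are assumed.
import numpy as np

GRID = (10, 10)                      # coarse spatial discretization of the map
tim = np.zeros(GRID + (3,))          # e.g., channels: teammate seen, obstacle, goal

def write_tim(agent_pos, channel, value=1.0):
    """An agent posts what it observed at its grid cell."""
    tim[agent_pos[0], agent_pos[1], channel] = value

def augmented_observation(local_obs):
    """Local view plus the flattened shared matrix, fed to the agent's policy."""
    return np.concatenate([local_obs, tim.ravel()])
```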
-
We introduce a sequential Bayesian binary hypothesis testing problem under social learning, termed selfish learning, where agents work to maximize their individual rewards. In particular, each agent receives a private signal and is aware of the decisions made by earlier-acting agents. Besides inferring the underlying hypothesis, each agent also decides whether to stop and declare, or to pass the inference on to the next agent. The employer rewards only correct responses, and the reward per worker decreases with the number of employees used for decision making. We characterize the decision regions of agents in the infinite and finite horizons. In particular, we show that the decision boundaries in the infinite horizon are the solutions to a Markov decision process with discounted costs and can be found using value iteration. In the finite horizon, we show that team performance is enhanced with appropriate incentivization when compared to sequential social learning.
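The infinite-horizon result lends itself to a compact numerical illustration. The toy value iteration below is a sketch under assumed dynamics, not the paper's model: beliefs are discretized, each private signal is binary with accuracy q, and passing to the next agent discounts the attainable reward by gamma (standing in for the per-worker pay decrease). All constants are arbitrary.

```python
# A toy value iteration for a declare-or-pass stopping problem of this kind.
# Signal model, payoffs, and constants are illustrative assumptions.
import numpy as np

p = np.linspace(0.0, 1.0, 201)   # discretized posterior belief in hypothesis H1
gamma = 0.9                      # per-worker discount on the employer's reward
q = 0.7                          # accuracy of each agent's private binary signal

# Bayes updates of the belief after the next agent observes signal s in {0, 1}.
prob_s1 = p * q + (1 - p) * (1 - q)            # marginal P(s = 1)
p_s1 = p * q / prob_s1                         # posterior if s = 1
p_s0 = p * (1 - q) / (1 - prob_s1)             # posterior if s = 0

declare = np.maximum(p, 1.0 - p)               # expected reward of declaring now

V = declare.copy()
for _ in range(200):                           # iterate to a fixed point
    cont = gamma * (prob_s1 * np.interp(p_s1, p, V)
                    + (1 - prob_s1) * np.interp(p_s0, p, V))
    V = np.maximum(declare, cont)

# Decision boundaries: beliefs where passing to one more agent beats declaring.
pass_region = p[cont > declare]
print(pass_region.min(), pass_region.max())
```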
-
In multi-agent Bayesian optimization for design space exploration (DSE), identifying a communication network among agents that shares useful design information for enhanced cooperation and performance, while balancing connectivity against cost, poses significant challenges. To address this, we develop a distributed multi-agent Bayesian optimization (DMABO) framework and study how communication network structure/connectivity and the resulting cost impact the performance of a team of agents in finding the global optimum. Specifically, we utilize Lloyd's algorithm to partition the design space and assign distinct regions to individual agents for exploration in the distributed multi-agent system (MAS). Based on this partitioning, we generate communication networks among agents using two models: (1) a range-limited model in which communication is constrained by neighborhood information; and (2) a range-free model without neighborhood constraints. We introduce network density as a metric to quantify communication cost. We then generate communication networks of gradually increasing density to assess the impact of communication cost on the performance of MAS in DSE. The experimental results show that a communication network based on the range-limited model can significantly improve performance without incurring high communication costs. This indicates that increasing the density of a communication network does not necessarily improve MAS performance in DSE. Furthermore, the results indicate that communication benefits team performance only if it occurs between specific agents whose search regions are critically relevant to the location of the global optimum. The proposed DMABO framework and the insights obtained can help identify the best trade-off between communication structure and cost for MAS in unknown design space exploration.
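The setup this abstract outlines can be sketched compactly: Lloyd's algorithm partitions the design space into per-agent regions, a range-limited model links agents whose regions lie near one another, and network density serves as the communication-cost proxy. The sketch below follows that outline only; its names and constants are assumptions.

```python
# Lloyd's-algorithm partitioning plus a range-limited communication network
# and a network-density metric. Illustrative only; constants are arbitrary.
import numpy as np

def lloyd_partition(points, k, iters=50, rng=None):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid updates."""
    rng = rng or np.random.default_rng(0)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None] - centroids[None, :], axis=-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            if (labels == j).any():
                centroids[j] = points[labels == j].mean(axis=0)
    return centroids, labels

def range_limited_network(centroids, radius):
    """Adjacency matrix: agents communicate only if their regions are within range."""
    d = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    return (d < radius) & ~np.eye(len(centroids), dtype=bool)

def network_density(adj):
    """Fraction of possible links that are present: a communication-cost proxy."""
    n = len(adj)
    return adj.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
design_space = rng.uniform(0.0, 1.0, size=(2000, 2))   # sampled 2-D design space
centroids, regions = lloyd_partition(design_space, k=8, rng=rng)
adj = range_limited_network(centroids, radius=0.45)
print(network_density(adj))
```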