

Title: Too Many Cooks: Bayesian Inference for Coordinating Multi‐Agent Collaboration
Abstract

Collaboration requires agents to coordinate their behavior on the fly, sometimes cooperating to solve a single task together and other times dividing it up into sub‐tasks to work on in parallel. Underlying the human ability to collaborate is theory‐of‐mind (ToM), the ability to infer the hidden mental states that drive others to act. Here, we develop Bayesian Delegation, a decentralized multi‐agent learning mechanism with these abilities. Bayesian Delegation enables agents to rapidly infer the hidden intentions of others by inverse planning. We test Bayesian Delegation in a suite of multi‐agent Markov decision processes inspired by cooking problems. On these tasks, agents with Bayesian Delegation coordinate both their high‐level plans (e.g., what sub‐task they should work on) and their low‐level actions (e.g., avoiding getting in each other's way). When matched with partners that act using the same algorithm, Bayesian Delegation outperforms alternatives. Bayesian Delegation is also a capable ad hoc collaborator and successfully coordinates with other agent types even in the absence of prior experience. Finally, in a behavioral experiment, we show that Bayesian Delegation makes inferences similar to human observers about the intent of others. Together, these results argue for the centrality of ToM for successful decentralized multi‐agent collaboration.
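To make the mechanism concrete, here is a minimal Python sketch of the core posterior update behind Bayesian Delegation. It is an illustration under assumptions, not the paper's implementation: the names `update_posterior` and `boltzmann` are ours, and the likelihood is the standard soft-max (Boltzmann) action model that inverse-planning accounts typically use, with plan values supplied externally.

```python
import numpy as np

def boltzmann(q_values, beta=2.0):
    # Soft-max action model: agents are assumed approximately rational,
    # choosing higher-value actions with higher probability (the inverse
    # temperature beta controls how noisy they are).
    logits = beta * np.asarray(q_values, dtype=float)
    p = np.exp(logits - logits.max())
    return p / p.sum()

def update_posterior(prior, allocations, state, action, likelihood):
    # Bayes' rule over candidate sub-task allocations: each hypothesis `ta`
    # (who is working on which sub-task) is re-weighted by how well it
    # explains the action just observed at `state`.
    weights = np.array([
        prior[i] * likelihood(ta, state, action)
        for i, ta in enumerate(allocations)
    ])
    return weights / weights.sum()

# Toy example with two hypothetical allocations. Under allocation 1 the
# observed action (index 1) has high planned value, so the posterior
# shifts toward allocation 1 after a single observation.
q_values = {0: [1.0, 0.0], 1: [0.0, 1.0]}   # illustrative plan values
lik = lambda ta, state, a: boltzmann(q_values[ta])[a]
print(update_posterior(np.array([0.5, 0.5]), [0, 1], None, 1, lik))
```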

 
NSF-PAR ID:
10220850
Author(s) / Creator(s):
Rose E. Wang; Sarah A. Wu; James A. Evans; Joshua B. Tenenbaum; David C. Parkes; Max Kleiman-Weiner
Publisher / Repository:
Wiley-Blackwell
Date Published:
2021
Journal Name:
Topics in Cognitive Science
Volume:
13
Issue:
2
ISSN:
1756-8757
Page Range / eLocation ID:
p. 414-432
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We describe and analyze our efforts to support Learning Assistants (LAs), undergraduate peer educators who simultaneously take a 3-credit pedagogy course, in fostering equitable team dynamics and collaboration within a project-based engineering design course. Tonso and others have shown that (a) inequities can “live” in mundane interactions such as those among students within design teams, and (b) those inequities both reflect and (re)produce broader cultural patterns and narratives (e.g., Wolfe & Powell, 2009; Tonso, 1996, 2006a, 2006b; McLoughlin, 2005). LAs could be well positioned to notice and potentially disrupt inequitable patterns of participation within design teams. In this paper, we explore (1) how LAs notice, diagnose, and consider responding to teamwork troubles within design teams, and (2) what ideological assumptions plausibly contribute to LAs' sensemaking around their students' teamwork troubles. To do so, we analyze how LAs notice and consider responding to issues of equitable teamwork and participation, as exhibited in three related activities: (i) an in-class roleplay, (ii) observing and diagnosing teamwork troubles (TTs) in the engineering design teams, and (iii) imagining possible instructional responses to those troubles and students' possible reactions. We articulate three modes of thinking that roughly capture patterns in LAs' descriptions, diagnoses of, and imagined responses to the teamwork troubles: individual accountability, where the trouble is seen as caused by individuals described as “off task,” “checked out,” or demonstrating some level of incompetence; delegation of work, where the trouble is located in the team leader's inability to delegate tasks effectively, or in the group's general lack of communication about what tasks need to be completed, who should execute them, and what work other groups in the team are doing; and emergent systems, where the trouble is described as a group-level phenomenon emerging from the patterns of interaction among group members, contextual features, and larger structural forces. We find that LAs drew on individual accountability and delegation of work to evaluate TTs; ascriptions of TTs to interactional dynamics between teammates were much rarer. We connect these modes to underlying ideological assumptions that have consequences for how meritocracy and technocracy (Slaton, 2015; Cech, 2014) play out in an engineering design classroom and serve to ameliorate or reify engineering mindsets (Riley, 2008). The modes are asymmetric, in that emergent-systems interpretations hold more potential for elucidating ongoing social processes, for challenging meritocracy and socio-technical duality, and for seeing power differentials within interpersonal and institutional contexts. We argue for the need to better understand the ideological assumptions underlying how peer educators, and other instructors, interpret classroom events.
  2. In multi-agent reinforcement learning (MARL), it is challenging for a collection of agents to learn complex temporally extended tasks. The difficulties lie in computational complexity and in learning the high-level ideas behind reward functions. We study the graph-based Markov decision process (MDP), where the dynamics of neighboring agents are coupled. To learn complex temporally extended tasks, we use a reward machine (RM) to encode each agent's task and expose the internal structure of the reward function. An RM can describe high-level knowledge and encode non-Markovian reward functions. To tackle the computational complexity, we propose a decentralized learning algorithm, decentralized graph-based reinforcement learning using reward machines (DGRM), which equips each agent with a localized policy so that agents make decisions independently, based on the information available to them. DGRM uses an actor-critic structure, and we introduce a tabular Q-function for discrete-state problems. We show that the dependency of the Q-function on other agents decreases exponentially as the distance between them increases. To further improve efficiency, we also propose the deep DGRM algorithm, which uses deep neural networks to approximate the Q-function and the policy function for large-scale or continuous-state problems. The effectiveness of DGRM is evaluated in three case studies: two wireless-communication problems with independent and dependent reward functions, respectively, and COVID-19 pandemic mitigation. Experimental results show that local information is sufficient for DGRM and that agents can accomplish complex tasks with the help of RMs. In the COVID-19 pandemic-mitigation case, DGRM improves the global accumulated reward by 119% over the baseline.
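As a concrete illustration of the reward-machine idea in the abstract above, the following Python sketch implements an RM as a finite-state machine whose transitions fire on labelled events and emit rewards. The states, events, and the "visit A, then B" task are illustrative assumptions, not details from the paper.

```python
class RewardMachine:
    """Finite-state machine encoding a non-Markovian reward function."""

    def __init__(self, initial_state, transitions):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.state = initial_state
        self.transitions = transitions

    def step(self, event):
        # Advance on a labelled event; unmatched events leave the RM state
        # unchanged and yield zero reward.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

# Example task "visit A, then B": the reward for reaching B depends on
# whether A was reached first, which no Markovian reward on raw states
# can express; the RM state carries that history.
rm = RewardMachine("u0", {
    ("u0", "at_A"): ("u1", 0.0),
    ("u1", "at_B"): ("u_done", 1.0),
})
print(rm.step("at_B"))  # 0.0: B before A earns nothing
print(rm.step("at_A"))  # 0.0: progress recorded in RM state u1
print(rm.step("at_B"))  # 1.0: sub-goals completed in order
```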
  3. Communication between humans and mobile agents is increasingly important as such agents are widely deployed in our daily lives. Vision-and-Dialogue Navigation is one of the tasks that evaluate an agent's ability to interact with humans for assistance and to navigate based on natural-language responses. In this paper, we explore the Navigation from Dialogue History (NDH) task, which is based on the Cooperative Vision-and-Dialogue Navigation (CVDN) dataset, and present a state-of-the-art model built upon vision-language transformers. Despite achieving competitive performance, however, we find that the agent in the NDH task is not evaluated appropriately by the primary metric, Goal Progress. By analyzing the performance mismatch between Goal Progress and other metrics (e.g., normalized Dynamic Time Warping) from our state-of-the-art model, we show that NDH's sub-path-based task setup (i.e., navigating a partial trajectory based on its corresponding subset of the full dialogue) does not provide the agent with enough supervision signal toward the goal region. Therefore, we propose a new task setup, NDH-Full, which takes the full dialogue and the whole navigation path as one instance. We present a strong baseline model and show initial results on this new task. We further describe several approaches we tried to improve model performance, based on curriculum learning, pre-training, and data augmentation, suggesting potentially useful training methods for this new NDH-Full task.
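Since the argument above turns on the gap between Goal Progress and path-fidelity metrics, a minimal Python sketch of normalized Dynamic Time Warping (nDTW) may help. The distance function and the 3.0 threshold here are placeholder assumptions; VLN benchmarks typically use geodesic distance in the environment and a fixed success threshold.

```python
import math

def ndtw(predicted, reference, dist, threshold=3.0):
    # Classic dynamic-programming DTW: dp[i][j] is the minimal accumulated
    # distance aligning the first i predicted points with the first j
    # reference points.
    n, m = len(predicted), len(reference)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(predicted[i - 1], reference[j - 1])
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    # Normalize so a perfect path scores 1.0 and the score decays smoothly
    # as the agent's trajectory deviates from the reference.
    return math.exp(-dp[n][m] / (m * threshold))

# Usage with Euclidean distance on 2-D waypoints: an exact match scores 1.0.
euclid = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
print(ndtw([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 0), (2, 0)], euclid))
```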
  4. Systems engineering processes coordinate the efforts of many individuals to design a complex system. However, the goals of the involved individuals do not necessarily align with the system-level goals. Everyone, including managers, systems engineers, subsystem engineers, component designers, and contractors, is self-interested. It is not currently understood how this discrepancy between organizational and personal goals affects the outcomes of complex systems engineering processes. To answer this question, we need a systems engineering theory that accounts for human behavior. Such a theory can be ideally expressed as a dynamic hierarchical network game of incomplete information. The nodes of this network represent individual agents and the edges the transfer of information and incentives. All agents decide independently how much effort to devote to a delegated task by maximizing their expected utility; the expectation is over their beliefs about the actions of all other individuals and the moves of nature. An essential component of such a model is the quality function, defined as the map between an agent's effort and the quality of their job outcome. In the economics literature, the quality function is assumed to be a linear function of effort with additive Gaussian noise. This simplistic assumption ignores two critical factors relevant to systems engineering: (1) the complexity of the design task, and (2) the problem-solving skills of the agent. Systems engineers establish their beliefs about these two factors through years of job experience. In this paper, we encode these beliefs in clear mathematical statements about the form of the quality function. Our approach proceeds in two steps: (1) we construct a generative stochastic model of the delegated task, and (2) we develop a reduced-order representation suitable for use in a more extensive game-theoretic model of a systems engineering process. Focusing on the early design stages of a systems engineering process, we model the design task as a function-maximization problem and thus associate the systems engineer's beliefs about the complexity of the task with their beliefs about the complexity of the function being maximized. Furthermore, we associate an agent's problem-solving skills with the strategy they use to solve the underlying maximization problem. We identify two agent types: “naïve” (follows a random-search strategy) and “skillful” (follows a Bayesian global-optimization strategy). Through an extensive simulation study, we show that the assumption of a linear quality function is valid only for small effort levels. In general, the quality function is an increasing, concave function whose derivative and curvature depend on the problem's complexity and the agent's skills.
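The claimed shape of the quality function can be reproduced in miniature. The Python sketch below estimates the quality a "naïve" random-search agent achieves as a function of effort (number of design evaluations); the test function is an arbitrary stand-in for the delegated design task, and a "skillful" agent would replace the random sampling with Bayesian global optimization.

```python
import numpy as np

rng = np.random.default_rng(0)

def design_task(x):
    # Illustrative stand-in for the delegated design problem: a multimodal
    # function to be maximized over [0, 1].
    return np.sin(5 * x) * (1.0 - x)

def naive_quality(effort, trials=5000):
    # Monte Carlo estimate of E[max over `effort` uniform random
    # evaluations], i.e., the quality function of a random-search agent.
    xs = rng.random((trials, effort))
    return design_task(xs).max(axis=1).mean()

for effort in (1, 2, 4, 8, 16, 32):
    print(effort, round(naive_quality(effort), 3))
# The printed quality is increasing and concave in effort: each additional
# evaluation helps, but with diminishing returns, which is where the
# linear-quality assumption breaks down beyond small effort levels.
```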
  5. Extensive literature exists on decentralized coordination and consensus, with considerable attention devoted to ensuring robustness to faults and attacks. However, most of the latter literature assumes that non-malicious agents follow simple stylized rules. In reality, decentralized protocols often involve humans, and how people coordinate in adversarial settings is an open problem. We initiate a study of this problem, starting with a human-subjects investigation of coordination on networks in the presence of adversarial agents, and subsequently using the resulting data to bootstrap the development of a credible agent-based model of adversarial decentralized coordination. In the human-subjects experiments, we observe that while adversarial nodes can successfully prevent consensus, the ability to communicate significantly improves robustness, with the impact particularly pronounced in scale-free networks. On the other hand, and contrary to typical stylized models of behavior, we show that the existence of trusted nodes has limited utility. Next, we use the data collected in the human-subjects experiments to develop a data-driven agent-based model of adversarial coordination. We show that this model successfully reproduces the behavior observed in the experiments and is robust to small errors in individual agent models, and we illustrate its utility by using it to explore the impact of optimizing the network locations of trusted and adversarial nodes.
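To make the modeling target concrete, here is a minimal agent-based sketch of decentralized consensus with adversaries in Python. It is a toy illustration, not the fitted data-driven model from the paper: regular nodes adopt the majority color among their neighbors, adversarial nodes play the minority color to block consensus, and the ring network and parameters are assumptions for the demo.

```python
import random

PALETTE = ("red", "blue")

def step(colors, graph, adversaries):
    # Synchronous update: each node observes only its neighbors' colors.
    new = {}
    for node, neighbors in graph.items():
        counts = {c: 0 for c in PALETTE}
        for nb in neighbors:
            counts[colors[nb]] += 1
        majority = max(PALETTE, key=counts.get)
        minority = min(PALETTE, key=counts.get)
        # Adversaries play against the local majority to prevent consensus.
        new[node] = minority if node in adversaries else majority
    return new

random.seed(1)
graph = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}  # ring of 8 nodes
colors = {i: random.choice(PALETTE) for i in range(8)}
adversaries = {0}
for _ in range(10):
    colors = step(colors, graph, adversaries)
print(colors)  # with node 0 adversarial, disagreement typically persists
```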