
Title: On Emergent Communication in Competitive Multi-Agent Teams
Several recent works have found the emergence of grounded compositional language in the communication protocols developed by mostly cooperative multi-agent systems when learned end-to-end to maximize performance on a downstream task. However, human populations learn to solve complex tasks involving communicative behaviors not only in fully cooperative settings but also in scenarios where competition acts as an additional external pressure for improvement. In this work, we investigate whether competition for performance from an external, similar agent team could act as a social influence that encourages multi-agent populations to develop better communication protocols for improved performance, compositionality, and convergence speed. We start from Task & Talk, a previously proposed referential game between two cooperative agents, as our testbed and extend it into Task, Talk & Compete, a game involving two competitive teams each consisting of two aforementioned cooperative agents. Using this new setting, we provide an empirical study demonstrating the impact of competitive influence on multi-agent teams. Our results show that an external competitive influence leads to improved accuracy and generalization, as well as faster emergence of communicative languages that are more informative and compositional.
Award ID(s):
1750439 1722822
Publication Date:
Journal Name:
Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems
Page Range or eLocation-ID:
Sponsoring Org:
National Science Foundation
More Like this
  1. The mammalian suprachiasmatic nucleus (SCN) comprises about 20,000 interconnected oscillatory neurons that create and maintain a robust circadian signal which matches external light cues. Here, we use an evolutionary game theoretic framework to explore how evolutionary constraints can influence the synchronization of the system under various assumptions on the connection topology, contributing to the understanding of the structure of interneuron connectivity. Our basic model represents the SCN as a network of agents each with two properties: a phase and a flag that determines whether it communicates with its neighbors. Communication comes at a cost to the agent, but synchronization of phases with its neighbors bears a benefit. Earlier work shows that when we have “all-to-all” connectivity, where every agent potentially communicates with every other agent, there is often a simple trade-off that leads to complete communication and synchronization of the system: the benefit must be greater than twice the cost. This trade-off for all-to-all connectivity gives us a baseline to compare to when looking at other topologies. Using simulations, we compare three plausible topologies to the all-to-all case, finding that convergence to synchronous dynamics occurs in all considered topologies under similar benefit and cost trade-offs. Consequently, sparser, less biologically costly topologies are reasonable evolutionary outcomes for organisms that develop a synchronizable oscillatory network. Our simulations also shed light on constraints imposed by the time scale on which we observe the SCN to arise in mammals. We find two conditions that allow for a synchronizable system to arise in relatively few generations. First, the benefits of connectivity must outweigh the cost of facilitating the connectivity in the network. Second, the game at the core of the model needs to be more cooperative than antagonistic games such as the Prisoner’s Dilemma. These results again imply that evolutionary pressure may have driven the system towards sparser topologies, as they are less costly to create and maintain. Last, our simulations indicate that models based on the mutualism game fare the best in uptake of communication and synchronization compared to more antagonistic games such as the Prisoner’s Dilemma.
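The benefit/cost trade-off described above can be illustrated with a minimal sketch, assuming a toy payoff where an agent earns `benefit` for each phase-aligned neighbor and pays `cost` per neighbor it communicates with. The payoff function and thresholds here are illustrative, not the paper's exact model:

```python
def payoff(agent, phases, flags, neighbors, benefit, cost):
    """Toy SCN-game payoff: benefit per phase-aligned neighbor,
    communication cost per link when the agent's flag is on."""
    aligned = sum(1 for j in neighbors[agent]
                  if abs(phases[agent] - phases[j]) < 0.1)
    comm_cost = cost * len(neighbors[agent]) if flags[agent] else 0.0
    return benefit * aligned - comm_cost

# All-to-all network of 4 synchronized, communicating agents.
phases = [0.0, 0.0, 0.0, 0.0]
flags = [True, True, True, True]
neighbors = {i: [j for j in range(4) if j != i] for i in range(4)}
print(payoff(0, phases, flags, neighbors, benefit=1.0, cost=0.4))  # 3*1.0 - 3*0.4 = 1.8
```

With sparser topologies, `neighbors` simply lists fewer links per agent, which is what makes them cheaper to maintain under the same cost parameter.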
  2. Teamwork is a set of interrelated reasoning, actions and behaviors of team members that facilitate common objectives. Teamwork theory and experiments have resulted in a set of states and processes for team effectiveness in both human-human and agent-agent teams. However, human-agent teaming is less well studied because it is so new and involves asymmetry in policy and intent not present in human teams. To optimize team performance in human-agent teaming, it is critical that agents infer human intent and adapt their policies for smooth coordination. Most literature in human-agent teaming builds agents referencing a learned human model. Though these agents are guaranteed to perform well with the learned model, they place heavy assumptions on human policy, such as optimality and consistency, which is unlikely in many real-world scenarios. In this paper, we propose a novel adaptive agent architecture in a human-model-free setting on a two-player cooperative game, namely Team Space Fortress (TSF). Previous human-human team research has shown complementary policies in the TSF game and diversity in human players’ skill, which encourages us to relax the assumptions on human policy. Therefore, we discard learning human models from human data, and instead use an adaptation strategy on a pre-trained library of exemplar policies composed of RL algorithms or rule-based methods with minimal assumptions of human behavior. The adaptation strategy relies on a novel similarity metric to infer human policy and then selects the most complementary policy in our library to maximize the team performance. The adaptive agent architecture can be deployed in real-time and generalizes to any off-the-shelf static agents. We conducted human-agent experiments to evaluate the proposed adaptive agent framework, and demonstrated the suboptimality, diversity, and adaptability of human policies in human-agent teams.
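The infer-then-complement loop described above can be sketched in a few lines. The policy library, the complementarity table, and the action-agreement similarity used here are all hypothetical stand-ins (the paper's novel similarity metric is not specified in this abstract):

```python
def infer_human_policy(observed_actions, library):
    """Return the name of the library policy whose actions best match the human's.
    Action-agreement rate is a stand-in for the paper's similarity metric."""
    def similarity(policy):
        return sum(policy(s) == a for s, a in observed_actions) / len(observed_actions)
    return max(library, key=lambda name: similarity(library[name]))

# Hypothetical two-policy library and complementarity table (illustrative only).
library = {
    "aggressive": lambda state: "attack",
    "defensive": lambda state: "guard",
}
complement = {"aggressive": "defensive", "defensive": "aggressive"}

observed = [("s0", "attack"), ("s1", "attack"), ("s2", "guard")]
inferred = infer_human_policy(observed, library)   # best match to observed behavior
teammate_policy = complement[inferred]             # deploy the complementary policy
print(inferred, teammate_policy)
```

Because inference only needs observed state-action pairs, this kind of selection can run online against any fixed teammate, which is the sense in which the architecture generalizes to off-the-shelf agents.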
  3. With the development of sensing and communication technologies in networked cyber-physical systems (CPSs), multi-agent reinforcement learning (MARL)-based methodologies are integrated into the control process of physical systems and demonstrate prominent performance in a wide array of CPS domains, such as connected autonomous vehicles (CAVs). However, it remains challenging to mathematically characterize the improvement of the performance of CAVs with communication and cooperation capability. When each individual autonomous vehicle is originally self-interested, we cannot assume that all agents would cooperate naturally during the training process. In this work, we propose to reallocate the system’s total reward efficiently to motivate stable cooperation among autonomous vehicles. We formally define and quantify how to reallocate the system’s total reward to each agent under the proposed transferable utility game, such that communication-based cooperation among multiple agents increases the system’s total reward. We prove that Shapley value-based reward reallocation of MARL lies in the core if the transferable utility game is a convex game. Hence, the cooperation is stable and efficient, and the agents should stay in the coalition or the cooperating group. We then propose a cooperative policy learning algorithm with Shapley value reward reallocation. In experiments, compared with several literature algorithms, we show the improvement of the mean episode system reward of CAV systems using our proposed algorithm.
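The Shapley value underlying this reallocation can be computed exactly by averaging each agent's marginal contribution over all join orders. A self-contained sketch on a hypothetical 3-agent convex game follows; the coalition value function is illustrative, not the paper's CAV reward model:

```python
import math
from itertools import permutations

def shapley_values(players, v):
    """Exact Shapley values: average each player's marginal contribution
    v(S ∪ {i}) - v(S) over all orderings in which players join the coalition."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            phi[p] += v(frozenset(coalition)) - before
    n_orders = math.factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

# Illustrative convex game: coalition value grows superadditively with size,
# so cooperating in the grand coalition is stable (the Shapley value is in the core).
v = lambda S: len(S) ** 2
print(shapley_values([0, 1, 2], v))  # symmetric agents -> {0: 3.0, 1: 3.0, 2: 3.0}
```

Exact enumeration is exponential in the number of agents; larger systems typically use sampled orderings instead.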
  4. Service function chaining (SFC), consisting of a sequence of virtual network functions (VNFs) (i.e., firewalls and load balancers), is an effective service provision technique in modern data center networks. By requiring cloud user traffic to traverse the VNFs in order, SFC improves the security and performance of the cloud user applications. In this paper, we study how to place an SFC inside a data center to minimize the network traffic of the virtual machine (VM) communication. We take a cooperative multi-agent reinforcement learning approach, wherein multiple agents collaboratively figure out the traffic-efficient route for the VM communication. Underlying the SFC placement is a fundamental graph-theoretical problem called the k-stroll problem. Given a weighted graph G(V, E), two nodes s, t ∈ V, and an integer k, the k-stroll problem is to find the shortest path from s to t that visits at least k other nodes in the graph. Our work is the first to take a multi-agent learning approach to solve the k-stroll problem. We compare our learning algorithm with an optimal and exhaustive algorithm and an existing dynamic programming (DP)-based heuristic algorithm. We show that our learning algorithm, although lacking the complete knowledge of the network assumed by existing research, delivers comparable or even better VM communication time while taking two orders of magnitude less execution time.
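The k-stroll definition above can be made concrete with a brute-force search over simple paths, practical only for tiny graphs (the exhaustive baseline in the paper plays this role; the learning approach is what scales). The example graph is hypothetical:

```python
def k_stroll(graph, s, t, k):
    """Brute-force k-stroll: cheapest simple s-t path visiting >= k other nodes.
    `graph` maps node -> {neighbor: edge_weight}."""
    best = (float("inf"), None)

    def dfs(node, visited, cost, path):
        nonlocal best
        if node == t:
            # path includes s and t; intermediate nodes are everything in between
            if len(path) - 2 >= k and cost < best[0]:
                best = (cost, list(path))
            return
        for nxt, w in graph[node].items():
            if nxt not in visited:
                visited.add(nxt)
                path.append(nxt)
                dfs(nxt, visited, cost + w, path)
                path.pop()
                visited.remove(nxt)

    dfs(s, {s}, 0, [s])
    return best

graph = {
    "s": {"a": 1, "b": 4},
    "a": {"s": 1, "b": 1, "t": 5},
    "b": {"a": 1, "s": 4, "t": 1},
    "t": {"a": 5, "b": 1},
}
print(k_stroll(graph, "s", "t", k=2))  # (3, ['s', 'a', 'b', 't'])
```

Note the constraint changes the answer: with k=0 the direct-ish route s-a-t (cost 6) loses to the same s-a-b-t path here, but in general requiring k extra visits can force detours off the shortest path.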
  5. Successfully navigating the social world requires reasoning about both high-level strategic goals, such as whether to cooperate or compete, as well as the low-level actions needed to achieve those goals. While previous work in experimental game theory has examined the former and work on multi-agent systems has examined the latter, there has been little work investigating behavior in environments that require simultaneous planning and inference across both levels. We develop a hierarchical model of social agency that infers the intentions of other agents, strategically decides whether to cooperate or compete with them, and then executes either a cooperative or competitive planning program. Learning occurs across both high-level strategic decisions and low-level actions, leading to the emergence of social norms. We test predictions of this model in multi-agent behavioral experiments using rich, video-game-like environments. By grounding strategic behavior in a formal model of planning, we develop abstract notions of both cooperation and competition and shed light on the computational nature of joint intentionality.