Title: Superhuman AI for multiplayer poker
In recent years there have been great strides in artificial intelligence (AI), with games often serving as challenge problems, benchmarks, and milestones for progress. Poker has served for decades as such a challenge problem. Past successes in such benchmarks, including poker, have been limited to two-player games. However, poker in particular is traditionally played with more than two players. Multiplayer games present fundamental additional issues beyond those in two-player games, and multiplayer poker is a recognized AI milestone. In this paper we present Pluribus, an AI that we show is stronger than top human professionals in six-player no-limit Texas hold’em poker, the most popular form of poker played by humans.
Award ID(s):
1901403, 1733556, 1718457, 1617590
PAR ID:
10119653
Author(s) / Creator(s):
Noam Brown; Tuomas Sandholm
Date Published:
Journal Name:
Science
Volume:
365
Issue:
6456
ISSN:
0036-8075
Page Range / eLocation ID:
885 to 890
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper focuses on player modeling in multiplayer adaptive games. While player modeling has received a significant amount of attention, less is known about how to use player modeling in multiplayer games, especially when an experience management AI must decide how to adapt the experience for the group as a whole. Specifically, we present a multi-armed bandit (MAB) approach for modeling groups of multiple players. Our main contributions are a new MAB framework for multiplayer modeling and techniques for addressing the new challenges introduced by the multiplayer context, extending previous work on MAB-based player modeling to account for group-generated phenomena not present in single-user models. We evaluate our approach via simulation of virtual players in the context of multiplayer adaptive exergames.
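
    The entry above names a multi-armed bandit framework for modeling a group of players but, being an abstract, does not spell out the machinery. Below is a minimal, illustrative sketch of the underlying idea (candidate adaptations as arms, per-player feedback aggregated into one group reward); the class name GroupUCB, the UCB1 selection rule, and the choice of min as the group aggregator are assumptions made here for concreteness, not the paper's framework.

        import math
        import random
        from collections import defaultdict

        class GroupUCB:
            """Minimal UCB1-style bandit: each arm is a candidate game adaptation,
            and the reward is an aggregate of per-player feedback."""

            def __init__(self, arms, aggregate=min):
                self.arms = list(arms)
                self.aggregate = aggregate        # e.g. min = favor the worst-off player
                self.counts = defaultdict(int)    # pulls per arm
                self.values = defaultdict(float)  # running mean reward per arm
                self.t = 0

            def select(self):
                self.t += 1
                # Try every arm once before applying the UCB rule.
                for arm in self.arms:
                    if self.counts[arm] == 0:
                        return arm
                return max(self.arms, key=lambda a: self.values[a]
                           + math.sqrt(2 * math.log(self.t) / self.counts[a]))

            def update(self, arm, per_player_rewards):
                # Collapse the group's feedback into one scalar before the usual update.
                reward = self.aggregate(per_player_rewards)
                self.counts[arm] += 1
                self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

        # Toy usage: random numbers stand in for four simulated players' feedback.
        bandit = GroupUCB(arms=["easier", "same", "harder"])
        for _ in range(100):
            choice = bandit.select()
            bandit.update(choice, [random.random() for _ in range(4)])
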
  2. Experience management (EM) agents in multiplayer serious games face unique challenges and responsibilities regarding the fair treatment of players. One such challenge is the Greedy Bandit Problem, which arises when traditional multi-armed bandits (MABs) are used as EM agents and results in some players being routinely prioritized while others may be ignored. We show that this problem can cause player non-adherence in a multiplayer serious game played by human users. To mitigate this effect, we propose a new bandit strategy, the Shapley Bandit, which enforces fairness constraints in its treatment of players based on the Shapley value. We evaluate our approach via simulation with virtual players, finding that the Shapley Bandit can provide more uniform treatment of players while incurring only a slight cost in overall performance relative to a typical greedy approach. Our findings highlight the importance of fair treatment of players as a goal of multiplayer EM agents, and we discuss how addressing this issue may lead to more effective agent operation overall. The study contributes to the understanding of player modeling and EM in serious games and provides a promising approach for balancing fairness and engagement in multiplayer environments.
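
    The entry above bases its fairness constraint on the Shapley value but does not give the construction. For reference only, the standard Monte Carlo estimate of Shapley values, which is the kind of per-player credit such a bandit could weigh, looks roughly like the sketch below; the function names and the toy coalition value are assumptions for illustration, not the paper's Shapley Bandit.

        import random

        def shapley_values(players, coalition_value, n_samples=2000, seed=0):
            """Monte Carlo estimate of each player's Shapley value for a
            coalition value function v: frozenset(players) -> float."""
            rng = random.Random(seed)
            phi = {p: 0.0 for p in players}
            for _ in range(n_samples):
                order = list(players)
                rng.shuffle(order)
                coalition, prev = frozenset(), coalition_value(frozenset())
                for p in order:
                    coalition = coalition | {p}
                    curr = coalition_value(coalition)
                    phi[p] += curr - prev        # marginal contribution of p
                    prev = curr
            return {p: total / n_samples for p, total in phi.items()}

        # Toy example: a group is only as engaged as its least-engaged member,
        # a fairness-flavored objective in the spirit of uniform treatment.
        engagement = {"alice": 0.9, "bob": 0.4, "carol": 0.7}
        value = lambda S: min((engagement[p] for p in S), default=0.0)
        print(shapley_values(engagement.keys(), value))
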
  3. Recent algorithms have achieved superhuman performance at a number of two-player zero-sum games such as poker and Go. However, many real-world situations are multi-player games. Zero-sum two-team games, such as bridge and football, involve two teams where every member of a team shares the same reward, and each team's reward is the negative of the other team's. A popular solution concept in this setting, called TMECor, assumes that teams can jointly correlate their strategies before play but cannot communicate during play. This setting is harder than two-player zero-sum games because each player on a team has different information and must use their public actions to signal to other members of the team. Prior works either have game-theoretic guarantees but only work in very small games, or scale to large games but lack game-theoretic guarantees. In this paper we introduce two algorithms: Team-PSRO, an extension of PSRO from two-player games to team games, and Team-PSRO Mix-and-Match, which improves upon Team-PSRO by better using population policies. In Team-PSRO, in every iteration both teams learn a joint best response to the opponent's meta-strategy via reinforcement learning. As the reinforcement-learning joint best response approaches the optimal best response, Team-PSRO is guaranteed to converge to a TMECor. In experiments on Kuhn poker and Liar's Dice, we show that a tabular version of Team-PSRO converges to TMECor, and a version of Team-PSRO using deep cooperative reinforcement learning beats self-play reinforcement learning in the large game of Google Research Football.
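
    The abstract above describes Team-PSRO's outer loop (each iteration, each team trains a joint best response to the other team's meta-strategy) without the details. The skeleton below is a hedged sketch of that loop only, with the payoff evaluator, meta-game solver, and RL best-response trainer left as caller-supplied stubs; parameter names and signatures are illustrative assumptions, not the paper's implementation.

        from typing import Callable

        def team_psro(init_a, init_b,
                      best_response: Callable,  # (opponent_pop, opponent_weights, team) -> new joint policy
                      payoff: Callable,         # (policy_a, policy_b) -> expected payoff to team A
                      meta_solver: Callable,    # payoff matrix -> (weights over pop_a, weights over pop_b)
                      iterations: int = 10):
            """Skeleton of a PSRO-style population loop for a zero-sum two-team game."""
            pop_a, pop_b = [init_a], [init_b]
            for _ in range(iterations):
                # Evaluate the restricted "meta-game" between the current populations.
                matrix = [[payoff(pa, pb) for pb in pop_b] for pa in pop_a]
                w_a, w_b = meta_solver(matrix)
                # Each team trains a joint best response (e.g. with cooperative deep RL)
                # against the other team's meta-strategy.
                pop_a.append(best_response(pop_b, w_b, "A"))
                pop_b.append(best_response(pop_a, w_a, "B"))
            matrix = [[payoff(pa, pb) for pb in pop_b] for pa in pop_a]
            return pop_a, pop_b, meta_solver(matrix)
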
  4. Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), in which the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in an order of magnitude fewer iterations than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at https://github.com/indylab/nxdo.
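
    The key contrast in the entry above is where best responses are mixed: PSRO mixes whole policies at the root, while XDO grows a restricted game so that best-response actions are mixed at every infostate. The sketch below is only an outline of that outer loop, with the restricted-game solver (e.g. CFR run over the allowed actions), the full-game best responder, and the exploitability check left as stubs; the signatures are assumptions for illustration and are not taken from the released nxdo code.

        def xdo(initial_actions, solve_restricted, best_response, exploitability,
                tolerance=1e-3, max_iterations=100):
            """Outline of an extensive-form double-oracle (XDO-style) loop.

            initial_actions: dict mapping each infostate to its starting action set.
            solve_restricted(restricted) -> approximate equilibrium of the game in
                which each infostate only allows its current action set.
            best_response(strategy) -> dict of infostate -> action played by a
                full-game best response to `strategy`.
            exploitability(strategy) -> how much a best responder gains against it.
            """
            restricted = {infostate: set(actions)
                          for infostate, actions in initial_actions.items()}
            for _ in range(max_iterations):
                strategy = solve_restricted(restricted)
                if exploitability(strategy) <= tolerance:
                    return strategy
                # Expand the restricted game infostate by infostate with the
                # actions the best responder actually uses.
                for infostate, action in best_response(strategy).items():
                    restricted.setdefault(infostate, set()).add(action)
            return solve_restricted(restricted)
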
  5. In a recent work, Moshkovitz [FOCS '14] presented a transformation on two-player games called "fortification" and gave an elementary proof of an (exponential-decay) parallel repetition theorem for fortified two-player projection games. In this paper, we give an analytic reformulation of Moshkovitz's fortification framework, which was originally cast in combinatorial terms. This reformulation allows us to expand the scope of the fortification method to new settings. First, we show any game (not just projection games) can be fortified, and give a simple proof of parallel repetition for general fortified games. Then, we prove parallel repetition and fortification theorems for games with players sharing quantum entanglement, as well as games with more than two players. This gives a new gap amplification method for general games in the quantum and multiplayer settings, two problems which have recently received much attention. An important component of our work is a variant of the fortification transformation, called "ordered fortification", that preserves the entangled value of a game. The original fortification of Moshkovitz does not in general preserve the entangled value of a game, and this was a barrier to extending the fortification framework to the quantum setting.
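
    The entry above is purely theoretical and its precise theorem statements are not reproduced in the abstract. For orientation only, an "exponential decay" parallel repetition theorem generically refers to a bound of the following shape; the exact form and constants in the fortification results above differ, so this is an illustrative statement, not the paper's theorem.

        % Illustrative generic form of an exponential-decay parallel repetition bound.
        \[
          \mathrm{val}(G) \le 1 - \varepsilon
          \quad\Longrightarrow\quad
          \mathrm{val}\bigl(G^{\otimes n}\bigr) \le 2^{-\Omega(\varepsilon n)},
        \]
        % where $G^{\otimes n}$ is the $n$-fold parallel repetition of the game $G$
        % and $\mathrm{val}$ denotes its (classical or entangled) value.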