Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in …
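The "strong performance in hindsight relative to what could have been achieved with modified behavior" idea reduces, in its simplest form, to external regret against the best fixed action. A minimal sketch (the payoff numbers and the two-action setup are illustrative, not from the paper):

```python
# Hedged sketch: external ("hindsight") regret of a play sequence in a
# repeated matrix game. The game and numbers below are toy examples.

def external_regret(payoffs, played):
    """payoffs[t][a] = reward action a would have earned in round t;
    played[t] = action actually chosen in round t. Returns the regret
    relative to the best single fixed action in hindsight."""
    T = len(played)
    realized = sum(payoffs[t][played[t]] for t in range(T))
    n_actions = len(payoffs[0])
    best_fixed = max(sum(payoffs[t][a] for t in range(T))
                     for a in range(n_actions))
    return best_fixed - realized

# Example: two actions over three rounds.
payoffs = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
played = [0, 0, 1]   # earned 1 + 0 + 1 = 2
print(external_regret(payoffs, played))  # best fixed action also earns 2, so regret 0.0
```

Algorithms in the CFR family minimize a richer menu of such hindsight deviations (not just fixed-action swaps), which is exactly what the deviation classes above formalize.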
Bayesian Multiagent Inverse Reinforcement Learning for Policy Recommendation
We study the following problem, which to our knowledge has been addressed only partially in the literature and not in full generality. An agent observes two players play a zero-sum game that is known to the players but not the agent. The agent observes the actions and state transitions of their game play, but not rewards. The players may play either optimally (according to some Nash equilibrium) or according to any other solution concept, such as a quantal response equilibrium. Following these observations, the agent must recommend a policy for one player, say Player 1. The goal is to recommend a policy that is minimally exploitable under the true, but unknown, game. We take a Bayesian approach. We establish a likelihood function based on observations and the specified solution concept. We then propose an approach based on Markov chain Monte Carlo (MCMC), which allows us to approximately sample games from the agent's posterior belief distribution. Once we have a batch of independent samples from the posterior, we use linear programming and backward induction to compute a policy for Player 1 that minimizes the sum of exploitabilities over these games. This approximates the policy that minimizes the expected exploitability under the …
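The pipeline (quantal-response likelihood, MCMC posterior sampling, then a policy minimizing summed exploitability) can be sketched on a toy one-parameter 2x2 zero-sum family. Everything here is an assumption of the sketch: the game family A(theta), the softmax response model, and the grid search that stands in for the paper's linear program.

```python
import math, random

random.seed(0)

# Toy family of 2x2 zero-sum games: A(theta) = [[theta, -1], [-1, 1]].
# The agent observes Player 1's actions and assumes a crude
# quantal-response model (softmax against a uniform opponent).

def p1_qre_probs(theta, lam=1.0):
    u0 = (theta - 1.0) / 2          # expected payoff of row 0 vs uniform
    u1 = 0.0                        # expected payoff of row 1 vs uniform
    z0, z1 = math.exp(lam * u0), math.exp(lam * u1)
    return z0 / (z0 + z1), z1 / (z0 + z1)

def log_lik(theta, observed_actions):
    p0, p1 = p1_qre_probs(theta)
    return sum(math.log(p0 if a == 0 else p1) for a in observed_actions)

def exploitability(x, theta):
    # Worst-case payoff of P1's mixed strategy x = (x0, x1): the
    # opponent picks the column minimizing P1's expected payoff.
    A = [[theta, -1.0], [-1.0, 1.0]]
    col_payoff = [x[0] * A[0][j] + x[1] * A[1][j] for j in (0, 1)]
    return -min(col_payoff)         # larger = more exploitable (sketch metric)

observed = [0, 1, 1, 0, 1]          # toy observations of P1's actions
theta, samples = 0.0, []
for step in range(2000):            # Metropolis-Hastings over theta (flat prior)
    prop = theta + random.gauss(0, 0.5)
    if math.log(random.random()) < log_lik(prop, observed) - log_lik(theta, observed):
        theta = prop
    if step >= 500:                 # discard burn-in
        samples.append(theta)

# Choose P1's strategy minimizing the sum of exploitabilities over the
# posterior samples (grid search here; the paper uses an LP).
best = min(((x0, sum(exploitability((x0, 1 - x0), t) for t in samples))
            for x0 in [i / 20 for i in range(21)]), key=lambda p: p[1])
print("recommended P1 prob of action 0:", best[0])
```

The backward-induction step of the paper handles sequential games; this matrix-game toy only shows the posterior-then-minimize structure.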
 Award ID(s):
 1901403
 Publication Date:
 NSFPAR ID:
 10289317
 Journal Name:
 AAAI-21 Workshop on Reinforcement Learning in Games
 Sponsoring Org:
 National Science Foundation
More Like this


The existence of simple uncoupled no-regret learning dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multiagent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form games generalize normal-form games by modeling both sequential and simultaneous moves, as well as imperfect information. Because of the sequential nature and presence of private information in the game, correlation in extensive-form games possesses significantly different properties than its counterpart in normal-form games, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to the classical notion of correlated equilibrium in normal-form games. Compared to the latter, the constraints that define the set of EFCEs are significantly more complex, as the correlation device must take into account the evolution of each player's beliefs as they make observations throughout the game. Due to that significant added complexity, the existence of uncoupled learning dynamics leading to an EFCE has remained a challenging open research question for a long time. …

The existence of simple, uncoupled no-regret dynamics that converge to correlated equilibria in normal-form games is a celebrated result in the theory of multiagent systems. Specifically, it has been known for more than 20 years that when all players seek to minimize their internal regret in a repeated normal-form game, the empirical frequency of play converges to a normal-form correlated equilibrium. Extensive-form (that is, tree-form) games generalize normal-form games by modeling both sequential and simultaneous moves, as well as private information. Because of the sequential nature and presence of partial information in the game, extensive-form correlation has significantly different properties than the normal-form counterpart, many of which are still open research directions. Extensive-form correlated equilibrium (EFCE) has been proposed as the natural extensive-form counterpart to normal-form correlated equilibrium. However, it was previously unknown whether EFCE emerges as the result of uncoupled agent dynamics. In this paper, we give the first uncoupled no-regret dynamics that converge to the set of EFCEs in n-player general-sum extensive-form games with perfect recall. First, we introduce a notion of trigger regret in extensive-form games, which extends that of internal regret in normal-form games. When each player has low trigger regret, the empirical frequency of play …
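The uncoupled-dynamics pattern behind these results can be shown with a simpler cousin of the internal/trigger-regret procedures described above: Hart and Mas-Colell style regret matching for *external* regret in rock-paper-scissors. Note the hedge: external-regret dynamics only guarantee convergence to a coarse correlated equilibrium, not the (EF)CE of these abstracts, but the structure (each player updates only its own regrets from its own payoffs) is the same.

```python
import random

random.seed(1)

ACTIONS = 3
def payoff(a, b):                      # row player's payoff in rock-paper-scissors
    return [[0, -1, 1], [1, 0, -1], [-1, 1, 0]][a][b]

def strategy(regrets):
    # Regret matching: play proportionally to positive cumulative regret.
    pos = [max(r, 0.0) for r in regrets]
    s = sum(pos)
    return [p / s for p in pos] if s > 0 else [1.0 / ACTIONS] * ACTIONS

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return ACTIONS - 1

reg1 = [0.0] * ACTIONS
reg2 = [0.0] * ACTIONS
freq1 = [0] * ACTIONS                  # empirical frequency of player 1's play
for t in range(20000):
    a1, a2 = sample(strategy(reg1)), sample(strategy(reg2))
    freq1[a1] += 1
    for a in range(ACTIONS):           # uncoupled: each player sees only its own payoffs
        reg1[a] += payoff(a, a2) - payoff(a1, a2)
        reg2[a] += payoff(a, a1) - payoff(a2, a1)   # zero-sum symmetry of RPS

print([round(f / 20000, 2) for f in freq1])  # approaches the uniform equilibrium play
```

Trigger regret replaces the fixed-action deviations here with sequence-form "trigger" deviations, which is what upgrades the guarantee from coarse correlated equilibrium to EFCE.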

Jin, Shi (Ed.) In this paper, we apply the idea of fictitious play to design deep neural networks (DNNs), and develop deep learning theory and algorithms for computing the Nash equilibrium of asymmetric N-player non-zero-sum stochastic differential games, which we refer to as deep fictitious play, a multi-stage learning process. Specifically, at each stage we propose the strategy of letting each individual player optimize her own payoff subject to the other players' previous actions, which is equivalent to solving N decoupled stochastic control optimization problems that are approximated by DNNs. Therefore, the fictitious play strategy leads to a structure consisting of N DNNs, which only communicate at the end of each stage. The resulting deep learning algorithm based on fictitious play is scalable, parallel, and model-free; i.e., using GPU parallelization, it can be applied to any N-player stochastic differential game with different symmetries and heterogeneities (e.g., the existence of major players). We illustrate the performance of the deep learning algorithm by comparing it to the closed-form solution of the linear quadratic game. Moreover, we prove the convergence of fictitious play under appropriate assumptions, and verify that the convergent limit forms an open-loop Nash equilibrium. We also discuss extensions to other strategies designed upon fictitious play and closed-loop …
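The staged best-response loop this abstract describes has a classical tabular analogue: fictitious play on a matrix game, where each "stage" is a best response to the opponent's empirical action frequency so far. A minimal sketch on matching pennies (our own toy choice; the paper's setting is continuous-time stochastic differential games with DNN-approximated controls):

```python
# Hedged sketch: classical (non-deep) fictitious play on matching
# pennies. At each stage, each player best-responds to the other
# player's empirical mix, mirroring the decoupled per-stage
# optimizations in deep fictitious play.

A = [[1, -1], [-1, 1]]              # row player's payoffs (zero-sum)

counts1, counts2 = [1, 0], [0, 1]   # arbitrary initial action tallies
for t in range(10000):
    total2 = sum(counts2)
    # player 1 best-responds to player 2's empirical frequencies
    br1 = max(range(2), key=lambda a: sum(A[a][b] * counts2[b] for b in range(2)) / total2)
    total1 = sum(counts1)
    # player 2 (payoff -A) best-responds to player 1's empirical frequencies
    br2 = max(range(2), key=lambda b: sum(-A[a][b] * counts1[a] for a in range(2)) / total1)
    counts1[br1] += 1
    counts2[br2] += 1

mix1 = [c / sum(counts1) for c in counts1]
print(mix1)   # empirical frequencies approach the (0.5, 0.5) equilibrium
```

In the deep variant, each best-response step is itself an N-dimensional stochastic control problem solved approximately by a DNN, and convergence yields an open-loop Nash equilibrium rather than the matrix-game mixed equilibrium shown here.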