NSF-PAR ID: 10427731
Date Published:
Journal Name: Mathematics of Operations Research
Volume: 48
Issue: 2
ISSN: 0364-765X
Page Range / eLocation ID: 1095 to 1118
Sponsoring Org: National Science Foundation
More Like this

Finite State Mean Field Games with Wright–Fisher Common Noise as Limits of $N$-Player Weighted Games

Forcing finite state mean field games by a relevant form of common noise is a subtle issue, which has been addressed only recently. Among others, one possible way is to subject the simplex-valued dynamics of an equilibrium to a so-called Wright–Fisher noise, very much in the spirit of stochastic models in population genetics. A key feature is that such a random forcing preserves the structure of the simplex, which is nothing but, in this setting, the probability space over the state space of the game. The purpose of this article is, hence, to elucidate the finite-player version and, accordingly, prove that $N$-player equilibria indeed converge toward the solution of such a Wright–Fisher mean field game. Whereas part of the analysis is made easier by the fact that the corresponding master equation has already been proved to be uniquely solvable under the presence of the common noise, it becomes, however, more subtle than in the standard setting because the mean field interaction between the players now occurs through a weighted empirical measure. In other words, each player carries its own weight, which, hence, may differ from [Formula: see text] and which, most of all, evolves with the common noise.
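The simplex-preserving property the abstract highlights can be made concrete. In one common formulation of Wright–Fisher common noise (a sketch in our own notation, not taken from the paper: $\varepsilon$ is a noise intensity, $(W^{i,j})$ an antisymmetric family of Brownian motions, and $b$ a drift assumed to keep the simplex invariant on its own), the state distribution $p_t = (p_t^1, \dots, p_t^d)$ evolves as:

```latex
% Simplex-valued dynamics of the state distribution p_t:
dp_t^i \;=\; b^i(t, p_t)\,dt
  \;+\; \sqrt{\varepsilon}\,\sum_{j \neq i} \sqrt{p_t^i\,p_t^j}\;dW_t^{i,j},
\qquad W_t^{i,j} \;=\; -\,W_t^{j,i}.
```

The diffusion coefficient vanishes on each face $\{p^i = 0\}$, and the noise increments sum to zero over $i$ by antisymmetry of $(W^{i,j})$, so the dynamics never leave the probability simplex.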

Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends on the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players' actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players' reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outperform those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and the observed policies by at least two orders of magnitude.
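To make the soft-Bellman idea concrete in the simplest possible setting, here is a sketch of soft value iteration for a single agent (a hypothetical one-player analogue of the fixed point described above; the function name, the toy MDP encoding, and the unit temperature are our assumptions, not the paper's). The policy is a softmax over soft Q-values instead of an argmax:

```python
import math

def soft_bellman_policy(rewards, transitions, gamma=0.9, iters=200):
    """Soft value iteration for a single agent (illustrative sketch).

    rewards[s][a]     : immediate reward for action a in state s
    transitions[s][a] : dict mapping next state -> probability
    Returns policy[s][a], a softmax over the soft Q-values.
    """
    n_states = len(rewards)
    V = [0.0] * n_states
    for _ in range(iters):
        # Soft Q-values: reward plus discounted expected soft value.
        Q = [[rewards[s][a]
              + gamma * sum(p * V[s2] for s2, p in transitions[s][a].items())
              for a in range(len(rewards[s]))]
             for s in range(n_states)]
        # Log-sum-exp replaces the hard max of standard value iteration.
        V = [math.log(sum(math.exp(q) for q in Qs)) for Qs in Q]
    policy = []
    for Qs in Q:
        z = sum(math.exp(q) for q in Qs)
        policy.append([math.exp(q) / z for q in Qs])
    return policy
```

Since log-sum-exp is 1-Lipschitz in the sup norm, the update remains a gamma-contraction and the iteration converges, just as hard value iteration does.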

We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problem are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an [Formula: see text] upper bound on the duality gap and regret, where d is the linear dimension, H is the horizon, and T is the total number of time steps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To the best of our knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right.
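The fallback from Nash to CCE is what makes the per-step matrix game tractable. One classical way to reach the CCE set of a general-sum matrix game (a generic illustration, not the paper's construction) is regret matching, whose empirical joint distribution of play converges to the set of coarse correlated equilibria:

```python
import random

def _pick(regrets, rng):
    """Sample an action with probability proportional to positive regret;
    fall back to a uniform draw when no regret is positive."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    if total <= 0.0:
        return rng.randrange(len(regrets))
    x, acc = rng.random() * total, 0.0
    for i, p in enumerate(pos):
        acc += p
        if x <= acc:
            return i
    return len(regrets) - 1

def regret_matching_cce(payoff_a, payoff_b, rounds=4000, seed=0):
    """Regret matching (Hart & Mas-Colell) on a two-player general-sum
    matrix game.  payoff_a[i][j] / payoff_b[i][j] are the row and column
    players' payoffs.  Returns the empirical joint distribution of play,
    which approaches the set of coarse correlated equilibria."""
    rng = random.Random(seed)
    n_a, n_b = len(payoff_a), len(payoff_a[0])
    reg_a, reg_b = [0.0] * n_a, [0.0] * n_b
    counts = {}
    for _ in range(rounds):
        a, b = _pick(reg_a, rng), _pick(reg_b, rng)
        counts[(a, b)] = counts.get((a, b), 0) + 1
        # Accumulate external regret against each fixed alternative action.
        for i in range(n_a):
            reg_a[i] += payoff_a[i][b] - payoff_a[a][b]
        for j in range(n_b):
            reg_b[j] += payoff_b[a][j] - payoff_b[a][b]
    total = sum(counts.values())
    return {ab: c / total for ab, c in counts.items()}
```

Each round is linear in the number of actions, which is what makes a CCE oracle cheap compared with computing an exact Nash equilibrium of a general-sum game.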

We introduce and investigate certain N-player dynamic games on the line and in the plane that admit Coulomb gas dynamics as a Nash equilibrium. Most significantly, we find that the universal local limit of the equilibrium is sensitive to the chosen model of player information in one dimension but not in two dimensions. We also find that players can achieve game-theoretic symmetry through selfish behavior despite non-exchangeability of states, which allows us to establish strong localized convergence of the N-Nash systems to the expected mean field equations against locally optimal player ensembles, i.e., those exhibiting the same local limit as the Nash-optimal ensemble. In one dimension, this convergence notably features a nonlocal-to-local transition in the population dependence of the N-Nash system.

Yllka Velaj and Ulrich Berger (Ed.)
This paper considers a two-player game where each player chooses a resource from a finite collection of options. Each resource brings a random reward. Both players have statistical information regarding the rewards of each resource. Additionally, there exists an information asymmetry where each player has knowledge of the reward realizations of different subsets of the resources. If both players choose the same resource, the reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource. We first implement the iterative best response algorithm to find an ϵ-approximate Nash equilibrium for this game. This method of finding a Nash equilibrium may not be desirable when players do not trust each other and make no assumptions about the incentives of the opponent. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counterintuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus-penalty technique.
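As a toy illustration of the iterative best response step (our own sketch: it uses only expected rewards, assumes both players share the same means, and ignores the information asymmetry that is central to the paper), the reward-splitting rule and the alternating best-response loop look like this:

```python
def best_response(mean_rewards, opp_choice):
    """Best response in the two-player resource game: a shared resource's
    expected reward is halved, a sole resource's is kept whole."""
    utilities = [m / 2 if k == opp_choice else m
                 for k, m in enumerate(mean_rewards)]
    return max(range(len(mean_rewards)), key=utilities.__getitem__)

def iterative_best_response(mean_rewards, max_iters=100):
    """Alternate best responses until neither player deviates, i.e. a
    pure Nash equilibrium of the expected game (if the loop settles)."""
    a, b = 0, 0
    for _ in range(max_iters):
        new_a = best_response(mean_rewards, b)
        new_b = best_response(mean_rewards, new_a)
        if (new_a, new_b) == (a, b):
            break
        a, b = new_a, new_b
    return a, b
```

For example, with expected rewards [10, 6, 1], the second player keeps the 10-reward resource to itself while the first settles on the 6-reward resource, since 6 alone beats 10 shared (10/2 = 5).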