Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players’ actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players’ reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outper- form those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude.
more »
« less
Mean Field Contest with Singularity
We formulate a mean field game where each player stops a privately observed Brownian motion with absorption. Players are ranked according to their level of stopping and rewarded as a function of their relative rank. There is a unique mean field equilibrium, and it is shown to be the limit of associated n-player games. Conversely, the mean field strategy induces n-player ε-Nash equilibria for any continuous reward function—but not for discontinuous ones. In a second part, we study the problem of a principal who can choose how to distribute a reward budget over the ranks and aims to maximize the performance of the median player. The optimal reward design (contract) is found in closed form, complementing the merely partial results available in the n-player case. We then analyze the quality of the mean field design when used as a proxy for the optimizer in the n-player game. Surprisingly, the quality deteriorates dramatically as n grows. We explain this with an asymptotic singularity in the induced n-player equilibrium distributions. Funding: M. Nutz is supported by an Alfred P. Sloan Fellowship and the Division of Mathematical Sciences of the National Science Foundation [Grants DMS-1812661 and DMS-2106056]. Y. Zhang is supported in part by the Natural Sciences and Engineering Research Council of Canada [NSERC Discovery Grant RGPIN-2020-06290].
more »
« less
- PAR ID:
- 10427731
- Date Published:
- Journal Name:
- Mathematics of Operations Research
- Volume:
- 48
- Issue:
- 2
- ISSN:
- 0364-765X
- Page Range / eLocation ID:
- 1095 to 1118
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We introduce and investigate certain N player dynamic games on the line and in the plane that admit Coulomb gas dynamics as a Nash equilibrium. Most significantly, we find that the universal local limit of the equilibrium is sensitive to the chosen model of player information in one dimension but not in two dimensions. We also find that players can achieve game theoretic symmetry through selfish behavior despite non-exchangeability of states, which allows us to establish strong localized convergence of the N-Nash systems to the expected mean field equations against locally optimal player ensembles, i.e., those exhibiting the same local limit as the Nash-optimal ensemble. In one dimension, this convergence notably features a nonlocal-to-local transition in the population dependence of the N-Nash system.more » « less
-
Forcing finite state mean field games by a relevant form of common noise is a subtle issue, which has been addressed only recently. Among others, one possible way is to subject the simplex valued dynamics of an equilibrium by a so-called Wright–Fisher noise, very much in the spirit of stochastic models in population genetics. A key feature is that such a random forcing preserves the structure of the simplex, which is nothing but, in this setting, the probability space over the state space of the game. The purpose of this article is, hence, to elucidate the finite-player version and, accordingly, prove that N-player equilibria indeed converge toward the solution of such a kind of Wright–Fisher mean field game. Whereas part of the analysis is made easier by the fact that the corresponding master equation has already been proved to be uniquely solvable under the presence of the common noise, it becomes however more subtle than in the standard setting because the mean field interaction between the players now occurs through a weighted empirical measure. In other words, each player carries its own weight, which, hence, may differ from [Formula: see text] and which, most of all, evolves with the common noise.more » « less
-
arXiv:2402.05300v2 (Ed.)This paper considers a multi-player resource-sharing game with a fair reward allocation model. Multiple players choose from a collection of resources. Each resource brings a random reward equally divided among the players who choose it. We consider two settings. The first setting is a one-slot game where the mean rewards of the resources are known to all the players, and the objective of player 1 is to maximize their worst-case expected utility. Certain special cases of this setting have explicit solutions. These cases provide interesting yet non-intuitive insights into the problem. The second setting is an online setting, where the game is played over a finite time horizon, where the mean rewards are unknown to the first player. Instead, the first player receives, as feedback, the rewards of the resources they chose after the action. We develop a novel Upper Confidence Bound (UCB) algorithm that minimizes the worst-case regret of the first player using the feedback received.more » « less
-
We investigate a mean field game model for the production of exhaustible resources. In this model, firms produce comparable goods, strategically set their production rate in order to maximise profit, and leave the market as soon as they deplete their capacities. We examine the related Mean Field Game system and prove well-posedness for initial measure data by deriving suitable a priori estimates. Then, we show that feedback strategies which are computed from the Mean Field Game system provide ε -Nash equilibria to the corresponding N -Player Cournot game, for large values of N . This is done by showing tightness of the empirical process in the Skorokhod M 1 topology, which is defined for distribution-valued processes.more » « less