skip to main content

Title: Mean Field Contest with Singularity
We formulate a mean field game where each player stops a privately observed Brownian motion with absorption. Players are ranked according to their level of stopping and rewarded as a function of their relative rank. There is a unique mean field equilibrium, and it is shown to be the limit of associated n-player games. Conversely, the mean field strategy induces n-player ε-Nash equilibria for any continuous reward function—but not for discontinuous ones. In a second part, we study the problem of a principal who can choose how to distribute a reward budget over the ranks and aims to maximize the performance of the median player. The optimal reward design (contract) is found in closed form, complementing the merely partial results available in the n-player case. We then analyze the quality of the mean field design when used as a proxy for the optimizer in the n-player game. Surprisingly, the quality deteriorates dramatically as n grows. We explain this with an asymptotic singularity in the induced n-player equilibrium distributions. Funding: M. Nutz is supported by an Alfred P. Sloan Fellowship and the Division of Mathematical Sciences of the National Science Foundation [Grants DMS-1812661 and DMS-2106056]. Y. Zhang is supported in part by the Natural Sciences and Engineering Research Council of Canada [NSERC Discovery Grant RGPIN-2020-06290].  more » « less
Award ID(s):
2106056 1812661
Author(s) / Creator(s):
Date Published:
Journal Name:
Mathematics of Operations Research
Page Range / eLocation ID:
1095 to 1118
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Forcing finite state mean field games by a relevant form of common noise is a subtle issue, which has been addressed only recently. Among others, one possible way is to subject the simplex valued dynamics of an equilibrium by a so-called Wright–Fisher noise, very much in the spirit of stochastic models in population genetics. A key feature is that such a random forcing preserves the structure of the simplex, which is nothing but, in this setting, the probability space over the state space of the game. The purpose of this article is, hence, to elucidate the finite-player version and, accordingly, prove that N-player equilibria indeed converge toward the solution of such a kind of Wright–Fisher mean field game. Whereas part of the analysis is made easier by the fact that the corresponding master equation has already been proved to be uniquely solvable under the presence of the common noise, it becomes however more subtle than in the standard setting because the mean field interaction between the players now occurs through a weighted empirical measure. In other words, each player carries its own weight, which, hence, may differ from [Formula: see text] and which, most of all, evolves with the common noise. 
    more » « less
  2. Markov games model interactions among multiple players in a stochastic, dynamic environment. Each player in a Markov game maximizes its expected total discounted reward, which depends upon the policies of the other players. We formulate a class of Markov games, termed affine Markov games, where an affine reward function couples the players’ actions. We introduce a novel solution concept, the soft-Bellman equilibrium, where each player is boundedly rational and chooses a soft-Bellman policy rather than a purely rational policy as in the well-known Nash equilibrium concept. We provide conditions for the existence and uniqueness of the soft-Bellman equilibrium and propose a nonlinear least-squares algorithm to compute such an equilibrium in the forward problem. We then solve the inverse game problem of inferring the players’ reward parameters from observed state-action trajectories via a projected-gradient algorithm. Experiments in a predator-prey OpenAI Gym environment show that the reward parameters inferred by the proposed algorithm outper- form those inferred by a baseline algorithm: they reduce the Kullback-Leibler divergence between the equilibrium policies and observed policies by at least two orders of magnitude. 
    more » « less
  3. We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an [Formula: see text] upper bound on the duality gap and regret, where d is the linear dimension, H the horizon and T the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To our best knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right. 
    more » « less
  4. We introduce and investigate certain N player dynamic games on the line and in the plane that admit Coulomb gas dynamics as a Nash equilibrium. Most significantly, we find that the universal local limit of the equilibrium is sensitive to the chosen model of player information in one dimension but not in two dimensions. We also find that players can achieve game theoretic symmetry through selfish behavior despite non-exchangeability of states, which allows us to establish strong localized convergence of the N-Nash systems to the expected mean field equations against locally optimal player ensembles, i.e., those exhibiting the same local limit as the Nash-optimal ensemble. In one dimension, this convergence notably features a nonlocal-to-local transition in the population dependence of the N-Nash system. 
    more » « less
  5. Yllka Velaj and Ulrich Berger (Ed.)

    This paper considers a two-player game where each player chooses a resource from a finite collection of options. Each resource brings a random reward. Both players have statistical information regarding the rewards of each resource. Additionally, there exists an information asymmetry where each player has knowledge of the reward realizations of different subsets of the resources. If both players choose the same resource, the reward is divided equally between them, whereas if they choose different resources, each player gains the full reward of the resource. We first implement the iterative best response algorithm to find an ϵ-approximate Nash equilibrium for this game. This method of finding a Nash equilibrium may not be desirable when players do not trust each other and place no assumptions on the incentives of the opponent. To handle this case, we solve the problem of maximizing the worst-case expected utility of the first player. The solution leads to counter-intuitive insights in certain special cases. To solve the general version of the problem, we develop an efficient algorithmic solution that combines online convex optimization and the drift-plus penalty technique.

    more » « less