Deep Fictitious Play for Finding Markovian Nash Equilibrium in Multi-Agent Games
We propose a deep neural network-based algorithm to identify the Markovian Nash equilibrium of general large N-player stochastic differential games. Following the idea of fictitious play, we recast the N-player game into N decoupled decision problems (one for each player) and solve them iteratively. Each individual decision problem is characterized by a semilinear Hamilton-Jacobi-Bellman equation, which we solve with the recently developed deep BSDE method. The resulting algorithm can solve large N-player games for which conventional numerical methods would suffer from the curse of dimensionality. Multiple numerical examples involving identical or heterogeneous agents, with risk-neutral or risk-sensitive objectives, are tested to validate the accuracy of the proposed algorithm in large group games. Even for a fifty-player game in the presence of common noise, the proposed algorithm still finds the approximate Nash equilibrium accurately, which, to the best of our knowledge, is difficult to achieve with other numerical algorithms.
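As a toy illustration of the decoupling step only (not the paper's deep BSDE solver), the sketch below runs the same fictitious-play loop on a static N-player quadratic game whose per-player best responses happen to be available in closed form. The game, the parameter values, and every name here are illustrative assumptions:

```python
def best_response(i, actions, b, c):
    """Player i minimizes (a_i - c * mean(a_{-i}) - b_i)^2, so her best
    response to the others' current actions has a closed form."""
    others = [a for j, a in enumerate(actions) if j != i]
    return c * sum(others) / len(others) + b[i]

def decoupled_iteration(b, c=0.5, stages=100):
    """Fictitious-play-style loop: at each stage, every player solves her own
    decoupled problem against the others' previous-stage actions. The N
    per-stage solves are independent and could run in parallel."""
    actions = list(b)  # stage 0: each player's standalone optimum
    for _ in range(stages):
        actions = [best_response(i, actions, b, c) for i in range(len(b))]
    return actions
```

For |c| < 1 the joint best-response map is a contraction, so the stage profile converges geometrically to the unique Nash equilibrium, mirroring the iterative decoupling described above.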
 Award ID(s):
 1953035
 Publication Date:
 NSF-PAR ID:
 10253642
 Journal Name:
 Proceedings of Machine Learning Research
 Volume:
 107
 Page Range or eLocationID:
 221-245
 ISSN:
 2640-3498
 Sponsoring Org:
 National Science Foundation
More Like this

Stochastic differential games have been used extensively to model agents' competitions in finance, for instance, in P2P lending platforms in the Fintech industry, the banking system for systemic risk, and insurance markets. The recently proposed machine learning algorithm, deep fictitious play, provides a novel and efficient tool for finding the Markovian Nash equilibrium of large N-player asymmetric stochastic differential games [J. Han and R. Hu, Mathematical and Scientific Machine Learning Conference, pages 221-245, PMLR, 2020]. By incorporating the idea of fictitious play, the algorithm decouples the game into N sub-optimization problems, and identifies each player's optimal strategy with the deep backward stochastic differential equation (BSDE) method in parallel and repeatedly. In this paper, we prove the convergence of deep fictitious play (DFP) to the true Nash equilibrium. We also show that the strategy based on DFP forms an ε-Nash equilibrium. We generalize the algorithm by proposing a new approach to decouple the games, and present numerical results of large population games showing the empirical convergence of the algorithm beyond the technical assumptions in the theorems.
Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL…

Unmanned Aerial Vehicle (UAV)-assisted Multi-access Edge Computing (MEC) systems have emerged recently as a flexible and dynamic computing environment, providing task offloading services to the users. In order for such a paradigm to be viable, the operator of a UAV-mounted MEC server should enjoy some form of profit by offering its computing capabilities to the end users. To deal with this issue, in this paper we apply a usage-based pricing policy for allowing the exploitation of the servers' computing resources. The proposed pricing mechanism implicitly introduces a more social behavior to the users with respect to competing for the UAV-mounted MEC servers' computation resources. In order to properly model the users' risk-aware behavior within the overall data offloading decision-making process, the principles of Prospect Theory are adopted, while the exploitation of the available computation resources is considered based on the theory of the Tragedy of the Commons. Initially, the user's prospect-theoretic utility function is formulated by quantifying the user's risk-seeking and loss-aversion behavior, while taking into account the pricing mechanism. Accordingly, the users' pricing- and risk-aware data offloading problem is formulated as a distributed maximization problem of each user's expected prospect-theoretic utility function and addressed as a non-cooperative…
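The risk-seeking and loss-aversion behavior that Prospect Theory quantifies is commonly captured by the Kahneman-Tversky value function: concave over gains, convex over losses, with losses weighted more heavily than equal gains. A minimal sketch, using the parameter estimates commonly cited from Tversky and Kahneman (1992) rather than anything fitted in the paper above:

```python
def prospect_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Kahneman-Tversky value of outcome x relative to a reference point:
    concave (risk-averse) over gains, convex (risk-seeking) over losses,
    with loss-aversion coefficient lam > 1."""
    if x >= 0:
        return x ** alpha
    return -lam * ((-x) ** beta)
```

Because lam > 1, a loss of a given size outweighs an equal gain (prospect_value(10) + prospect_value(-10) < 0), which is the asymmetry driving risk-aware offloading decisions.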

We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an [Formula: see text] upper bound on the duality gap and regret, where d is the linear dimension, H the horizon, and T the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving…
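In the one-shot zero-sum matrix-game special case, the duality gap minimized in the offline setting reduces to exploitability: the row player's best-response payoff against y minus the payoff that x guarantees against any column. A minimal sketch under that simplification (the matrix-game setting is an illustrative stand-in for the paper's linear Markov games, not its algorithm):

```python
def duality_gap(A, x, y):
    """Duality gap of the strategy pair (x, y) in the zero-sum matrix
    game A, where the row player maximizes. Zero iff (x, y) is a Nash
    equilibrium; otherwise it measures total unilateral improvement."""
    n, m = len(A), len(A[0])
    row_br = max(sum(A[i][j] * y[j] for j in range(m)) for i in range(n))
    col_br = min(sum(x[i] * A[i][j] for i in range(n)) for j in range(m))
    return row_br - col_br
```

In matching pennies, uniform play has gap 0 (it is the equilibrium), while both players committing to the same pure action is maximally exploitable.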

Jin, Shi (Ed.) In this paper, we apply the idea of fictitious play to design deep neural networks (DNNs), and develop deep learning theory and algorithms for computing the Nash equilibrium of asymmetric N-player non-zero-sum stochastic differential games, which we refer to as deep fictitious play, a multi-stage learning process. Specifically, at each stage, we propose the strategy of letting each individual player optimize her own payoff subject to the other players' previous actions, equivalent to solving N decoupled stochastic control optimization problems, which are approximated by DNNs. Therefore, the fictitious play strategy leads to a structure consisting of N DNNs, which only communicate at the end of each stage. The resulting deep learning algorithm based on fictitious play is scalable, parallel and model-free, i.e., using GPU parallelization, it can be applied to any N-player stochastic differential game with different symmetries and heterogeneities (e.g., existence of major players). We illustrate the performance of the deep learning algorithm by comparing it to the closed-form solution of the linear quadratic game. Moreover, we prove the convergence of fictitious play under appropriate assumptions, and verify that the convergent limit forms an open-loop Nash equilibrium. We also discuss the extensions to other strategies designed upon fictitious play and closed-loop…