skip to main content

Title: Dynamic Games Among Teams with Delayed Intra-Team Information Sharing
We analyze a class of stochastic dynamic games among teams with asymmetric information, where members of a team share their observations internally with a delay of d. Each team is associated with a controlled Markov Chain, whose dynamics are coupled through the players’ actions. These games exhibit challenges in both theory and practice due to the presence of signaling and the increasing domain of information over time. We develop a general approach to characterize a subset of Nash equilibria where the agents can use a compressed version of their information, instead of the full information, to choose their actions. We identify two subclasses of strategies: sufficient private information-Based (SPIB) strategies, which only compress private information, and compressed information-based (CIB) strategies, which compress both common and private information. We show that SPIB-strategy-based equilibria exist and the set of payoff profiles of such equilibria is the same as that of all Nash equilibria. On the other hand, we show that CIB-strategy-based equilibria may not exist. We develop a backward inductive sequential procedure, whose solution (if it exists) provides a CIB strategy-based equilibrium. We identify some instances where we can guarantee the existence of a solution to the above procedure. Our results highlight the tension among compression of information, ability of compression-based strategies to sustain all or some of the equilibrium payoff profiles, and backward inductive sequential computation of equilibria in stochastic dynamic games with asymmetric information.  more » « less
Award ID(s):
2038416 2008130 1608361
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Dynamic Games and Applications
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Jin, Shi (Ed.)
    In this paper, we apply the idea of fictitious play to design deep neural networks (DNNs), and develop deep learning theory and algorithms for computing the Nash equilibrium of asymmetric N-player non-zero-sum stochastic differential games, for which we refer as deep fictitious play, a multi-stage learning process. Specifically at each stage, we propose the strategy of letting individual player optimize her own payoff subject to the other players’ previous actions, equivalent to solving N decoupled stochastic control optimization problems, which are approximated by DNNs. Therefore, the fictitious play strategy leads to a structure consisting of N DNNs, which only communicate at the end of each stage. The resulting deep learning algorithm based on fictitious play is scalable, parallel and model-free, i.e., using GPU parallelization, it can be applied to any N-player stochastic differential game with different symmetries and heterogeneities (e.g., existence of major players). We illustrate the performance of the deep learning algorithm by comparing to the closed-form solution of the linear quadratic game. Moreover, we prove the convergence of fictitious play under appropriate assumptions, and verify that the convergent limit forms an open-loop Nash equilibrium. We also discuss the extensions to other strategies designed upon fictitious play and closed-loop Nash equilibrium in the end. 
    more » « less
  2. We extend Kreps and Wilson's concept of sequential equilibrium to games with infinite sets of signals and actions. A strategy profile is a conditional ε ‐equilibrium if, for any of a player's positive probability signal events, his conditional expected utility is within ε of the best that he can achieve by deviating. With topologies on action sets, a conditional ε ‐equilibrium is full if strategies give every open set of actions positive probability. Such full conditional ε ‐equilibria need not be subgame perfect, so we consider a non‐topological approach. Perfect conditional ε ‐equilibria are defined by testing conditional ε ‐rationality along nets of small perturbations of the players' strategies and of nature's probability function that, for any action and for almost any state, make this action and state eventually (in the net) always have positive probability. Every perfect conditional ε ‐equilibrium is a subgame perfect ε ‐equilibrium, and, in finite games, limits of perfect conditional ε ‐equilibria as ε  → 0 are sequential equilibrium strategy profiles. But limit strategies need not exist in infinite games so we consider instead the limit distributions over outcomes. We call such outcome distributions perfect conditional equilibrium distributions and establish their existence for a large class of regular projective games. Nature's perturbations can produce equilibria that seem unintuitive and so we augment the game with a net of permissible perturbations. 
    more » « less
  3. Stochastic differential games have been used extensively to model agents' competitions in finance, for instance, in P2P lending platforms from the Fintech industry, the banking system for systemic risk, and insurance markets. The recently proposed machine learning algorithm, deep fictitious play, provides a novel and efficient tool for finding Markovian Nash equilibrium of large \begin{document}$ N $\end{document}-player asymmetric stochastic differential games [J. Han and R. Hu, Mathematical and Scientific Machine Learning Conference, pages 221-245, PMLR, 2020]. By incorporating the idea of fictitious play, the algorithm decouples the game into \begin{document}$ N $\end{document} sub-optimization problems, and identifies each player's optimal strategy with the deep backward stochastic differential equation (BSDE) method parallelly and repeatedly. In this paper, we prove the convergence of deep fictitious play (DFP) to the true Nash equilibrium. We can also show that the strategy based on DFP forms an \begin{document}$ \epsilon $\end{document}-Nash equilibrium. We generalize the algorithm by proposing a new approach to decouple the games, and present numerical results of large population games showing the empirical convergence of the algorithm beyond the technical assumptions in the theorems.

    more » « less
  4. Network games are commonly used to capture the strategic interactions among interconnected agents in simultaneous moves. The agents’ actions in a Nash equilibrium must take into account the mutual dependencies connecting them, which is typically obtained by solving a set of fixed point equations. Stackelberg games, on the other hand, model the sequential moves between agents that are categorized as leaders and followers. The corresponding solution concept, the subgame perfect equilibrium, is typically obtained using backward induction. Both game forms enjoy very wide use in the (cyber)security literature, the network game often as a template to study security investment and externality – also referred to as the Interdependent Security (IDS) games – and the Stackelberg game as a formalism to model a variety of attacker-defender scenarios. In this study we examine a model that combines both types of strategic reasoning: the interdependency as well as sequential moves. Specifically, we consider a scenario with a network of interconnected first movers (firms or defenders, whose security efforts and practices collectively determine the security posture of the eco-system) and one or more second movers, the attacker(s), who determine how much effort to exert on attacking the many potential targets. This gives rise to an equilibrium concept that embodies both types of equilibria mentioned above. We will examine how its existence and uniqueness conditions differ from that for a standard network game. Of particular interest are comparisons between the two game forms in terms of effort exerted by the defender(s) and the attacker(s), respectively, and the free-riding behavior among the defenders. 
    more » « less
  5. Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algo- rithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guar- anteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at 
    more » « less