skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The Complexity of Infinite-Horizon General-Sum Stochastic Games
We study the complexity of computing stationary Nash equilibrium (NE) in n-player infinite-horizon general-sum stochastic games. We focus on the problem of computing NE in such stochastic games when each player is restricted to choosing a stationary policy and rewards are discounted. First, we prove that computing such NE is in PPAD (in addition to clearly being PPAD-hard). Second, we consider turn-based specializations of such games where at each state there is at most a single player that can take actions and show that these (seemingly-simpler) games remain PPAD-hard. Third, we show that under further structural assumptions on the rewards computing NE in such turn-based games is possible in polynomial time. Towards achieving these results we establish structural facts about stochastic games of broader utility, including monotonicity of utilities under single-state single-action changes and reductions to settings where each player controls a single state.  more » « less
Award ID(s):
2239151
PAR ID:
10538681
Author(s) / Creator(s):
; ;
Editor(s):
Tauman_Kalai, Yael
Publisher / Repository:
Schloss Dagstuhl – Leibniz-Zentrum für Informatik
Date Published:
Volume:
251
ISSN:
1868-8969
ISBN:
978-3-95977-263-1
Page Range / eLocation ID:
251-251
Subject(s) / Keyword(s):
complexity stochastic games general-sum games Nash equilibrium Theory of computation → Complexity classes
Format(s):
Medium: X Size: 20 pages; 1274416 bytes Other: application/pdf
Size(s):
20 pages 1274416 bytes
Right(s):
Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    A two-player finite game is represented by two payoff matrices (A, B), one for each player. Imitation games are a subclass of two-player games in which B is the identity matrix, implying that the second player gets a positive payoff only if she "imitates" the first. Given that the problem of computing a Nash equilibrium (NE) is known to be provably hard, even to approximate, we ask if it is any easier for imitation games. We show that much like the general case, for any c > 0, computing a 1 over n^c -approximate NE of imitation games remains PPAD-hard, where n is the number of moves available to the players. On the other hand, we design a polynomial-time algorithm to find ε-approximate NE for any given constant ε > 0 (PTAS). The former result also rules out the smooth complexity being in P, unless PPAD ⊂ RP. 
    more » « less
  2. We study robust Markov games (RMG) with 𝑠-rectangular uncertainty. We show a general equivalence between computing a robust Nash equilibrium (RNE) of a 𝑠-rectangular RMG and computing a Nash equilibrium (NE) of an appropriately constructed regularized MG. The equivalence result yields a planning algorithm for solving 𝑠-rectangular RMGs, as well as provable robustness guarantees for policies computed using regularized methods. However, we show that even for just reward-uncertain two-player zero-sum matrix games, computing an RNE is PPAD-hard. Consequently, we derive a special uncertainty structure called efficient player-decomposability and show that RNE for two-player zero-sum RMG in this class can be provably solved in polynomial time. This class includes commonly used uncertainty sets such as 𝐿1 and 𝐿∞ ball uncertainty sets. 
    more » « less
  3. Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning. Their theoretical understanding in multiagent settings, however, remains limited, especially beyond two-player competitive and potential Markov games. In this paper, we develop a new framework to characterize optimistic policy gradient methods in multi-player Markov games with a single controller. Specifically, under the further assumption that the game exhibits an equilibrium collapse, in that the marginals of coarse correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to stationary ϵ-NE in O(1/ϵ2) iterations, where O(⋅) suppresses polynomial factors in the natural parameters of the game. Such an equilibrium collapse is well-known to manifest itself in two-player zero-sum Markov games, but also occurs even in a class of multi-player Markov games with separable interactions, as established by recent work. As a result, we bypass known complexity barriers for computing stationary NE when either of our assumptions fails. Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games. 
    more » « less
  4. Nash Equilibrium (NE) is the canonical solution concept of game theory, which provides an elegant tool to understand the rationalities. Though mixed strategy NE exists in any game with finite players and actions, computing NE in two- or multi-player general-sum games is PPAD-Complete. Various alternative solutions, e.g., Correlated Equilibrium (CE), and learning methods, e.g., fictitious play (FP), are proposed to approximate NE. For convenience, we call these methods as ``inexact solvers'', or ``solvers'' for short. However, the alternative solutions differ from NE and the learning methods generally fail to converge to NE. Therefore, in this work, we propose REinforcement Nash Equilibrium Solver (RENES), which trains a single policy to modify the games with different sizes and applies the solvers on the modified games where the obtained solution is evaluated on the original games. Specifically, our contributions are threefold. i) We represent the games as alpha-rank response graphs and leverage graph neural network (GNN) to handle the games with different sizes as inputs; ii) We use tensor decomposition, e.g., canonical polyadic (CP), to make the dimension of modifying actions fixed for games with different sizes; iii) We train the modifying strategy for games with the widely-used proximal policy optimization (PPO) and apply the solvers to solve the modified games, where the obtained solution is evaluated on original games. Extensive experiments on large-scale normal-form games show that our method can further improve the approximation of NE of different solvers, i.e., alpha-rank, CE, FP and PRD, and can be generalized to unseen games. 
    more » « less
  5. Multi-player games with lexicographic cost functions can capture a variety of driving and racing scenarios and are known to have pure-strategy Nash Equilibria (NE) under certain conditions. The standard Iterated Best Response (IBR) procedure for finding such equilibria can be slow because computing the best response for each agent generally involves solving a non-convex optimization problem. In this paper, we introduce a type of game which uses a lexicographic cost function. We show that for this class of games, the best responses can be effectively computed through piece-wise linear approximations. This enables us to approximate the NE using a linearized version of IBR. We show the gap between the linear approximations returned by our linearized IBR and the true best response drops asymptotically. We implement the algorithm and show that it can find approximate NE for a handful of agents driving in realistic scenarios in under 10 seconds. 
    more » « less