skip to main content


Title: Faster Algorithms for Optimal Ex-Ante Coordinated Collusive Strategies in Extensive-Form Zero-Sum Games
We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.  more » « less
Award ID(s):
1901403
NSF-PAR ID:
10288494
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ArXivorg
Volume:
arXiv:2009.10061v1
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable. 
    more » « less
  2. null (Ed.)
    We focus on the problem of finding an optimal strategy for a team of players that faces an opponent in an imperfect-information zero-sum extensive-form game. Team members are not allowed to communicate during play but can coordinate before the game. In this setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensive-form correlation. Second, we provide an algorithm that allows one for capping the number of profiles employed in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of well-chosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient column-generation algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable. 
    more » « less
  3. null (Ed.)
    Often—for example in war games, strategy video games, and financial simulations—the game is given to us only as a black-box simulator in which we can play it. In these settings, since the game may have unknown nature action distributions (from which we can only obtain samples) and/or be too large to expand fully, it can be difficult to compute strategies with guarantees on exploitability. Recent work (Zhang and Sandholm 2020) resulted in a notion of certificate for extensive-form games that allows exploitability guarantees while not expanding the full game tree. However, that work assumed that the black box could sample or expand arbitrary nodes of the game tree at any time, and that a series of exact game solves (via, for example, linear programming) can be conducted to compute the certificate. Each of those two assumptions severely restricts the practical applicability of that method. In this work, we relax both of the assumptions. We show that high-probability certificates can be obtained with a black box that can do nothing more than play through games, using only a regret minimizer as a subroutine. As a bonus, we obtain an equilibrium-finding algorithm with ~O (1= p T) convergence rate in the extensive-form game setting that does not rely on a sampling strategy with lower-bounded reach probabilities (which MCCFR assumes). We demonstrate experimentally that, in the black-box setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree. 
    more » « less
  4. Guruswami, Venkatesan (Ed.)
    A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called σ-smooth Nash equilibrium, for a {smoothness parameter} σ. In a σ-smooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a σ-smooth strategy, which is a distribution that does not put too much mass (as parametrized by σ) on any fixed action. We distinguish two variants of σ-smooth Nash equilibria: strong σ-smooth Nash equilibria, in which players are required to play σ-smooth strategies under equilibrium play, and weak σ-smooth Nash equilibria, where there is no such requirement. We show that both weak and strong σ-smooth Nash equilibria have superior computational properties to Nash equilibria: when σ as well as an approximation parameter ϵ and the number of players are all constants, there is a {constant-time} randomized algorithm to find a weak ϵ-approximate σ-smooth Nash equilibrium in normal-form games. In the same parameter regime, there is a polynomial-time deterministic algorithm to find a strong ϵ-approximate σ-smooth Nash equilibrium in a normal-form game. These results stand in contrast to the optimal algorithm for computing ϵ-approximate Nash equilibria, which cannot run in faster than quasipolynomial-time, subject to complexity-theoretic assumptions. We complement our upper bounds by showing that when either σ or ϵ is an inverse polynomial, finding a weak ϵ-approximate σ-smooth Nash equilibria becomes computationally intractable. Our results are the first to propose a variant of Nash equilibrium which is computationally tractable, allows players to act independently, and which, as we discuss, is justified by an extensive line of work on individual choice behavior in the economics literature. 
    more » « less
  5. Although security games have attracted intensive research attention over the past years, few existing works consider how information from local communities would affect the game. In this paper, we introduce a new player -- a strategic informant, who can observe and report upcoming attacks -- to the defender-attacker security game setting. Characterized by a private type, the informant has his utility structure that leads to his strategic behaviors. We model the game as a 3-player extensive-form game and propose a novel solution concept of Strong Stackelberg-perfect Bayesian equilibrium. To compute the optimal defender strategy, we first show that although the informant can have infinitely many types in general, the optimal defense plan can only include a finite (exponential) number of different patrol strategies. We then prove that there exists a defense plan with only a linear number of patrol strategies that achieve the optimal defender's utility, which significantly reduces the computational burden and allows us to solve the game in polynomial time using linear programming. Finally, we conduct extensive experiments to show the effect of the strategic informant and demonstrate the effectiveness of our algorithm.

     
    more » « less