 Award ID(s):
 1901403
 NSFPAR ID:
 10288482
 Date Published:
 Journal Name:
 Proceedings of the 38th International Conference on Machine Learning
 Page Range / eLocation ID:
 31643173
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

null (Ed.)We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfectinformation zerosum extensiveform game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensiveform correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of wellchosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient columngeneration algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.more » « less

null (Ed.)We focus on the problem of finding an optimal strategy for a team of two players that faces an opponent in an imperfectinformation zerosum extensiveform game. Team members are not allowed to communicate during play but can coordinate before the game. In that setting, it is known that the best the team can do is sample a profile of potentially randomized strategies (one per player) from a joint (a.k.a. correlated) probability distribution at the beginning of the game. In this paper, we first provide new modeling results about computing such an optimal distribution by drawing a connection to a different literature on extensiveform correlation. Second, we provide an algorithm that computes such an optimal distribution by only using profiles where only one of the team members gets to randomize in each profile. We can also cap the number of such profiles we allow in the solution. This begets an anytime algorithm by increasing the cap. We find that often a handful of wellchosen such profiles suffices to reach optimal utility for the team. This enables team members to reach coordination through a relatively simple and understandable plan. Finally, inspired by this observation and leveraging theoretical concepts that we introduce, we develop an efficient columngeneration algorithm for finding an optimal distribution for the team. We evaluate it on a suite of common benchmark games. It is three orders of magnitude faster than the prior state of the art on games that the latter can solve and it can also solve several games that were previously unsolvable.more » « less

null (Ed.)Often—for example in war games, strategy video games, and financial simulations—the game is given to us only as a blackbox simulator in which we can play it. In these settings, since the game may have unknown nature action distributions (from which we can only obtain samples) and/or be too large to expand fully, it can be difficult to compute strategies with guarantees on exploitability. Recent work (Zhang and Sandholm 2020) resulted in a notion of certificate for extensiveform games that allows exploitability guarantees while not expanding the full game tree. However, that work assumed that the black box could sample or expand arbitrary nodes of the game tree at any time, and that a series of exact game solves (via, for example, linear programming) can be conducted to compute the certificate. Each of those two assumptions severely restricts the practical applicability of that method. In this work, we relax both of the assumptions. We show that highprobability certificates can be obtained with a black box that can do nothing more than play through games, using only a regret minimizer as a subroutine. As a bonus, we obtain an equilibriumfinding algorithm with ~O (1= p T) convergence rate in the extensiveform game setting that does not rely on a sampling strategy with lowerbounded reach probabilities (which MCCFR assumes). We demonstrate experimentally that, in the blackbox setting, our methods are able to provide nontrivial exploitability guarantees while expanding only a small fraction of the game tree.more » « less

Guruswami, Venkatesan (Ed.)A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normalform games is PPADhard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called σsmooth Nash equilibrium, for a {smoothness parameter} σ. In a σsmooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a σsmooth strategy, which is a distribution that does not put too much mass (as parametrized by σ) on any fixed action. We distinguish two variants of σsmooth Nash equilibria: strong σsmooth Nash equilibria, in which players are required to play σsmooth strategies under equilibrium play, and weak σsmooth Nash equilibria, where there is no such requirement. We show that both weak and strong σsmooth Nash equilibria have superior computational properties to Nash equilibria: when σ as well as an approximation parameter ϵ and the number of players are all constants, there is a {constanttime} randomized algorithm to find a weak ϵapproximate σsmooth Nash equilibrium in normalform games. In the same parameter regime, there is a polynomialtime deterministic algorithm to find a strong ϵapproximate σsmooth Nash equilibrium in a normalform game. These results stand in contrast to the optimal algorithm for computing ϵapproximate Nash equilibria, which cannot run in faster than quasipolynomialtime, subject to complexitytheoretic assumptions. We complement our upper bounds by showing that when either σ or ϵ is an inverse polynomial, finding a weak ϵapproximate σsmooth Nash equilibria becomes computationally intractable. Our results are the first to propose a variant of Nash equilibrium which is computationally tractable, allows players to act independently, and which, as we discuss, is justified by an extensive line of work on individual choice behavior in the economics literature.more » « less

Although security games have attracted intensive research attention over the past years, few existing works consider how information from local communities would affect the game. In this paper, we introduce a new player  a strategic informant, who can observe and report upcoming attacks  to the defenderattacker security game setting. Characterized by a private type, the informant has his utility structure that leads to his strategic behaviors. We model the game as a 3player extensiveform game and propose a novel solution concept of Strong Stackelbergperfect Bayesian equilibrium. To compute the optimal defender strategy, we first show that although the informant can have infinitely many types in general, the optimal defense plan can only include a finite (exponential) number of different patrol strategies. We then prove that there exists a defense plan with only a linear number of patrol strategies that achieve the optimal defender's utility, which significantly reduces the computational burden and allows us to solve the game in polynomial time using linear programming. Finally, we conduct extensive experiments to show the effect of the strategic informant and demonstrate the effectiveness of our algorithm.