Regret minimization has proved to be a versatile tool for tree- form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, mod- ern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player’s decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs—even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restric- tions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used—for example, those in which only black-box access to the environment is available. We give the first, to our knowl- edge, regret-minimization algorithm that guarantees sublinear regret with high probability even when requirement (i)—and thus also (ii)—is dropped. We formalize an online learning setting in which the strategy space is not known to the agent and gets revealed incrementally whenever the agent encoun- ters new decision points. We give an efficient algorithm that achieves O(T 3/4)more »
Finding and Certifying (Near-)Optimal Strategies in Black-Box Extensive-Form Games
Often—for example in war games, strategy video games, and financial simulations—the game is given to us only as a black-box simulator in which we can play it. In these settings, since the game may have unknown nature action distributions (from which we can only obtain samples) and/or be too large to expand fully, it can be difficult to compute strategies with guarantees on exploitability. Recent work (Zhang and Sandholm 2020) resulted in a notion of certificate for extensive-form games that allows exploitability guarantees while not expanding the full game tree. However, that work assumed that the black box could sample or expand arbitrary nodes of the game tree at any time, and that a series of exact game solves (via, for example, linear programming) can be conducted to compute the certificate. Each of those two assumptions severely restricts the practical applicability of that method. In this work, we relax both of the assumptions. We show that high-probability certificates can be obtained with a black box that can do nothing more than play through games, using only a regret minimizer as a subroutine. As a bonus, we obtain an equilibrium-finding algorithm with ~O (1= p T) convergence rate in the extensive-form game setting that does not rely on a sampling strategy with lower-bounded reach probabilities (which MCCFR assumes). We demonstrate experimentally that, in the black-box setting, our more »
- Award ID(s):
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Proceedings of the AAAI Conference on Artificial Intelligence
- Sponsoring Org:
- National Science Foundation
More Like this
Correlated Equilibrium is a solution concept that is more general than Nash Equilibrium (NE) and can lead to outcomes with better social welfare. However, its natural extension to the sequential setting, the Extensive Form Correlated Equilibrium (EFCE), requires a quadratic amount of space to solve, even in restricted settings without randomness in nature. To alleviate these concerns, we apply subgame resolving, a technique extremely successful in finding NE in zero-sum games to solving general-sum EFCEs. Subgame resolving refines a correlation plan in an online manner: instead of solving for the full game upfront, it only solves for strategies in subgames that are reached in actual play, resulting in significant computational gains. In this paper, we (i) lay out the foundations to quantify the quality of a refined strategy, in terms of the social welfare and exploitability of correlation plans, (ii) show that EFCEs possess a sufficient amount of independence between subgames to perform resolving efficiently, and (iii) provide two algorithms for resolving, one using linear programming and the other based on regret minimization. Both methods guarantee safety, i.e., they will never be counterproductive. Our methods are the first time an online method has been applied to the correlated, general-sum setting.
Hansen, Kristoffer Arnsfelt ; Liu, Tracy Xiao ; Malekian, Azarakhsh (Ed.)Empirical game-theoretic analysis (EGTA) is a general framework for reasoning about complex games using agent-based simulation. Data from simulating select strategy profiles is employed to estimate a cogent and tractable game model approximating the underlying game. To date, EGTA methodology has focused on game models in normal form; though the simulations play out in sequential observations and decisions over time, the game model abstracts away this temporal structure. Richer models of extensive-form games (EFGs) provide a means to capture temporal patterns in action and information, using tree representations. We propose tree-exploiting EGTA (TE-EGTA), an approach to incorporate EFG models into EGTA. TE-EGTA constructs game models that express observations and temporal organization of activity, albeit at a coarser grain than the underlying agent-based simulation model. The idea is to exploit key structure while maintaining tractability. We establish theoretically and experimentally that exploiting even a little temporal structure can vastly reduce estimation error in strategy-profile payoffs compared to the normal-form model. Further, we explore the implications of EFG models for iterative approaches to EGTA, where strategy spaces are extended incrementally. Our experiments on several game instances demonstrate that TE-EGTA can also improve performance in the iterative setting, as measured by the qualitymore »
In imperfect-information games, subgame solving is significantly more challenging than in perfect-information games, but in the last few years, such techniques have been developed. They were the key ingredient to the milestone of superhuman play in no-limit Texas hold'em poker. Current subgame-solving techniques analyze the entire common-knowledge closure of the player's current information set, that is, the smallest set of nodes within which it is common knowledge that the current node lies. However, this set is too large to handle in many games. We introduce an approach that overcomes this obstacle, by instead working with only low-order knowledge. Our approach allows an agent, upon arriving at an infoset, to basically prune any node that is no longer reachable, thereby massively reducing the game tree size relative to the common-knowledge subgame. We prove that, as is, our approach can increase exploitability compared to the blueprint strategy. However, we develop three avenues by which safety can be guaranteed. First, safety is guaranteed if the results of subgame solves are incorporated back into the blueprint. Second, we provide a method where safety is achieved by limiting the infosets at which subgame solving is performed. Third, we prove that our approach, when applied atmore »
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay, such as in financial or military simulations and computer games. During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well. After that, the algorithm has to produce a strategy that has low exploitability. Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly. For the stochastic game setting, we propose using the distribution of state-action value functions induced by a belief distribution over possible environments. We compare the performance of various exploration strategies for this task, including generalizations of Thompson sampling and Bayes-UCB to this new setting. These two consistently outperform other strategies.