skip to main content

Title: Safe Subgame Resolving for Extensive Form Correlated Equilibrium
Correlated Equilibrium is a solution concept that is more general than Nash Equilibrium (NE) and can lead to outcomes with better social welfare. However, its natural extension to the sequential setting, the Extensive Form Correlated Equilibrium (EFCE), requires a quadratic amount of space to solve, even in restricted settings without randomness in nature. To alleviate these concerns, we apply subgame resolving, a technique extremely successful in finding NE in zero-sum games to solving general-sum EFCEs. Subgame resolving refines a correlation plan in an online manner: instead of solving for the full game upfront, it only solves for strategies in subgames that are reached in actual play, resulting in significant computational gains. In this paper, we (i) lay out the foundations to quantify the quality of a refined strategy, in terms of the social welfare and exploitability of correlation plans, (ii) show that EFCEs possess a sufficient amount of independence between subgames to perform resolving efficiently, and (iii) provide two algorithms for resolving, one using linear programming and the other based on regret minimization. Both methods guarantee safety, i.e., they will never be counterproductive. Our methods are the first time an online method has been applied to the correlated, general-sum setting.
Award ID(s):
Publication Date:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Page Range or eLocation-ID:
5116 to 5123
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problems are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an [Formula: see text] upper bound on the duality gap and regret, where d is the linear dimension, H the horizon and T the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solvingmore »a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To our best knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right.« less
  2. Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others inmore »the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.« less
  3. Price discrimination strategies, which offer different prices to customers based on differences in their valuations, have become common practice. Although it allows sellers to increase their profits, it also raises several concerns in terms of fairness (e.g., by charging higher prices (or denying access) to protected minorities in case they have higher (or lower) valuations than the general population). This topic has received extensive attention from media, industry, and regulatory agencies. In this paper, we consider the problem of setting prices for different groups under fairness constraints. We first propose four definitions: fairness in price, demand, consumer surplus, and no-purchase valuation. We prove that satisfying more than one of these fairness constraints is impossible even under simple settings. We then analyze the pricing strategy of a profit-maximizing seller and the impact of imposing fairness on the seller’s profit, consumer surplus, and social welfare. Under a linear demand model, we find that imposing a small amount of price fairness increases social welfare, whereas too much price fairness may result in a lower welfare relative to imposing no fairness. On the other hand, imposing fairness in demand or consumer surplus always decreases social welfare. Finally, no-purchase valuation fairness always increases social welfare.more »We observe similar patterns under several extensions and for other common demand models numerically. Our results and insights provide a first step in understanding the impact of imposing fairness in the context of discriminatory pricing. This paper was accepted by Jayashankar Swaminathan, operations management. Funding: A. N. Elmachtoub was supported by the Division of Civil, Mechanical and Manufacturing Innovation [Grants 1763000 and 1944428]. Supplemental Material: The data files and online appendix are available at .« less
  4. In this work we consider online decision-making in settings where players want to guard against possible adversarial attacks or other catastrophic failures. To address this, we propose a solution concept in which players have an additional constraint that at each time step they must play a diversified mixed strategy: one that does not put too much weight on any one action. This constraint is motivated by applications such as finance, routing, and resource allocation, where one would like to limit one’s exposure to adversarial or catastrophic events while still performing well in typical cases. We explore properties of diversified strategies in both zero-sum and general-sum games, and provide algorithms for minimizing regret within the family of diversified strategies as well as methods for using taxes or fees to guide standard regret-minimizing players towards diversified strategies. We also analyze equilibria produced by diversified strategies in general-sum games. We show that surprisingly, requiring diversification can actually lead to higher-welfare equilibria, and give strong guarantees on both price of anarchy and the social welfare produced by regret-minimizing diversified agents. We additionally give algorithms for finding optimal diversified strategies in distributed settings where one must limit communication overhead.
  5. Regret minimization has proved to be a versatile tool for tree- form sequential decision making and extensive-form games. In large two-player zero-sum imperfect-information games, mod- ern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium. Most regret-minimization algorithms for tree-form sequential decision making, including CFR, require (i) an exact model of the player’s decision nodes, observation nodes, and how they are linked, and (ii) full knowledge, at all times t, about the payoffs—even in parts of the decision space that are not encountered at time t. Recently, there has been growing interest towards relaxing some of those restric- tions and making regret minimization applicable to settings for which reinforcement learning methods have traditionally been used—for example, those in which only black-box access to the environment is available. We give the first, to our knowl- edge, regret-minimization algorithm that guarantees sublinear regret with high probability even when requirement (i)—and thus also (ii)—is dropped. We formalize an online learning setting in which the strategy space is not known to the agent and gets revealed incrementally whenever the agent encoun- ters new decision points. We give an efficient algorithm that achieves O(T 3/4)more »regret with high probability for that setting, even when the agent faces an adversarial environment. Our experiments show it significantly outperforms the prior algo- rithms for the problem, which do not have such guarantees. It can be used in any application for which regret minimization is useful: approximating Nash equilibrium or quantal response equilibrium, approximating coarse correlated equilibrium in multi-player games, learning a best response, learning safe opponent exploitation, and online play against an unknown opponent/environment.« less