NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Hindsight and Sequential Rationality of Correlated Play

Morrill, D; D'Orazio, R; Sarfati, R; Lanctot, M; Wright, J.R.; Greenwald, A.R; Bowling, M. (May 2021, Proceedings of the AAAI Conference on Artificial Intelligence)
null (Ed.)
Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.
more » « less
Full Text Available
Learning Competitive Equilibria in Noisy Combinatorial Markets

Areyan Viqueira, E; Cousins, C; Greenwald, A. (May 2021, AAMAS Conference proceedings)
null (Ed.)
We present a methodology to robustly estimate the competitive equilibria (CE) of combinatorial markets under the assumption that buyers do not know their precise valuations for bundles of goods, but instead can only provide noisy estimates. We first show tight lower- and upper-bounds on the buyers' utility loss, and hence the set of CE, given a uniform approximation of one market by another. We then present two probably-approximately-correct algorithms for learning CE with finite-sample guarantees. The first is a baseline and the second leverages a connection between the first welfare theorem of economics and uniform approximations to adaptively prune value queries when it is determined that they are provably not part of a CE. Extensive experimentation shows that pruning achieves better estimates than the baseline with far fewer samples.
more » « less
Full Text Available
Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Morrill, D; D’Orazio, R; Lanctot, M; Wright, J.R.; Bowling, M; Greenwald, A.R. (January 2021, Proceedings of Machine Learning Research)
null (Ed.)
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.
more » « less
Full Text Available

Search for: All records