NSF PAR Search | NSF Public Access Repository

Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property

https://doi.org/10.1609/aaai.v38i9.28799

Anagnostides, I; Panageas, I; Farina, G; Sandholm, T (February 2024, AAAI)

Full Text Available

Policy gradient methods enjoy strong practical performance in numerous tasks in reinforcement learning. Their theoretical understanding in multiagent settings, however, remains limited, especially beyond two-player competitive and potential Markov games. In this paper, we develop a new framework to characterize optimistic policy gradient methods in multi-player Markov games with a single controller. Specifically, under the further assumption that the game exhibits an equilibrium collapse, in that the marginals of coarse correlated equilibria (CCE) induce Nash equilibria (NE), we show convergence to stationary ϵ-NE in O(1/ϵ²) iterations, where O(⋅) suppresses polynomial factors in the natural parameters of the game. Such an equilibrium collapse is well-known to manifest itself in two-player zero-sum Markov games, but also occurs even in a class of multi-player Markov games with separable interactions, as established by recent work. As a result, we bypass known complexity barriers for computing stationary NE when either of our assumptions fails. Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
On the Interplay between Social Welfare and Tractability of Equilibria

Anagnostides, I; Sandholm, T (December 2023, NeurIPS)

Full Text Available

Computational tractability and social welfare (a.k.a. efficiency) of equilibria are two fundamental but in general orthogonal considerations in algorithmic game theory. Nevertheless, we show that when (approximate) full efficiency can be guaranteed via a smoothness argument à la Roughgarden, Nash equilibria are approachable under a family of no-regret learning algorithms, thereby enabling fast and decentralized computation. We leverage this connection to obtain new convergence results in large games (wherein the number of players n ≫ 1) under the well-documented property of full efficiency via smoothness in the limit. Surprisingly, our framework unifies equilibrium computation in disparate classes of problems including games with vanishing strategic sensitivity and two-player zero-sum games, illuminating en route an immediate but overlooked equivalence between smoothness and a well-studied condition in the optimization literature known as the Minty property. Finally, we establish that a family of no-regret dynamics attains a welfare bound that improves over the smoothness framework while at the same time guaranteeing convergence to the set of coarse correlated equilibria. We show this by employing the clairvoyant mirror descent algorithm recently introduced by Piliouras et al.
On the Complexity of Computing Sparse Equilibria and Lower Bounds for No-Regret Learning in Games

Anagnostides, I; Kalavasis, A; Sandholm, T; Zampetakis, M (February 2024, ITCS)

Full Text Available

Characterizing the performance of no-regret dynamics in multi-player games is a foundational problem at the interface of online learning and game theory. Recent results have revealed that when all players adopt specific learning algorithms, it is possible to improve exponentially over what is predicted by the overly pessimistic no-regret framework in the traditional adversarial regime, thereby leading to faster convergence to the set of coarse correlated equilibria (CCE) – a standard game-theoretic equilibrium concept. Yet, despite considerable recent progress, the fundamental complexity barriers for learning in normal- and extensive-form games are poorly understood. In this paper, we make a step towards closing this gap by first showing that – barring major complexity breakthroughs – any polynomial-time learning algorithms in extensive-form games need at least 2^(log^(1/2−o(1)) |T|) iterations for the average regret to reach below even an absolute constant, where |T| is the number of nodes in the game. This establishes a superpolynomial separation between no-regret learning in normal- and extensive-form games, as in the former class a logarithmic number of iterations suffices to achieve constant average regret. Furthermore, our results imply that algorithms such as multiplicative weights update, as well as its optimistic counterpart, require at least 2^((log log m)^(1/2−o(1))) iterations to attain an O(1)-CCE in m-action normal-form games under any parameterization. These are the first non-trivial – and dimension-dependent – lower bounds in that setting for the most well-studied algorithms in the literature. From a technical standpoint, we follow a beautiful connection recently made by Foster, Golowich, and Kakade (ICML ’23) between sparse CCE and Nash equilibria in the context of Markov games. Consequently, our lower bounds rule out polynomial-time algorithms well beyond the traditional online learning framework, capturing techniques commonly used for accelerating centralized equilibrium computation.
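
For readers unfamiliar with the objects named in this abstract, the following is a minimal sketch of multiplicative weights update in self-play, together with the average external regret that the lower bounds refer to. It is not the paper's construction: the 3x3 payoff matrix `U1`, the helper names, and the step size (the textbook Hedge tuning) are all assumptions made for illustration.

```python
import math

# Hypothetical 3x3 zero-sum game (illustrative values only): U1[a][b] is
# player 1's utility in [0, 1]; player 2 receives 1 - U1[a][b].
U1 = [[0.2, 0.9, 0.4],
      [0.7, 0.1, 0.8],
      [0.5, 0.6, 0.3]]


def mwu_strategy(cumulative, eta):
    """Mixed strategy with probabilities proportional to exp(eta * cumulative utility)."""
    top = max(cumulative)  # subtract the max for numerical stability
    weights = [math.exp(eta * (c - top)) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def mwu_self_play(T):
    """Run MWU for both players for T rounds; return their average external regrets."""
    m = len(U1)
    eta = math.sqrt(8.0 * math.log(m) / T)  # standard Hedge step size (assumed choice)
    cum1, cum2 = [0.0] * m, [0.0] * m       # cumulative utility of each fixed action
    earned1 = earned2 = 0.0                 # cumulative expected utility actually obtained
    for _ in range(T):
        x = mwu_strategy(cum1, eta)
        y = mwu_strategy(cum2, eta)
        # Expected utility of every pure action against the opponent's current mix.
        g1 = [sum(U1[a][b] * y[b] for b in range(m)) for a in range(m)]
        g2 = [sum((1.0 - U1[a][b]) * x[a] for a in range(m)) for b in range(m)]
        earned1 += sum(x[a] * g1[a] for a in range(m))
        earned2 += sum(y[b] * g2[b] for b in range(m))
        cum1 = [c + g for c, g in zip(cum1, g1)]
        cum2 = [c + g for c, g in zip(cum2, g2)]
    # Average regret: best fixed action in hindsight minus what was earned, per round.
    return (max(cum1) - earned1) / T, (max(cum2) - earned2) / T


if __name__ == "__main__":
    print(mwu_self_play(1000))
```

With this step size, the classical Hedge analysis bounds each player's average regret by sqrt(ln m / (2T)). That is an upper bound from the standard adversarial analysis; the lower bounds described in the abstract concern how many iterations such dynamics need under any parameterization.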
On the Convergence of No-Regret Learning Dynamics in Time-Varying Games

Anagnostides, I; Panageas, I; Farina, G; Sandholm, T (December 2023, NeurIPS23)

Most of the literature on learning in games has focused on the restrictive setting where the underlying repeated game does not change over time. Much less is known about the convergence of no-regret learning algorithms in dynamic multiagent settings. In this paper, we characterize the convergence of optimistic gradient descent (OGD) in time-varying games. Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games parameterized on natural variation measures of the sequence of games, subsuming known results for static games. Furthermore, we establish improved second-order variation bounds under strong convexity-concavity, as long as each game is repeated multiple times. Our results also extend to time-varying general-sum multi-player games via a bilinear formulation of correlated equilibria, which has novel implications for meta-learning and for obtaining refined variation-dependent regret bounds, addressing questions left open in prior papers. Finally, we leverage our framework to also provide new insights on dynamic regret guarantees in static games.
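
As a rough illustration of the algorithm named in this abstract, the sketch below runs projected optimistic gradient descent on a single static zero-sum game (matching pennies), which is the simplest special case and not the time-varying setting the paper studies. The game, the starting point, the step size `eta`, and all function names are assumptions chosen for the example; each player's mixed strategy is reduced to one probability, so projection onto the simplex becomes clipping to [0, 1].

```python
def clip01(v):
    """Project a scalar onto [0, 1] (the simplex for a two-action player)."""
    return min(1.0, max(0.0, v))


def payoff_gradients(p, q):
    """Gradients of f(p, q) = (2p - 1)(2q - 1), the matching-pennies payoff.

    p and q are the probabilities each player puts on their first action;
    the p-player minimizes f and the q-player maximizes it.
    """
    return 2.0 * (2.0 * q - 1.0), 2.0 * (2.0 * p - 1.0)


def optimistic_gradient_descent(p0, q0, eta=0.05, steps=2000):
    """Two-sequence optimistic update: predict with the previous gradient, then correct."""
    p_hat, q_hat = p0, q0          # secondary ("base") iterates
    gp_prev, gq_prev = 0.0, 0.0    # prediction of the next gradient: the last one seen
    p, q = p0, q0
    for _ in range(steps):
        # Optimistic step: move from the base iterate using the predicted gradient.
        p = clip01(p_hat - eta * gp_prev)
        q = clip01(q_hat + eta * gq_prev)
        # Observe the true gradients at the played strategies.
        gp, gq = payoff_gradients(p, q)
        # Update the base iterate with the observed gradient.
        p_hat = clip01(p_hat - eta * gp)
        q_hat = clip01(q_hat + eta * gq)
        gp_prev, gq_prev = gp, gq
    return p, q


if __name__ == "__main__":
    print(optimistic_gradient_descent(0.8, 0.3))
```

In this static game the played strategies themselves approach the unique equilibrium p = q = 1/2. Handling a sequence of changing payoff functions, as in the abstract, would require replacing `payoff_gradients` with a per-round gradient oracle.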
Near-Optimal Φ-Regret Learning in Extensive-Form Games

Anagnostides, I.; Farina, G.; Sandholm, T. (July 2023, ICML-23)

Full Text Available
Computing Optimal Equilibria and Mechanisms via Learning in Zero-Sum Extensive-Form Games

Zhang, B; Farina, G; Anagnostides, I; Cacciamani, F; McAleer, S; Haupt, A; Celli, A; Gatti, N; Conitzer, V; Sandholm, T (December 2023, NeurIPS)

Full Text Available

We introduce a new approach for computing optimal equilibria and mechanisms via learning in games. It applies to extensive-form settings with any number of players, including mechanism design, information design, and solution concepts such as correlated, communication, and certification equilibria. We observe that optimal equilibria are minimax equilibrium strategies of a player in an extensive-form zero-sum game. This reformulation allows us to apply techniques for learning in zero-sum games, yielding the first learning dynamics that converge to optimal equilibria, not only in empirical averages, but also in iterates. We demonstrate the practical scalability and flexibility of our approach by attaining state-of-the-art performance in benchmark tabular games, and by computing an optimal mechanism for a sequential auction design problem using deep reinforcement learning.