

Search for: All records

Creators/Authors contains: "Golowich, N"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Free, publicly-accessible full text available January 1, 2024
  2. We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where, in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation. (See the first code example after this list for a minimal two-timescale sketch.)
  3. (arXiv: https://arxiv.org/abs/2010.13724) We study the question of obtaining last-iterate convergence rates for no-regret learning algorithms in multi-player games. We show that the optimistic gradient (OG) algorithm with a constant step size, which is no-regret, achieves a last-iterate rate of O(1/√T) with respect to the gap function in smooth monotone games. This result addresses a question of Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such as OG) can be applied to achieve improved guarantees in the multi-agent learning setting. The proof of our upper bound uses a new technique centered around an adaptive choice of potential function at each iteration. We also show that the O(1/√T) rate is tight for all p-SCLI algorithms, a class that includes OG as a special case. As a byproduct of our lower-bound analysis, we additionally present a proof of a conjecture of Arjevani et al. (2015) that is more direct than previous approaches. (See the second code example after this list for an optimistic gradient sketch.)
  4. In this paper, we study the smooth convex-concave saddle point problem. Specifically, we analyze the last-iterate convergence properties of the Extragradient (EG) algorithm. It is well known that the ergodic (averaged) iterates of EG converge at a rate of O(1/T) (Nemirovski, 2004). In this paper, we show that the last iterate of EG converges at a rate of O(1/√T). To the best of our knowledge, this is the first paper to provide a convergence rate guarantee for the last iterate of EG for the smooth convex-concave saddle point problem. Moreover, we show that this rate is tight by proving a lower bound of Ω(1/√T) for the last iterate. This lower bound therefore shows a quadratic separation between the convergence rates of the ergodic and last iterates in smooth convex-concave saddle point problems. (See the third code example after this list for an extragradient sketch.)
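First code example: a minimal sketch of the two-timescale rule from item 2, assuming exact gradients on a zero-sum matrix game (rock-paper-scissors) instead of the paper's episodic stochastic-game setting with bandit feedback. The payoff matrix, step-size exponents, and horizon are illustrative assumptions, not the authors' parameters; the sketch only shows the mismatched learning rates, not a faithful reproduction of the paper's guarantee.

    import numpy as np

    def project_simplex(v):
        # Euclidean projection of v onto the probability simplex.
        u = np.sort(v)[::-1]
        css = np.cumsum(u) - 1.0
        idx = np.arange(1, len(v) + 1)
        rho = np.nonzero(u - css / idx > 0)[0][-1]
        return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

    # Zero-sum matrix game: the row player maximizes x^T A y, the column
    # player minimizes it. Rock-paper-scissors payoffs (illustrative).
    A = np.array([[ 0.0,  1.0, -1.0],
                  [-1.0,  0.0,  1.0],
                  [ 1.0, -1.0,  0.0]])

    x = np.ones(3) / 3  # row player's mixed strategy
    y = np.ones(3) / 3  # column player's mixed strategy

    for t in range(1, 20001):
        eta_x = 0.5 / t ** 0.6  # faster learner (exponents are assumptions)
        eta_y = 0.5 / t ** 0.9  # slower learner: the two-timescale mismatch
        gx, gy = A @ y, A.T @ x                  # simultaneous gradient signals
        x = project_simplex(x + eta_x * gx)      # independent ascent step
        y = project_simplex(y - eta_y * gy)      # independent descent step

    print("strategies after 20000 steps:", x, y)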
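Second code example: for item 3, a sketch of the optimistic gradient update z_{t+1} = z_t - eta*(2*F(z_t) - F(z_{t-1})) with a constant step size, run on a bilinear game standing in for a general smooth monotone game. The orthogonal matrix (so the game operator has unit smoothness constant), dimension, step size, and horizon are assumptions; the printed quantity ||F(z_T)|| is used as a simple stationarity surrogate for the gap function.

    import numpy as np

    rng = np.random.default_rng(0)
    # Orthogonal A via QR: all singular values equal 1, so L = 1.
    A = np.linalg.qr(rng.standard_normal((4, 4)))[0]

    def F(z):
        # Game operator for f(x, y) = x^T A y: F(z) = (grad_x f, -grad_y f).
        x, y = z[:4], z[4:]
        return np.concatenate([A @ y, -A.T @ x])

    z_prev = rng.standard_normal(8)  # z_{t-1}
    z = rng.standard_normal(8)       # z_t
    eta = 0.1                        # constant step size, small relative to 1/L

    for _ in range(5000):
        z_next = z - eta * (2.0 * F(z) - F(z_prev))  # optimistic gradient step
        z_prev, z = z, z_next

    print("operator norm at last iterate:", np.linalg.norm(F(z)))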
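Third code example: for item 4, a sketch contrasting the last iterate of extragradient (EG) with the ergodic (averaged) iterate. The bilinear instance, step size, and horizon are assumptions; this instance is deliberately easy, so the last iterate actually beats the average here, whereas the O(1/T) versus O(1/√T) separation in the paper concerns worst-case smooth convex-concave problems.

    import numpy as np

    rng = np.random.default_rng(1)
    # Orthogonal A via QR keeps the instance well conditioned (L = 1).
    A = np.linalg.qr(rng.standard_normal((4, 4)))[0]

    def F(z):
        # F(z) = (grad_x f, -grad_y f) for f(x, y) = x^T A y.
        x, y = z[:4], z[4:]
        return np.concatenate([A @ y, -A.T @ x])

    z = rng.standard_normal(8)
    avg = np.zeros(8)  # running ergodic average of the iterates
    eta = 0.2          # step size below 1/L

    for t in range(1, 10001):
        z_mid = z - eta * F(z)   # extrapolation (lookahead) step
        z = z - eta * F(z_mid)   # update using the lookahead gradient
        avg += (z - avg) / t     # incremental mean of z_1, ..., z_t

    print("last-iterate ||F(z_T)||   :", np.linalg.norm(F(z)))
    print("ergodic      ||F(z_bar)|| :", np.linalg.norm(F(avg)))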