Independent Policy Gradient Methods for Competitive Reinforcement Learning

Daskalakis, C; Foster, D; Golowich, N

Citation Details

We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation.
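The setting in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm or code; it makes several simplifying assumptions. The game is a single-state, one-step zero-sum game (a matrix game) with a hypothetical payoff matrix `A`. Each player uses a directly parameterized policy with epsilon-greedy exploration and a REINFORCE-style gradient estimate built only from its own action and payoff, then takes a projected gradient step. The names `independent_pg`, `mix`, and `project_simplex`, the step sizes, and the choice of which player gets the smaller step size are all assumptions made for illustration.

```python
import random

def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold of the last index that qualifies
    return [max(vi - theta, 0.0) for vi in v]

def mix(theta, eps):
    """Epsilon-greedy policy: blend theta with the uniform distribution."""
    n = len(theta)
    return [(1.0 - eps) * t + eps / n for t in theta]

def independent_pg(A, eta_x, eta_y, eps=0.1, episodes=5000, seed=0):
    """Two players update independently; A[a][b] is the min-player's loss."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    x = [1.0 / n] * n  # min-player parameters
    y = [1.0 / m] * m  # max-player parameters
    for _ in range(episodes):
        px, py = mix(x, eps), mix(y, eps)
        a = rng.choices(range(n), weights=px)[0]
        b = rng.choices(range(m), weights=py)[0]
        payoff = A[a][b]  # each player sees only this number and its own action
        # REINFORCE-style estimates: payoff times the gradient of log-probability.
        gx = [0.0] * n
        gx[a] = payoff * (1.0 - eps) / px[a]
        gy = [0.0] * m
        gy[b] = payoff * (1.0 - eps) / py[b]
        # Two-timescale rule (assumed form): eta_x is much smaller than eta_y.
        x = project_simplex([xi - eta_x * gi for xi, gi in zip(x, gx)])
        y = project_simplex([yi + eta_y * gi for yi, gi in zip(y, gy)])
    return x, y

# Hypothetical game: row 0 is dominant for the min-player.
A = [[0.0, 1.0], [2.0, 3.0]]
x, y = independent_pg(A, eta_x=0.001, eta_y=0.01)
```

In this toy game the min-player's parameters drift toward its dominant row. The sketch only shows the shape of the updates (independent estimates, separate step sizes); it does not reproduce the finite-sample guarantees stated in the abstract, which concern general zero-sum stochastic games.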

Award ID(s): 1741137

PAR ID: 10228236

Author(s) / Creator(s): Daskalakis, C; Foster, D; Golowich, N

Date Published: 2020-01-01

Journal Name: 34th Annual Conference on Neural Information Processing Systems (NeurIPS), NeurIPS 2020

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
