Search Results
Search for: All records
Total Resources: 5
Filter by Author / Creator
- Foster, D. (3)
- Kakade, S. (3)
- Foster, D (2)
- Bai, Y (1)
- Cordingly, R. (1)
- Daskalakis, C (1)
- Golowich, N (1)
- Golowich, N. (1)
- Hatchett, R. (1)
- Hoang, V. (1)
- Jiang, N (1)
- Lloyd, W. (1)
- Perez, D. (1)
- Qian, J (1)
- Rakhlin, A (1)
- Sadeghi, Z. (1)
- Xie, T (1)
- Yu, H. (1)
- Foster, D.; Golowich, N.; Kakade, S. (Proceedings of the International Conference on Machine Learning)
- Foster, D.; Kakade, S.; Qian, J.; Rakhlin, A. (arXiv preprint)
- Daskalakis, C.; Foster, D.; Golowich, N. (34th Annual Conference on Neural Information Processing Systems (NeurIPS 2020))
  Abstract: We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents (i.e., zero-sum stochastic games). We consider an episodic setting where, in each episode, each player independently selects a policy and observes only their own actions and rewards, along with the state. We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule (which is necessary). To the best of our knowledge, this constitutes the first finite-sample convergence result for independent policy gradient methods in competitive RL; prior work has largely focused on centralized, coordinated procedures for equilibrium computation. (An illustrative sketch of the two-timescale scheme appears after the results list.)
- Cordingly, R.; Yu, H.; Hoang, V.; Perez, D.; Foster, D.; Sadeghi, Z.; Hatchett, R.; Lloyd, W. (2020 6th IEEE International Conference on Cloud and Big Data Computing (CBDCOM 2020))
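
The abstract of the Daskalakis, Foster, and Golowich entry above describes independent policy gradient learning under a two-timescale learning-rate rule. The sketch below is a minimal, hypothetical illustration of that idea, not the authors' code or their full setting: it substitutes a 2x2 zero-sum matrix game (matching pennies) for an episodic stochastic game, and all names and constants (theta1, eta1, the 0.05/0.005 rates) are illustrative assumptions. Each player samples from its own softmax policy, observes only its own reward, and takes a REINFORCE-style gradient step, with one learning rate much smaller than the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Payoff matrix for player 1 in matching pennies; player 2 receives -A
# (zero-sum). This toy game stands in for a zero-sum stochastic game.
A = np.array([[ 1.0, -1.0],
              [-1.0,  1.0]])

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

theta1 = np.zeros(2)          # logits of player 1's policy
theta2 = np.zeros(2)          # logits of player 2's policy
eta1, eta2 = 0.05, 0.005      # two-timescale rule: eta2 << eta1 (illustrative values)

for t in range(200000):
    p1, p2 = softmax(theta1), softmax(theta2)
    a1 = rng.choice(2, p=p1)  # each player samples independently
    a2 = rng.choice(2, p=p2)
    r1 = A[a1, a2]            # each player sees only its own reward
    r2 = -r1
    # REINFORCE step: gradient of log softmax at the sampled action
    theta1 += eta1 * r1 * (np.eye(2)[a1] - p1)
    theta2 += eta2 * r2 * (np.eye(2)[a2] - p2)

print("player 1 policy:", softmax(theta1))
print("player 2 policy:", softmax(theta2))
```

The unique min-max equilibrium of matching pennies is uniform play, so after many episodes both printed distributions should drift toward [0.5, 0.5]. The paper's actual guarantees are non-asymptotic and cover general episodic zero-sum stochastic games, which this toy loop does not capture.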