Linear-Quadratic Mean-Field Reinforcement Learning: Convergence of Policy Gradient Methods

Carmona, Rene; Lauriere, Mathieu; Tan, Zongjun

We investigate reinforcement learning for mean field control problems in discrete time, which can be viewed as Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Such problems arise, for instance, when a large number of robots communicate through a central unit that dispatches the optimal policy computed by minimizing the overall social cost. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states of the other agents. We rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting. We also provide graphical evidence of the convergence based on implementations of our algorithms.
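The idea behind an exact policy gradient method can be illustrated on the simplest special case. The sketch below is a hypothetical, minimal example and not the authors' algorithm: it drops the mean-field interaction and the noise entirely, and runs gradient descent on the closed-form cost of a scalar linear feedback u_t = -k x_t for the dynamics x_{t+1} = a x_t + b u_t with cost sum over t of (q x_t^2 + r u_t^2) and random initial state with E[x_0^2] = s. The gradient is the scalar form of the standard single-agent LQR policy-gradient formula. All function names and parameter values are assumptions chosen for illustration; the result is checked against the gain obtained from the scalar discrete-time Riccati equation.

```python
import math


def cost(k, a, b, q, r, s=1.0):
    """Infinite-horizon cost of the feedback u_t = -k x_t (assumed scalar model).

    Finite only when the closed loop is stable, i.e. |a - b k| < 1.
    """
    c = a - b * k  # closed-loop coefficient
    if abs(c) >= 1.0:
        return math.inf
    return s * (q + r * k * k) / (1.0 - c * c)


def grad(k, a, b, q, r, s=1.0):
    """Exact derivative dC/dk for a stabilizing gain k.

    With c = a - b k and value coefficient p = (q + r k^2) / (1 - c^2),
    dC/dk = 2 s (r k - b c p) / (1 - c^2).
    """
    c = a - b * k
    d = 1.0 - c * c
    p = (q + r * k * k) / d
    return 2.0 * s * (r * k - b * c * p) / d


def riccati_gain(a, b, q, r, iters=1000):
    """Optimal gain from fixed-point iteration of the scalar Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def policy_gradient(k0, a, b, q, r, lr=0.01, steps=2000):
    """Plain gradient descent on k, starting from a stabilizing gain k0."""
    if abs(a - b * k0) >= 1.0:
        raise ValueError("initial gain must stabilize the system")
    k = k0
    for _ in range(steps):
        k -= lr * grad(k, a, b, q, r)
        if abs(a - b * k) >= 1.0:
            raise RuntimeError("iterate left the stabilizing region")
    return k


if __name__ == "__main__":
    a, b, q, r = 0.9, 0.5, 1.0, 1.0  # illustrative values, not from the paper
    k_pg = policy_gradient(0.0, a, b, q, r)
    k_star = riccati_gain(a, b, q, r)
    print(k_pg, k_star)  # both close to 0.6242
```

Starting from k = 0 (stabilizing because |a| < 1), the iterates stay inside the stabilizing region and converge to the Riccati gain. A model-free variant would replace `grad` with an estimate built from simulated cost rollouts under perturbed gains; that substitution is not shown here.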

Award ID(s): 1716673

PAR ID: 10169110

Author(s) / Creator(s): Carmona, Rene; Lauriere, Mathieu; Tan, Zongjun

Date Published: 2019-10-01

Journal Name: arXiv.org

ISSN: 2331-8422

Sponsoring Org: National Science Foundation

Journal Article: the DOI is not currently available.
