Title: Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning
We develop a general reinforcement learning framework for mean field control (MFC) problems. Such problems arise, for instance, as the limit of collaborative multi-agent control problems when the number of agents is very large. The asymptotic problem can be phrased as the optimal control of a non-linear dynamics. It can also be viewed as a Markov decision process (MDP), but the key difference from the usual RL setup is that the dynamics and the reward now depend on the probability distribution of the state itself. Alternatively, it can be recast as an MDP on the Wasserstein space of measures. In this work, we introduce generic model-free algorithms based on the state-action value function at the mean field level and prove convergence for a prototypical Q-learning method. We then implement an actor-critic method and report numerical results on two archetypal problems: a finite-space model motivated by a cyber-security application and a continuous-space model motivated by an application to swarm motion.
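To make the "lifted MDP" viewpoint of the abstract concrete, here is a minimal sketch of tabular Q-learning in which the state is the population distribution itself, discretized on a grid. The two-point individual state space, the dynamics in `step`, the reward, and all constants are illustrative assumptions, not the models or the exact algorithm studied in the paper.

```python
import numpy as np

# Minimal sketch (assumptions throughout): tabular Q-learning on a "lifted" mean-field MDP
# whose state is the population distribution. With an individual state space {0, 1}, the
# distribution reduces to mu = fraction of agents in state 1, discretized onto a grid.

rng = np.random.default_rng(0)

N_GRID = 51                                # discretization of the simplex [0, 1]
grid = np.linspace(0.0, 1.0, N_GRID)
ACTIONS = np.array([0.0, 0.5, 1.0])        # uniform control effort applied to the population
GAMMA, ALPHA, EPS, DT = 0.95, 0.1, 0.1, 0.1
TARGET = 0.2                               # desired fraction of agents in state 1

def step(mu, a):
    """Toy mean-field transition and reward: contagion pushed down by control effort a."""
    drift = 0.8 * mu * (1.0 - mu) - 1.2 * a * mu          # hypothetical population dynamics
    mu_next = np.clip(mu + DT * drift, 0.0, 1.0)
    reward = -(mu - TARGET) ** 2 - 0.1 * a                 # mean-field cost: tracking + effort
    return mu_next, reward

def nearest(mu):
    """Index of the grid point closest to mu (projection onto the discretized simplex)."""
    return int(np.argmin(np.abs(grid - mu)))

Q = np.zeros((N_GRID, len(ACTIONS)))       # state-action values at the mean-field level

for episode in range(2000):
    s = rng.integers(N_GRID)               # random initial population distribution
    for t in range(100):
        a_idx = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[s]))
        mu_next, r = step(grid[s], ACTIONS[a_idx])
        s_next = nearest(mu_next)
        # Standard Q-learning update, applied to the distribution-valued state
        Q[s, a_idx] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a_idx])
        s = s_next

greedy = ACTIONS[np.argmax(Q, axis=1)]
print("Greedy control effort at mu = 0.0, 0.5, 1.0:",
      greedy[nearest(0.0)], greedy[nearest(0.5)], greedy[nearest(1.0)])
```

The only change from ordinary tabular Q-learning is that the table is indexed by (discretized distribution, action) rather than (individual state, action); everything else, including the epsilon-greedy exploration and the temporal-difference update, is standard.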
Award ID(s):
1716673
PAR ID:
10169099
Author(s) / Creator(s):
Date Published:
Journal Name:
arXiv.org
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of representing collective behavior of large populations and predicting the evolution of a population distribution over a discrete state space. A discrete time mean field game (MFG) is motivated as an interpretable model founded on game theory for understanding the aggregate effect of individual actions and predicting the temporal evolution of population distributions. We achieve a synthesis of MFG and Markov decision processes (MDP) by showing that a special MFG is reducible to an MDP. This enables us to broaden the scope of mean field game theory and infer MFG models of large real-world systems via deep inverse reinforcement learning. Our method learns both the reward function and forward dynamics of an MFG from real data, and we report the first empirical test of a mean field game model of a real-world social media population.
  2. We investigate reinforcement learning for mean field control problems in discrete time, which can be viewed as Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Such problems arise, for instance, when a large number of robots communicate through a central unit dispatching the optimal policy computed by minimizing the overall social cost. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states of the other agents. We rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting. We also provide graphical evidence of the convergence based on implementations of our algorithms.
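The policy-gradient idea described in the abstract above can be illustrated with a deliberately simplified sketch: a scalar discrete-time mean-field linear-quadratic problem, a linear feedback in the agent's state and the population mean, and a zeroth-order (perturb-and-evaluate) gradient estimate computed purely from simulated costs. The dynamics, cost coefficients, and hyperparameters below are hypothetical and not taken from the paper.

```python
import numpy as np

# Minimal sketch (assumptions throughout): model-free policy gradient for a scalar
# discrete-time mean-field LQ problem. The feedback u = -k1 * x - k2 * xbar is tuned
# by a two-point zeroth-order gradient estimate using only rollout costs.

rng = np.random.default_rng(1)

# Hypothetical scalar dynamics and cost coefficients
A, ABAR, B = 0.7, 0.1, 0.5          # x' = A x + ABAR xbar + B u + noise
Q, QBAR, R = 1.0, 0.5, 0.1          # stage cost: Q x^2 + QBAR xbar^2 + R u^2
N_AGENTS, HORIZON = 500, 30

def social_cost(theta):
    """Average cost of a population of exchangeable agents under feedback theta = (k1, k2)."""
    k1, k2 = theta
    x = rng.normal(1.0, 0.5, size=N_AGENTS)    # initial agent states
    total = 0.0
    for _ in range(HORIZON):
        xbar = x.mean()                         # mean-field term: empirical mean of the states
        u = -k1 * x - k2 * xbar
        total += np.mean(Q * x**2 + QBAR * xbar**2 + R * u**2)
        x = A * x + ABAR * xbar + B * u + 0.1 * rng.normal(size=N_AGENTS)
    return total

theta = np.zeros(2)                             # initial feedback gains (k1, k2)
LR, SMOOTH = 0.02, 0.05                         # step size and perturbation radius

for it in range(300):
    d = rng.normal(size=2)
    d /= np.linalg.norm(d)
    # Two-point zeroth-order gradient estimate from cost evaluations only
    g = (social_cost(theta + SMOOTH * d) - social_cost(theta - SMOOTH * d)) / (2 * SMOOTH) * d
    g = np.clip(g, -50.0, 50.0)                 # crude safeguard against noisy estimates
    theta -= LR * g

print("learned feedback gains (k1, k2):", theta, " cost:", social_cost(theta))
```

The "model-free" aspect is that the optimizer never uses the coefficients A, ABAR, B directly; it only queries simulated social costs, which mirrors the spirit, though not the precise setting, of the convergence results described in the abstract.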
  3. This paper proposes a scalable learning framework to solve a system of coupled forward–backward partial differential equations (PDEs) arising from mean field games (MFGs). The MFG system incorporates a forward PDE to model the propagation of population dynamics and a backward PDE for a representative agent’s optimal control. Existing work mainly focuses on solving for the mean field game equilibrium (MFE) of the MFG system under fixed boundary conditions, including the initial population state and terminal cost. To obtain the MFE efficiently, particularly when the initial population density and terminal cost vary, we utilize a physics-informed neural operator (PINO) to tackle the forward–backward PDEs. A learning algorithm is devised and its performance is evaluated on one application domain, autonomous driving velocity control. Numerical experiments show that our method can obtain the MFE accurately when given different initial distributions of vehicles. The PINO exhibits both memory efficiency and generalization capabilities compared to physics-informed neural networks (PINNs).
  4. In this paper, we study the maximum principle of mean field type control problems when the volatility function depends on the state and its measure and also the control, by using our recently developed method in [Bensoussan, A., Huang, Z. and Yam, S. C. P. [2023] Control theory on Wasserstein space: A new approach to optimality conditions, Ann. Math. Sci. Appl.; Bensoussan, A., Tai, H. M. and Yam, S. C. P. [2023] Mean field type control problems, some Hilbert-space-valued FBSDEs, and related equations, preprint (2023), arXiv:2305.04019; Bensoussan, A. and Yam, S. C. P. [2019] Control problem on space of random variables and master equation, ESAIM Control Optim. Calc. Var. 25, 10]. Our method is to embed the mean field type control problem into a Hilbert space to bypass the evolution in the Wasserstein space. We here give a necessary condition and a sufficient condition for these control problems in Hilbert spaces, and we also derive a system of forward–backward stochastic differential equations. 
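For orientation, the forward–backward system produced by a stochastic maximum principle for mean-field type control typically has the schematic shape below; this is a generic form written here for reference, not the exact system derived in the paper, and $\partial_\mu$ denotes a derivative with respect to the measure argument while $(\widetilde X,\widetilde p,\widetilde q,\widetilde u)$ is an independent copy.

\begin{aligned}
dX_t &= b\bigl(X_t,\mathcal{L}(X_t),u_t\bigr)\,dt + \sigma\bigl(X_t,\mathcal{L}(X_t),u_t\bigr)\,dW_t, \qquad X_0 = x_0,\\
dp_t &= -\Bigl(\partial_x H\bigl(X_t,\mathcal{L}(X_t),u_t,p_t,q_t\bigr)
        + \widetilde{\mathbb{E}}\Bigl[\partial_\mu H\bigl(\widetilde X_t,\mathcal{L}(X_t),\widetilde u_t,\widetilde p_t,\widetilde q_t\bigr)(X_t)\Bigr]\Bigr)\,dt + q_t\,dW_t,\\
p_T &= \partial_x g\bigl(X_T,\mathcal{L}(X_T)\bigr) + \widetilde{\mathbb{E}}\Bigl[\partial_\mu g\bigl(\widetilde X_T,\mathcal{L}(X_T)\bigr)(X_T)\Bigr],
\end{aligned}

with Hamiltonian $H(x,\mu,u,p,q) = b(x,\mu,u)\cdot p + \sigma(x,\mu,u)\cdot q + f(x,\mu,u)$ and the optimality condition that $u_t$ optimizes $H$ along the optimal trajectory. The Hilbert-space embedding described in the abstract is precisely a device for obtaining such conditions without differentiating along curves in the Wasserstein space.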