Title: Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning
We develop a general reinforcement learning framework for mean field control (MFC) problems. Such problems arise, for instance, as the limit of collaborative multi-agent control problems when the number of agents is very large. The asymptotic problem can be phrased as the optimal control of a non-linear dynamics. It can also be viewed as a Markov decision process (MDP), but the key difference from the usual RL setup is that the dynamics and the reward now depend on the probability distribution of the state itself. Alternatively, it can be recast as an MDP on the Wasserstein space of measures. In this work, we introduce generic model-free algorithms based on the state-action value function at the mean field level and prove convergence for a prototypical Q-learning method. We then implement an actor-critic method and report numerical results on two archetypal problems: a finite-space model motivated by a cyber-security application and a continuous-space model motivated by an application to swarm motion.
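To make the "lifted MDP" viewpoint of the abstract concrete, here is a minimal sketch of tabular Q-learning in which the state is the population distribution itself, discretized on a grid. The two-point individual state space, the dynamics in `step`, the reward, and all constants are illustrative assumptions, not the models or the exact algorithm studied in the paper.

```python
import numpy as np

# Minimal sketch (assumptions throughout): tabular Q-learning on a "lifted" mean-field MDP
# whose state is the population distribution. With an individual state space {0, 1}, the
# distribution reduces to mu = fraction of agents in state 1, discretized onto a grid.

rng = np.random.default_rng(0)

N_GRID = 51                                # discretization of the simplex [0, 1]
grid = np.linspace(0.0, 1.0, N_GRID)
ACTIONS = np.array([0.0, 0.5, 1.0])        # uniform control effort applied to the population
GAMMA, ALPHA, EPS, DT = 0.95, 0.1, 0.1, 0.1
TARGET = 0.2                               # desired fraction of agents in state 1

def step(mu, a):
    """Toy mean-field transition and reward: contagion pushed down by control effort a."""
    drift = 0.8 * mu * (1.0 - mu) - 1.2 * a * mu          # hypothetical population dynamics
    mu_next = np.clip(mu + DT * drift, 0.0, 1.0)
    reward = -(mu - TARGET) ** 2 - 0.1 * a                 # mean-field cost: tracking + effort
    return mu_next, reward

def nearest(mu):
    """Index of the grid point closest to mu (projection onto the discretized simplex)."""
    return int(np.argmin(np.abs(grid - mu)))

Q = np.zeros((N_GRID, len(ACTIONS)))       # state-action values at the mean-field level

for episode in range(2000):
    s = rng.integers(N_GRID)               # random initial population distribution
    for t in range(100):
        a_idx = rng.integers(len(ACTIONS)) if rng.random() < EPS else int(np.argmax(Q[s]))
        mu_next, r = step(grid[s], ACTIONS[a_idx])
        s_next = nearest(mu_next)
        # Standard Q-learning update, applied to the distribution-valued state
        Q[s, a_idx] += ALPHA * (r + GAMMA * np.max(Q[s_next]) - Q[s, a_idx])
        s = s_next

greedy = ACTIONS[np.argmax(Q, axis=1)]
print("Greedy control effort at mu = 0.0, 0.5, 1.0:",
      greedy[nearest(0.0)], greedy[nearest(0.5)], greedy[nearest(1.0)])
```

The only change from ordinary tabular Q-learning is that the table is indexed by (discretized distribution, action) rather than (individual state, action); everything else, including the epsilon-greedy exploration and the temporal-difference update, is standard.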
Award ID(s):
1716673
PAR ID:
10169099
Author(s) / Creator(s):
Date Published:
Journal Name:
arXiv.org
ISSN:
2331-8422
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of representing collective behavior of large populations and predicting the evolution of a population distribution over a discrete state space. A discrete time mean field game (MFG) is motivated as an interpretable model founded on game theory for understanding the aggregate effect of individual actions and predicting the temporal evolution of population distributions. We achieve a synthesis of MFG and Markov decision processes (MDP) by showing that a special MFG is reducible to an MDP. This enables us to broaden the scope of mean field game theory and infer MFG models of large real-world systems via deep inverse reinforcement learning. Our method learns both the reward function and forward dynamics of an MFG from real data, and we report the first empirical test of a mean field game model of a real-world social media population.
  2. We investigate reinforcement learning for mean field control problems in discrete time, which can be viewed as Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Such problems arise, for instance, when a large number of robots communicate through a central unit dispatching the optimal policy computed by minimizing the overall social cost. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states of the other agents. We rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting. We also provide graphical evidence of the convergence based on implementations of our algorithms.
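The policy-gradient idea described in the abstract above can be illustrated with a deliberately simplified sketch: a scalar discrete-time mean-field linear-quadratic problem, a linear feedback in the agent's state and the population mean, and a zeroth-order (perturb-and-evaluate) gradient estimate computed purely from simulated costs. The dynamics, cost coefficients, and hyperparameters below are hypothetical and not taken from the paper.

```python
import numpy as np

# Minimal sketch (assumptions throughout): model-free policy gradient for a scalar
# discrete-time mean-field LQ problem. The feedback u = -k1 * x - k2 * xbar is tuned
# by a two-point zeroth-order gradient estimate using only rollout costs.

rng = np.random.default_rng(1)

# Hypothetical scalar dynamics and cost coefficients
A, ABAR, B = 0.7, 0.1, 0.5          # x' = A x + ABAR xbar + B u + noise
Q, QBAR, R = 1.0, 0.5, 0.1          # stage cost: Q x^2 + QBAR xbar^2 + R u^2
N_AGENTS, HORIZON = 500, 30

def social_cost(theta):
    """Average cost of a population of exchangeable agents under feedback theta = (k1, k2)."""
    k1, k2 = theta
    x = rng.normal(1.0, 0.5, size=N_AGENTS)    # initial agent states
    total = 0.0
    for _ in range(HORIZON):
        xbar = x.mean()                         # mean-field term: empirical mean of the states
        u = -k1 * x - k2 * xbar
        total += np.mean(Q * x**2 + QBAR * xbar**2 + R * u**2)
        x = A * x + ABAR * xbar + B * u + 0.1 * rng.normal(size=N_AGENTS)
    return total

theta = np.zeros(2)                             # initial feedback gains (k1, k2)
LR, SMOOTH = 0.02, 0.05                         # step size and perturbation radius

for it in range(300):
    d = rng.normal(size=2)
    d /= np.linalg.norm(d)
    # Two-point zeroth-order gradient estimate from cost evaluations only
    g = (social_cost(theta + SMOOTH * d) - social_cost(theta - SMOOTH * d)) / (2 * SMOOTH) * d
    g = np.clip(g, -50.0, 50.0)                 # crude safeguard against noisy estimates
    theta -= LR * g

print("learned feedback gains (k1, k2):", theta, " cost:", social_cost(theta))
```

The "model-free" aspect is that the optimizer never uses the coefficients A, ABAR, B directly; it only queries simulated social costs, which mirrors the spirit, though not the precise setting, of the convergence results described in the abstract.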
  3. This paper proposes a scalable learning framework to solve a system of coupled forward–backward partial differential equations (PDEs) arising from mean field games (MFGs). The MFG system incorporates a forward PDE to model the propagation of population dynamics and a backward PDE for a representative agent’s optimal control. Existing work mainly focuses on solving for the mean field game equilibrium (MFE) of the MFG system under fixed boundary conditions, including the initial population state and terminal cost. To obtain the MFE efficiently, particularly when the initial population density and terminal cost vary, we utilize a physics-informed neural operator (PINO) to tackle the forward–backward PDEs. A learning algorithm is devised and its performance is evaluated on one application domain, autonomous driving velocity control. Numerical experiments show that our method can obtain the MFE accurately when given different initial distributions of vehicles. The PINO exhibits both memory efficiency and generalization capabilities compared to physics-informed neural networks (PINNs).
  4. In this paper, we study the maximum principle of mean field type control problems when the volatility function depends on the state and its measure and also the control, by using our recently developed method in [Bensoussan, A., Huang, Z. and Yam, S. C. P. [2023] Control theory on Wasserstein space: A new approach to optimality conditions, Ann. Math. Sci. Appl.; Bensoussan, A., Tai, H. M. and Yam, S. C. P. [2023] Mean field type control problems, some Hilbert-space-valued FBSDEs, and related equations, preprint (2023), arXiv:2305.04019; Bensoussan, A. and Yam, S. C. P. [2019] Control problem on space of random variables and master equation, ESAIM Control Optim. Calc. Var. 25, 10]. Our method is to embed the mean field type control problem into a Hilbert space to bypass the evolution in the Wasserstein space. We here give a necessary condition and a sufficient condition for these control problems in Hilbert spaces, and we also derive a system of forward–backward stochastic differential equations. 
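For orientation, the forward–backward system produced by a stochastic maximum principle for mean-field type control typically has the schematic shape below; this is a generic form written here for reference, not the exact system derived in the paper, and $\partial_\mu$ denotes a derivative with respect to the measure argument while $(\widetilde X,\widetilde p,\widetilde q,\widetilde u)$ is an independent copy.

\begin{aligned}
dX_t &= b\bigl(X_t,\mathcal{L}(X_t),u_t\bigr)\,dt + \sigma\bigl(X_t,\mathcal{L}(X_t),u_t\bigr)\,dW_t, \qquad X_0 = x_0,\\
dp_t &= -\Bigl(\partial_x H\bigl(X_t,\mathcal{L}(X_t),u_t,p_t,q_t\bigr)
        + \widetilde{\mathbb{E}}\Bigl[\partial_\mu H\bigl(\widetilde X_t,\mathcal{L}(X_t),\widetilde u_t,\widetilde p_t,\widetilde q_t\bigr)(X_t)\Bigr]\Bigr)\,dt + q_t\,dW_t,\\
p_T &= \partial_x g\bigl(X_T,\mathcal{L}(X_T)\bigr) + \widetilde{\mathbb{E}}\Bigl[\partial_\mu g\bigl(\widetilde X_T,\mathcal{L}(X_T)\bigr)(X_T)\Bigr],
\end{aligned}

with Hamiltonian $H(x,\mu,u,p,q) = b(x,\mu,u)\cdot p + \sigma(x,\mu,u)\cdot q + f(x,\mu,u)$ and the optimality condition that $u_t$ optimizes $H$ along the optimal trajectory. The Hilbert-space embedding described in the abstract is precisely a device for obtaining such conditions without differentiating along curves in the Wasserstein space.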