Search for: All records

Award ID contains: 1716673

  1. We develop a probabilistic approach to continuous-time finite state mean field games. Based on an alternative description of continuous-time Markov chains by means of semimartingales and on the weak formulation of stochastic optimal control, our approach not only allows us to tackle the mean field of states and the mean field of control at the same time, but also extends the players' strategy set from Markov strategies to closed-loop strategies. We show the existence and uniqueness of the Nash equilibrium for the mean field game, and we show that the equilibrium of the mean field game yields an approximate Nash equilibrium for the game with a finite number of players, under different assumptions on the structure and regularity of the cost functions and of the transition rates between states.
  2. In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of indistinguishable agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the mean of the state and of the actions is investigated. The optimality conditions of the game are analysed for both open-loop and closed-loop controls, and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradient are proposed for both model-based and sample-based frameworks. In the model-based case, the gradients are computed exactly using the model, whereas they are estimated using Monte Carlo simulations in the sample-based case. Numerical experiments are conducted to show the convergence of the utility function as well as of the two players' controls.

     
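    As a rough illustration of the sample-based approach described above, the sketch below (scalar dynamics and cost weights made up for illustration, not taken from the paper) lets both controllers use linear feedback gains, estimates their discounted cost by Monte Carlo over a finite population, and approximates the policy gradients with two-point zeroth-order estimates; the minimizing player descends and the maximizing player ascends.

        import numpy as np

        # Minimal sketch of a sample-based policy-gradient loop for a scalar
        # zero-sum mean-field type game; every coefficient below is illustrative.
        rng = np.random.default_rng(0)
        a, abar, b1, b2 = 0.9, 0.2, 1.0, 0.5      # dynamics coefficients (made up)
        q, qbar, r1, r2 = 1.0, 0.5, 1.0, 2.0      # cost weights (made up)
        gamma, T, N = 0.95, 60, 500               # discount, horizon, population size

        def cost(k1, k2):
            """Monte Carlo estimate of the discounted cost for feedback gains (k1, k2)."""
            x = rng.normal(0.0, 1.0, N)
            total = 0.0
            for t in range(T):
                xbar = x.mean()                   # empirical mean field
                u, v = -k1 * x, -k2 * x
                total += gamma**t * np.mean(q * x**2 + qbar * xbar**2 + r1 * u**2 - r2 * v**2)
                x = a * x + abar * xbar + b1 * u + b2 * v + rng.normal(0.0, 0.1, N)
            return total

        k1, k2, lr, delta = 0.0, 0.0, 0.02, 0.05
        for _ in range(200):
            # two-point zeroth-order gradient estimates, one per controller
            g1 = (cost(k1 + delta, k2) - cost(k1 - delta, k2)) / (2 * delta)
            g2 = (cost(k1, k2 + delta) - cost(k1, k2 - delta)) / (2 * delta)
            k1 -= lr * g1                         # minimizing controller: gradient descent
            k2 += lr * g2                         # maximizing controller: gradient ascent
        print("approximate equilibrium gains:", round(k1, 3), round(k2, 3))
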
  3. We develop a general reinforcement learning framework for mean field control (MFC) problems. Such problems arise for instance as the limit of collaborative multi-agent control problems when the number of agents is very large. The asymptotic problem can be phrased as the optimal control of a non-linear dynamics. This can also be viewed as a Markov decision process (MDP), but the key difference with the usual RL setup is that the dynamics and the reward now depend on the state's probability distribution itself. Alternatively, it can be recast as an MDP on the Wasserstein space of measures. In this work, we introduce generic model-free algorithms based on the state-action value function at the mean field level, and we prove convergence for a prototypical Q-learning method. We then implement an actor-critic method and report numerical results on two archetypal problems: a finite space model motivated by a cyber security application and a continuous space model motivated by an application to swarm motion.
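    A minimal sketch of the "MDP on the space of measures" viewpoint mentioned above, not the paper's algorithm: the lifted state is the (discretized) population distribution over a two-point individual state space, the action picks one of two candidate transition kernels, and a standard tabular Q-learning update is applied to the lifted problem. The kernels, the target distribution, and all hyperparameters below are made up for illustration.

        import numpy as np

        # Tabular Q-learning on the lifted MDP whose state is the population
        # distribution over {0, 1}; everything below is illustrative.
        rng = np.random.default_rng(0)

        KERNELS = [np.array([[0.9, 0.1], [0.2, 0.8]]),     # candidate transition kernels
                   np.array([[0.6, 0.4], [0.7, 0.3]])]
        TARGET = np.array([0.3, 0.7])                      # desired population distribution
        BINS = 21                                          # discretization of mu = (p, 1 - p)

        def step(mu, a):
            """Apply kernel a to the distribution mu; reward penalises distance to TARGET."""
            new_mu = mu @ KERNELS[a]
            return new_mu, -np.sum((new_mu - TARGET) ** 2)

        def to_bin(mu):
            return int(round(mu[0] * (BINS - 1)))

        Q = np.zeros((BINS, len(KERNELS)))
        alpha, gamma, eps = 0.1, 0.95, 0.1

        for episode in range(2000):
            p0 = rng.uniform()
            mu = np.array([p0, 1.0 - p0])
            for t in range(50):
                s = to_bin(mu)
                a = rng.integers(len(KERNELS)) if rng.uniform() < eps else int(np.argmax(Q[s]))
                mu, r = step(mu, a)
                # standard Q-learning update on the lifted (distribution, action) pair
                Q[s, a] += alpha * (r + gamma * np.max(Q[to_bin(mu)]) - Q[s, a])

        print("greedy kernel per distribution bin:", np.argmax(Q, axis=1))
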
  4. We introduce and investigate certain N-player dynamic games on the line and in the plane that admit Coulomb gas dynamics as a Nash equilibrium. Most significantly, we find that the universal local limit of the equilibrium is sensitive to the chosen model of player information in one dimension but not in two dimensions. We also find that players can achieve game-theoretic symmetry through selfish behavior despite the non-exchangeability of states, which allows us to establish strong localized convergence of the N-Nash systems to the expected mean field equations against locally optimal player ensembles, i.e., those exhibiting the same local limit as the Nash-optimal ensemble. In one dimension, this convergence notably features a nonlocal-to-local transition in the population dependence of the N-Nash system.
  5. We investigate reinforcement learning for mean field control problems in discrete time, which can be viewed as Markov decision processes for a large number of exchangeable agents interacting in a mean field manner. Such problems arise, for instance, when a large number of robots communicate through a central unit dispatching the optimal policy computed by minimizing the overall social cost. An approximate solution is obtained by learning the optimal policy of a generic agent interacting with the statistical distribution of the states of the other agents. We rigorously prove the convergence of exact and model-free policy gradient methods in a mean-field linear-quadratic setting. We also provide graphical evidence of the convergence based on implementations of our algorithms.
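    For concreteness, the mean-field linear-quadratic setting referred to above can be written, in generic LaTeX notation (a sketch with generic coefficients and a generic linear-feedback parameterization, not necessarily the paper's exact formulation), as:

        \begin{aligned}
        & X_{t+1} = A X_t + \bar A \,\bar X_t + B u_t + \varepsilon_{t+1},
          \qquad \bar X_t = \mathbb{E}[X_t],\\
        & J(\theta) = \mathbb{E} \sum_{t \ge 0} \gamma^t
          \big( X_t^\top Q X_t + \bar X_t^\top \bar Q \,\bar X_t + u_t^\top R\, u_t \big),\\
        & u_t = -K (X_t - \bar X_t) - L \,\bar X_t, \qquad \theta = (K, L),
          \qquad \theta^{(k+1)} = \theta^{(k)} - \eta \,\widehat{\nabla_\theta J}\big(\theta^{(k)}\big),
        \end{aligned}

    where the exact (model-based) policy gradient is computed from the coefficients, while the model-free variant replaces the gradient with a zeroth-order Monte Carlo estimate built from simulated trajectories.
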
  6. A mean field game is proposed for the synchronization of oscillators facing conflicting objectives. Our motivation is to offer an alternative to recent attempts to use dynamical systems to illustrate some of the idiosyncrasies of jet lag recovery. Our analysis is driven by two goals: (1) to understand the long-time behavior of the oscillators when an individual remains in the same time zone, and (2) to quantify the cost of jet lag recovery when the individual has traveled across time zones. Finite difference schemes are used to find numerical approximations to the mean field game solutions. They are benchmarked against explicit solutions derived for a special case. Numerical results are presented and conjectures are formulated. The numerics suggest that the cost the oscillators accrue while recovering is larger for eastward travel, which is consistent with the widely admitted wisdom that jet lag is worse after traveling east than west.
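    For reference, finite difference schemes for mean field games typically discretize a coupled forward-backward PDE system of the following generic form (written in LaTeX notation with a generic Hamiltonian H; the paper's oscillator model on the circle has its own specific cost structure and state space):

        \begin{aligned}
        & -\partial_t u - \nu \Delta u + H(x, \nabla u, m) = 0, && u(T, x) = g\big(x, m(T)\big),\\
        & \ \ \,\partial_t m - \nu \Delta m - \operatorname{div}\!\big( m \,\partial_p H(x, \nabla u, m) \big) = 0, && m(0) = m_0,
        \end{aligned}

    with u the value function of a representative oscillator and m the population density; the long-run behavior in goal (1) corresponds to a stationary (ergodic) version of this system.
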
  7. This project investigates numerical methods for solving fully coupled forward-backward stochastic differential equations (FBSDEs) of McKean-Vlasov type. Having numerical solvers for such mean field FBSDEs is of interest because of the potential application of these equations to optimization problems over a large population, for instance mean field games (MFG) and optimal mean field control problems. Theory for this kind of problem has met with great success since the early works on mean field games by Lasry and Lions, see [29], and by Huang, Caines, and Malhame, see [26]. Generally speaking, the purpose is to understand the continuum limit of optimizers or of equilibria (say in the Nash sense) as the number of underlying players tends to infinity. When approached from the probabilistic viewpoint, solutions to these control problems (or games) can be described by coupled mean field FBSDEs, meaning that the coefficients depend upon the marginal laws of the solution itself. In this note, we detail two methods for solving such FBSDEs, which we implement and apply to five benchmark problems. The first method uses a tree structure to represent the pathwise laws of the solution, whereas the second method uses a grid discretization to represent the time marginal laws of the solutions. Both are based on a Picard scheme; importantly, we combine each of them with a generic continuation method that makes it possible to extend the time horizon (or, equivalently, the coupling strength between the two equations) for which the Picard iteration converges.
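    The sketch below is a much-simplified illustration of the Picard-plus-continuation structure described above, not a reimplementation of either of the paper's two solvers: a scalar coupled FBSDE with made-up coefficients, the backward component approximated at each grid time by a linear regression on the forward component (a crude decoupling field), and a continuation loop that ramps up the forward-backward coupling strength while warm-starting each Picard solve from the previous one.

        import numpy as np

        # Toy coupled FBSDE: X is driven forward by Y, Y is defined backward from X.
        # All coefficients, the driver and the terminal condition are illustrative.
        rng = np.random.default_rng(0)
        T, N, M = 1.0, 50, 5_000            # horizon, time steps, Monte Carlo particles
        dt = T / N
        a, b, c, sigma = -1.0, 1.0, 1.0, 0.5

        def g(x):                            # terminal condition Y_T = g(X_T)
            return np.tanh(x)

        def picard_solve(rho, coeffs, n_iter=30, tol=1e-5):
            """Picard iteration; Y_t is approximated by alpha_t + beta_t * X_t."""
            for _ in range(n_iter):
                # forward Euler pass using the current guess of the decoupling field
                X = np.zeros((N + 1, M))
                X[0] = rng.normal(0.0, 1.0, M)
                dW = rng.normal(0.0, np.sqrt(dt), size=(N, M))
                for n in range(N):
                    Y_guess = coeffs[n, 0] + coeffs[n, 1] * X[n]
                    X[n + 1] = X[n] + (a * X[n] + rho * b * Y_guess) * dt + sigma * dW[n]
                # backward pass: least-squares regression of Y on X at each grid time
                new = np.zeros_like(coeffs)
                Y = g(X[N])
                new[N] = np.linalg.lstsq(np.vstack([np.ones(M), X[N]]).T, Y, rcond=None)[0]
                for n in reversed(range(N)):
                    Y = Y + c * X[n] * dt                    # driver f(x) = c * x
                    new[n] = np.linalg.lstsq(np.vstack([np.ones(M), X[n]]).T, Y, rcond=None)[0]
                done = np.max(np.abs(new - coeffs)) < tol
                coeffs = new
                if done:
                    break
            return coeffs

        # continuation: increase the coupling strength rho, warm-starting each solve
        coeffs = np.zeros((N + 1, 2))
        for rho in np.linspace(0.2, 1.0, 5):
            coeffs = picard_solve(rho, coeffs)
        print("decoupling field at t = 0:  Y_0 ~ %.3f + %.3f * X_0" % tuple(coeffs[0]))
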
  8. The price of anarchy, originally introduced to quantify the inefficiency of selfish behavior in routing games, is extended to mean field games. The price of anarchy is defined as the ratio of a worst-case social cost, computed for a mean field game equilibrium, to the optimal social cost as computed by a central planner. We illustrate properties of such a price of anarchy on linear-quadratic extended mean field games, for which explicit computations are possible. A necessary and sufficient condition for the absence of a price of anarchy is presented. Various asymptotic behaviors of the price of anarchy are proved for limiting regimes of the coefficients in the model, and numerics are presented.
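    In symbols, the ratio described above reads (a sketch of the definition, written in LaTeX with generic notation):

        \mathrm{PoA} \;=\;
        \frac{\displaystyle \sup_{\hat\alpha \,\in\, \mathcal{E}_{\mathrm{MFG}}} J^{\mathrm{social}}(\hat\alpha)}
             {\displaystyle \inf_{\alpha} J^{\mathrm{social}}(\alpha)} \;\ge\; 1,

    where the numerator is the worst-case social cost over the set of mean field game equilibria and the denominator is the optimal social cost achieved by the central planner; there is no price of anarchy precisely when the ratio equals 1.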