Nonzero-sum games typically have multiple Nash equilibria (or no equilibrium at all), and unlike the zero-sum case, different equilibria may yield different values. Instead of focusing on the existence of individual equilibria, we study the set of values over all equilibria, which we call the set value of the game. The set value is unique by nature and always exists (with the empty set as a possible value). Like the standard value function in the control literature, it enjoys many nice properties, such as regularity, stability, and, more importantly, the dynamic programming principle. Two features are essential for obtaining the dynamic programming principle: (i) we must use closed-loop controls (instead of open-loop controls); and (ii) we must allow for path-dependent controls, even if the problem is in a state-dependent (Markovian) setting. We consider both discrete- and continuous-time models with finite time horizon. For the latter, we also provide a duality approach through certain standard PDEs (or path-dependent PDEs), which is quite efficient for numerically computing the set value of the game.
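As a minimal illustration of the set value in the discrete setting, the sketch below (Python, with hypothetical payoff matrices) enumerates the pure-strategy Nash equilibria of a small two-player bimatrix game and collects the corresponding payoff pairs; the resulting set may contain several distinct values or be empty, which is precisely the situation the set-value formulation accommodates.

```python
import numpy as np

# Hypothetical 2x2 bimatrix game: A[i, j] and B[i, j] are the payoffs of
# players 1 and 2 when they play actions i and j, respectively.
A = np.array([[3.0, 0.0],
              [5.0, 1.0]])
B = np.array([[3.0, 5.0],
              [0.0, 1.0]])

def pure_nash_set_value(A, B):
    """Set of payoff pairs attained at pure-strategy Nash equilibria.

    The set may contain several distinct values (multiple equilibria) or be
    empty (no pure equilibrium), which is exactly the situation the set-value
    formulation is meant to handle.
    """
    values = set()
    m, n = A.shape
    for i in range(m):
        for j in range(n):
            best_for_1 = A[i, j] >= A[:, j].max()   # player 1 cannot improve
            best_for_2 = B[i, j] >= B[i, :].max()   # player 2 cannot improve
            if best_for_1 and best_for_2:
                values.add((float(A[i, j]), float(B[i, j])))
    return values

print(pure_nash_set_value(A, B))   # {(1.0, 1.0)} for this prisoner's-dilemma-like game
```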
Linear-quadratic zero-sum mean-field type games: Optimality conditions and policy optimization
In this paper, zero-sum mean-field type games (ZSMFTG) with linear dynamics and quadratic cost are studied under an infinite-horizon discounted utility function. ZSMFTG are a class of games in which two decision makers, whose utilities sum to zero, compete to influence a large population of indistinguishable agents. In particular, the case in which the transition and utility functions depend on the state, the actions of the controllers, and the means of the state and the actions is investigated. The optimality conditions of the game are analysed for both open-loop and closed-loop controls, and explicit expressions for the Nash equilibrium strategies are derived. Moreover, two policy optimization methods that rely on policy gradient are proposed, for both model-based and sample-based frameworks. In the model-based case the gradients are computed exactly using the model, whereas in the sample-based case they are estimated using Monte Carlo simulations. Numerical experiments are conducted to show the convergence of the utility function as well as of the two players' controls.
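The following sketch shows the general shape of such a policy-gradient scheme on a deliberately simplified problem: a scalar zero-sum linear-quadratic game without the mean-field coupling, with hypothetical coefficients and a finite-difference gradient standing in for the exact (model-based) or Monte Carlo (sample-based) gradients of the paper.

```python
import numpy as np

# Hypothetical scalar zero-sum LQ game (no mean-field coupling; a long finite
# rollout stands in for the discounted infinite horizon of the paper).
A, B1, B2 = 0.9, 0.5, 0.3          # dynamics x' = A x + B1 u1 + B2 u2
Q, R1, R2 = 1.0, 1.0, 2.0          # stage cost  Q x^2 + R1 u1^2 - R2 u2^2
gamma, T = 0.95, 60                # discount factor and rollout length

def cost(K1, K2, x0=1.0):
    """Discounted cost of the linear feedbacks u1 = -K1 x and u2 = -K2 x.
    Player 1 (K1) minimizes this cost, player 2 (K2) maximizes it."""
    x, J = x0, 0.0
    for t in range(T):
        u1, u2 = -K1 * x, -K2 * x
        J += gamma**t * (Q * x**2 + R1 * u1**2 - R2 * u2**2)
        x = A * x + B1 * u1 + B2 * u2
    return J

def grad(f, k, eps=1e-4):
    """Central finite-difference gradient (a stand-in for the exact or
    Monte Carlo policy gradients discussed in the paper)."""
    return (f(k + eps) - f(k - eps)) / (2 * eps)

# Simultaneous gradient descent (minimizer) / ascent (maximizer) on the gains.
K1, K2, lr = 0.0, 0.0, 0.05
for _ in range(200):
    K1 -= lr * grad(lambda k: cost(k, K2), K1)
    K2 += lr * grad(lambda k: cost(K1, k), K2)

print(f"K1 = {K1:.3f}, K2 = {K2:.3f}, cost = {cost(K1, K2):.3f}")
```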
- Award ID(s):
- 1716673
- PAR ID:
- 10299691
- Date Published:
- Journal Name:
- Journal of Dynamics & Games
- Volume:
- 8
- Issue:
- 4
- ISSN:
- 2164-6066
- Page Range / eLocation ID:
- 403
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper is concerned with two-person mean-field linear-quadratic non-zero-sum stochastic differential games in an infinite horizon. Both open-loop and closed-loop Nash equilibria are introduced. The existence of an open-loop Nash equilibrium is characterized by the solvability of a system of mean-field forward-backward stochastic differential equations in an infinite horizon and the convexity of the cost functionals, and the closed-loop representation of an open-loop Nash equilibrium is given through the solution to a system of two coupled non-symmetric algebraic Riccati equations. The existence of a closed-loop Nash equilibrium is characterized by the solvability of a system of two coupled symmetric algebraic Riccati equations. Two-person mean-field linear-quadratic zero-sum stochastic differential games in an infinite horizon are also considered. The existence of both open-loop and closed-loop saddle points is characterized by the solvability of a system of two coupled generalized algebraic Riccati equations with static stabilizing solutions. Mean-field linear-quadratic stochastic optimal control problems in an infinite horizon are discussed as well, for which it is proved that open-loop solvability and closed-loop solvability are equivalent.
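For intuition, the sketch below treats a simplified deterministic analogue of the closed-loop characterization above: two coupled algebraic Riccati equations for a two-player non-zero-sum LQ game without mean-field terms, handled by a best-response fixed-point iteration on standard Riccati solves. All matrices are hypothetical, and convergence of this iteration is not guaranteed in general.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical two-player non-zero-sum LQ game (no mean-field terms):
#   dx = (A x + B1 u1 + B2 u2) dt,   J_i = integral of (x'Q_i x + u_i'R_i u_i) dt.
A  = np.array([[0.0, 1.0],
               [-1.0, -0.5]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.0], [0.5]])
Q1, Q2 = np.diag([2.0, 1.0]), np.diag([1.0, 2.0])
R1, R2 = np.array([[1.0]]), np.array([[2.0]])

# Best-response fixed point on the two coupled Riccati equations: freeze the
# other player's feedback, solve a standard ARE, and repeat.
K1 = np.zeros((1, 2))
K2 = np.zeros((1, 2))
for _ in range(100):
    P1 = solve_continuous_are(A - B2 @ K2, B1, Q1, R1)
    K1_new = np.linalg.solve(R1, B1.T @ P1)
    P2 = solve_continuous_are(A - B1 @ K1_new, B2, Q2, R2)
    K2_new = np.linalg.solve(R2, B2.T @ P2)
    if np.max(np.abs(K1_new - K1)) + np.max(np.abs(K2_new - K2)) < 1e-10:
        K1, K2 = K1_new, K2_new
        break
    K1, K2 = K1_new, K2_new

print("K1 =", K1)
print("K2 =", K2)
print("closed-loop eigenvalues:", np.linalg.eigvals(A - B1 @ K1 - B2 @ K2))
```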
-
Multi-agent reinforcement learning (MARL) lies at the heart of a plethora of applications involving the interaction of a group of agents in a shared unknown environment. A prominent framework for studying MARL is Markov games, with the goal of finding various notions of equilibria in a sample-efficient manner, such as the Nash equilibrium (NE) and the coarse correlated equilibrium (CCE). However, existing sample-efficient approaches either require tailored uncertainty estimation under function approximation, or careful coordination of the players. In this paper, we propose a novel model-based algorithm, called VMG, that incentivizes exploration by biasing the empirical estimate of the model parameters towards those with higher collective best-response values of all the players when fixing the other players' policies, thus encouraging the policy to deviate from its current equilibrium for more exploration. VMG is oblivious to different forms of function approximation, and permits simultaneous and uncoupled policy updates of all players. Theoretically, we also establish that VMG achieves near-optimal regret for finding both the NEs of two-player zero-sum Markov games and the CCEs of multi-player general-sum Markov games under linear function approximation in an online environment, which nearly matches their counterparts with sophisticated uncertainty quantification.
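Model-based methods of this kind repeatedly solve a per-state equilibrium subproblem once the current model or value estimates are fixed. A standard building block is the linear-programming computation of a mixed Nash equilibrium of a zero-sum matrix game, sketched below; this is a textbook routine offered for illustration, not the VMG algorithm itself.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_nash(A):
    """Mixed Nash equilibrium of a zero-sum matrix game via linear programming.

    The row player maximizes the game value for payoff matrix A; this is the
    kind of per-state equilibrium subroutine that model-based Markov-game
    algorithms invoke once a model estimate is fixed.
    """
    m, n = A.shape
    # Variables: (x_1, ..., x_m, v).  Maximize v  <=>  minimize -v.
    c = np.concatenate([np.zeros(m), [-1.0]])
    # For every column j:  v - sum_i x_i A[i, j] <= 0.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The mixed strategy must be a probability vector.
    A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[:m], res.x[m]           # row player's strategy, game value

# Matching pennies: the unique equilibrium is the uniform strategy with value 0.
strategy, value = zero_sum_nash(np.array([[1.0, -1.0], [-1.0, 1.0]]))
print(strategy, value)
```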
-
Closed-loop stability of uncertain linear systems is studied under the state feedback realized by a linear quadratic regulator (LQR). Sufficient conditions are presented that ensure closed-loop stability in the presence of uncertainty, initially for the case of a non-robust LQR designed for a nominal model that does not reflect the system uncertainty. Since these conditions are usually violated for large uncertainty, a procedure is offered to redesign such a non-robust LQR into a robust one that ensures closed-loop stability under a predefined level of uncertainty. The analysis of this paper largely relies on the concept of inverse optimal control to construct suitable performance measures for uncertain linear systems, which are non-quadratic in structure but yield optimal controls in the form of LQR. The relationship between robust LQR and zero-sum linear quadratic dynamic games is established.
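A brute-force version of such a stability check is easy to script: design an LQR gain for a hypothetical nominal model and test whether that nominal gain still stabilizes perturbed plants over a grid of uncertainty levels. The matrices and perturbation structure below are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical nominal model and LQR weights.
A = np.array([[0.0, 1.0],
              [2.0, -1.0]])            # open-loop unstable
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 1.0])
R = np.array([[1.0]])

# Nominal LQR gain: u = -K x with K = R^{-1} B' P.
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)

# Perturb a single uncertain entry of A and test whether the *nominal* gain
# still stabilizes the perturbed plant.
for delta in np.linspace(-2.0, 2.0, 9):
    A_pert = A + np.array([[0.0, 0.0], [delta, 0.0]])
    eigs = np.linalg.eigvals(A_pert - B @ K)
    print(f"delta = {delta:+.2f}  stable = {bool(np.all(eigs.real < 0))}")
```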
-
This article presents a novel actor-critic-barrier structure for multiplayer safety-critical systems. Non-zero-sum (NZS) games with full-state constraints are first transformed into unconstrained NZS games using a barrier function. The barrier function is capable of dealing with both symmetric and asymmetric constraints on the state. It is shown that the Nash equilibrium of the unconstrained NZS game stabilizes the original multiplayer system. The barrier function is combined with an actor-critic structure to learn the Nash equilibrium solution in an online fashion. It is shown that integrating the barrier function with the actor-critic structure guarantees that the constraints will not be violated during learning. Boundedness and stability of the closed-loop signals are analyzed. The efficacy of the presented approach is finally demonstrated using a simulation example.
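To make the constraint-removal idea concrete, the sketch below shows one generic logit-type barrier change of coordinates for an asymmetric state constraint x in (lo, hi): the constrained state is mapped to an unconstrained coordinate, and mapping back always lands inside the bounds. This is an illustrative choice, not necessarily the exact barrier function used in the article.

```python
import numpy as np

def to_unconstrained(x, lo, hi):
    """Barrier-style transform: finite for x inside (lo, hi), blows up at the bounds."""
    return np.log((x - lo) / (hi - x))

def to_constrained(s, lo, hi):
    """Inverse transform: any real s maps back into the open interval (lo, hi)."""
    return lo + (hi - lo) / (1.0 + np.exp(-s))

lo, hi = -1.0, 3.0                       # hypothetical asymmetric state bounds
x = np.array([-0.9, 0.0, 1.0, 2.9])
s = to_unconstrained(x, lo, hi)
print(s)                                 # unconstrained coordinates
print(to_constrained(s, lo, hi))         # recovers x, strictly inside (lo, hi)
```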

