Title: Safe reinforcement learning for dynamical games
Summary: This article presents a novel actor-critic-barrier structure for multiplayer safety-critical systems. Non-zero-sum (NZS) games with full-state constraints are first transformed into unconstrained NZS games using a barrier function, which can handle both symmetric and asymmetric constraints on the state. It is shown that the Nash equilibrium of the unconstrained NZS game is guaranteed to stabilize the original multiplayer system. The barrier function is then combined with an actor-critic structure to learn the Nash equilibrium solution online, and it is shown that this integration guarantees that the constraints are not violated during learning. Boundedness and stability of the closed-loop signals are analyzed. Finally, a simulation example demonstrates the efficacy of the presented approach.
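The summary does not state which barrier function the paper uses. One common choice in barrier-transformation work, which handles asymmetric bounds a < 0 < A on a state component, is b(x) = log(A(a - x) / (a(A - x))). It maps the open interval (a, A) one-to-one onto the real line with b(0) = 0, so keeping the transformed state s = b(x) bounded keeps x strictly inside its constraint; symmetric bounds are the special case a = -A. The sketch below assumes this form, and the names `barrier` and `barrier_inv` are hypothetical.

```python
import math

def barrier(x: float, a: float, A: float) -> float:
    """Map a state component x in the open interval (a, A), a < 0 < A, onto the real line.

    Assumed barrier form (not confirmed by the summary): b(x) = log(A(a - x) / (a(A - x))).
    """
    if not a < 0.0 < A:
        raise ValueError("bounds must satisfy a < 0 < A")
    if not a < x < A:
        raise ValueError("x must lie strictly inside (a, A)")
    # Inside the interval both A*(a - x) and a*(A - x) are negative, so the ratio is positive.
    return math.log(A * (a - x) / (a * (A - x)))

def barrier_inv(s: float, a: float, A: float) -> float:
    """Inverse map: any real s is sent back to a state inside (a, A)."""
    ep, em = math.exp(s / 2.0), math.exp(-s / 2.0)
    # With a < 0 < A the denominator a*ep - A*em is always negative, so it never vanishes.
    return a * A * (ep - em) / (a * ep - A * em)
```

For example, with a = -1 and A = 2, `barrier(0.0, -1.0, 2.0)` is 0 and `barrier_inv` recovers the original state from any finite s. In floating point, a very large |s| rounds `barrier_inv(s, a, A)` onto a bound, so the guarantee is really a statement about s staying bounded.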
Award ID(s): 1851588
PAR ID: 10457927
Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
Journal Name: International Journal of Robust and Nonlinear Control
Volume: 30
Issue: 9
ISSN: 1049-8923
Page Range / eLocation ID: pp. 3706-3726
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More like this
  1. An atomic routing game is a multiplayer game on a directed graph. Each player in the game chooses a path—a sequence of links that connect its origin node to its destination node—with the lowest cost, where the cost of each link is a function of all players’ choices. We develop a novel numerical method to design the link cost function in atomic routing games such that the players’ choices at the Nash equilibrium minimize a given smooth performance function. This method first approximates the nonsmooth Nash equilibrium conditions with smooth ones, then iteratively improves the link cost function via implicit differentiation. We demonstrate the application of this method to atomic routing games that model noncooperative agents navigating in grid worlds. 
  2. In this paper, we develop distributed computation algorithms for Nash equilibria of linear quadratic network games with proven differential privacy guarantees. In a network game where each player's payoff is a quadratic function, the dependencies among decisions in the payoff function naturally encode a network structure governing the players' interpersonal influences. This social influence structure and the players' individual marginal payoffs reveal economic spillovers and individual preferences, and are therefore subject to privacy concerns. For distributed computation of the Nash equilibrium, the players are interconnected by a public communication graph, over which dynamical states are shared among neighboring nodes. When the players' marginal payoffs are considered private knowledge, we propose a distributed randomized gradient descent algorithm in which each player adds Laplacian random noise to her marginal payoff in the recursive updates. We prove that the algorithm guarantees differential privacy and convergence in expectation to the Nash equilibrium of the network game at each player's state. Moreover, the mean-square error between the players' states and the Nash equilibrium is shown to be bounded by a constant related to the differential privacy level. Next, when both the players' marginal payoffs and the influence graph are private information, we propose two distributed algorithms, based on randomized communication and randomized projection respectively, for privacy preservation. Differential privacy and convergence guarantees are also established for these algorithms. 
  3. This paper considers the control problem with simultaneous constraints on the full state and the control input. First, a novel barrier-function-based system transformation approach is developed to guarantee the full-state constraints. To deal with input saturation, a hyperbolic-type penalty function is imposed on the control input. An actor-critic reinforcement learning technique is combined with the barrier transformation to learn an optimal control policy that respects both the full-state constraints and the input saturation. Finally, a numerical simulation illustrates the efficacy of the approach. 
  4. Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs. In this paper, we analyse a new extra-gradient method for Nash equilibrium finding that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits a better rate of convergence than full extra-gradient for non-smooth convex games with a noisy gradient oracle. We propose an additional variance reduction mechanism to obtain speed-ups in smooth convex games. Our approach makes extrapolation amenable to massive multiplayer settings and brings empirical speed-ups, in particular when using a heuristic cyclic sampling scheme. Most importantly, it allows faster training of better GANs and mixtures of GANs. 
  5. In multi-agent dynamic games, the Nash equilibrium state trajectory of each agent is determined by its cost function and the information pattern of the game. However, the cost and trajectory of each agent may be unavailable to the other agents. Prior work on using partial observations to infer the costs in dynamic games assumes an open-loop information pattern. In this work, we demonstrate that the feedback Nash equilibrium concept is more expressive and encodes more complex behavior, which makes it desirable to develop specific tools for inferring players’ objectives in feedback games. Therefore, we consider the dynamic game cost inference problem under the feedback information pattern, using only partial state observations and incomplete trajectory data. To this end, we first propose an inverse feedback game loss function whose minimizer yields a feedback Nash equilibrium state trajectory closest to the observation data. We characterize the landscape and differentiability of the loss function. Given the difficulty of obtaining the exact gradient, our main contribution is an efficient gradient approximator, which enables a novel inverse feedback game solver that minimizes the loss using first-order optimization. In thorough empirical evaluations, we demonstrate that our algorithm converges reliably and has better robustness and generalization performance than the open-loop baseline method when the observation data reflects a group of players acting in a feedback Nash game. 
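Item 3 names a hyperbolic-type penalty on the control input without giving its formula. A widely used choice for a scalar input bounded by |u| < lam is W(u) = 2 * integral from 0 to u of lam * atanh(v / lam) * r dv, which has the closed form 2*lam*r*u*atanh(u/lam) + lam^2*r*ln(1 - u^2/lam^2). Minimizing W(u) + (g(x)^T dV/dx) * u then yields the saturated policy u = -lam * tanh(g(x)^T dV/dx / (2*lam*r)), which can never exceed the bound. The sketch assumes this penalty form; `penalty`, `saturated_policy`, `lam`, `r`, and `gv` are hypothetical names.

```python
import math

def penalty(u: float, lam: float, r: float = 1.0) -> float:
    """Closed form of W(u) = 2 * integral_0^u lam * atanh(v / lam) * r dv.

    Assumed hyperbolic-type penalty; valid only for |u| < lam, with r > 0 a weight.
    """
    if not abs(u) < lam:
        raise ValueError("requires |u| < lam")
    z = u / lam
    return 2.0 * lam * r * u * math.atanh(z) + lam ** 2 * r * math.log(1.0 - z * z)

def saturated_policy(gv: float, lam: float, r: float = 1.0) -> float:
    """Minimizer of W(u) + gv * u, where gv stands for g(x)^T dV/dx (scalar input)."""
    # Stationarity of W(u) + gv * u: 2 * lam * r * atanh(u / lam) + gv = 0.
    return -lam * math.tanh(gv / (2.0 * lam * r))
```

Because tanh is bounded by 1, the policy respects |u| <= lam for any value of the gradient term, which is the reason this penalty is paired with saturated actuators instead of the usual quadratic u^T R u.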
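The idea in item 4 of extrapolating and updating only a random subset of players can be illustrated on a toy game. The sketch below is a simplified reading, not the paper's algorithm: each player is sampled independently with probability p, the same subset is used for both the extrapolation and the update, and the step size is fixed. The 3-player linear game (B skew-symmetric, so the stacked gradient (I + B)x is strongly monotone and the unique Nash equilibrium is the origin) and all names are hypothetical.

```python
import numpy as np

# Hypothetical 3-player game: player i controls scalar x[i] and minimizes
# f_i(x) = 0.5 * x[i]**2 + x[i] * (B @ x)[i]. Since B is skew-symmetric (zero diagonal),
# the stacked partial gradients are F(x) = (I + B) x and the Nash equilibrium is x = 0.
B = np.array([[0.0, 1.0, -0.5],
              [-1.0, 0.0, 1.0],
              [0.5, -1.0, 0.0]])

def game_gradient(x: np.ndarray) -> np.ndarray:
    """Each player's partial gradient of its own cost, stacked into one vector."""
    return x + B @ x

def sampled_extragradient(x0, step=0.05, p=0.5, iters=2000, seed=0) -> np.ndarray:
    """Extra-gradient where only a random subset of players moves at each iteration."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        mask = rng.random(x.size) < p                    # players sampled this round
        x_half = x - step * mask * game_gradient(x)      # extrapolation (sampled players only)
        x = x - step * mask * game_gradient(x_half)      # update using the extrapolated point
    return x
```

With this strongly monotone game and a small fixed step, the iterates contract toward the equilibrium in expectation even though unsampled players stay put each round. The variance-reduction mechanism and the cyclic sampling heuristic mentioned in the abstract are not modelled here.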