Title: Multi-agent motion planning using differential games with lexicographic preferences
Multi-player games with lexicographic cost functions can capture a variety of driving and racing scenarios and are known to have pure-strategy Nash Equilibria (NE) under certain conditions. The standard Iterated Best Response (IBR) procedure for finding such equilibria can be slow because computing the best response for each agent generally involves solving a non-convex optimization problem. In this paper, we introduce a class of games that use a lexicographic cost function. We show that for this class of games, the best responses can be effectively computed through piecewise-linear approximations. This enables us to approximate the NE using a linearized version of IBR. We show that the gap between the linear approximations returned by our linearized IBR and the true best response vanishes asymptotically. We implement the algorithm and show that it can find approximate NE for a handful of agents driving in realistic scenarios in under 10 seconds.
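The linearized IBR procedure itself is given in the paper; as a rough illustration of the overall structure, the sketch below runs iterated best response with a lexicographic cost for two agents on a shared one-dimensional lane. The constant-speed plans, the coarse candidate grid, the safety gap, and all function names are illustrative assumptions; the paper's piecewise-linear best-response approximation is not reproduced here.

```python
# Toy iterated best response (IBR) with a lexicographic cost.
# Illustrative only: two agents choose constant speeds on a shared 1-D lane,
# and best responses are taken over a coarse candidate grid rather than the
# paper's piecewise-linear approximation. All parameters are assumptions.
import numpy as np

HORIZON = 10          # number of time steps
SAFE_GAP = 2.0        # minimum longitudinal gap before a "collision" penalty
SPEEDS = np.linspace(0.0, 2.0, 21)   # discretized candidate speeds
START = [0.0, 5.0]    # initial positions of agent 0 (rear) and agent 1 (front)
DESIRED = [1.8, 1.0]  # desired speeds (rear agent wants to go faster)

def trajectory(start, speed):
    """Positions over the horizon for a constant-speed plan."""
    return start + speed * np.arange(1, HORIZON + 1)

def lexicographic_cost(speed, other_traj, start, desired):
    """(safety violation, tracking error): compared lexicographically,
    so any reduction in the violation term dominates speed tracking."""
    traj = trajectory(start, speed)
    violation = np.sum(np.maximum(0.0, SAFE_GAP - np.abs(traj - other_traj)))
    tracking = abs(speed - desired)
    return (violation, tracking)

def best_response(i, plans):
    """Best response of agent i against the other agent's fixed plan,
    minimizing the lexicographic cost tuple over the candidate set."""
    other = 1 - i
    other_traj = trajectory(START[other], plans[other])
    return min(SPEEDS,
               key=lambda s: lexicographic_cost(s, other_traj, START[i], DESIRED[i]))

def iterated_best_response(max_rounds=50):
    plans = [DESIRED[0], DESIRED[1]]          # start from selfish desired speeds
    for _ in range(max_rounds):
        new_plans = list(plans)
        for i in (0, 1):
            new_plans[i] = best_response(i, new_plans)
        if np.allclose(new_plans, plans):     # no agent wants to deviate
            return new_plans                  # pure NE of the discretized game
        plans = new_plans
    return plans

print("Approximate equilibrium speeds:", iterated_best_response())
```

Because Python compares tuples lexicographically, the `min` over candidates directly encodes the preference ordering: any reduction in the safety-violation term outweighs any improvement in speed tracking.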
Award ID(s):
1918531
NSF-PAR ID:
10488133
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Journal Name:
2022 IEEE 61st Conference on Decision and Control (CDC)
Page Range / eLocation ID:
5751 to 5756
Format(s):
Medium: X
Location:
Cancun, Mexico
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi-player games with lexicographic cost functions can capture a variety of driving and racing scenarios and, under certain conditions, are known to have pure-strategy Nash Equilibria. The standard Iterated Best Response (IBR) procedure for finding such equilibria can be slow because, in general, computing the best response for each agent involves solving a non-convex optimization problem. In this paper, we introduce a type of game which uses a lexicographic cost function. We show that for this class of games, the best responses can be effectively computed through piecewise-linear approximations. This in turn enables us to approximate the Nash Equilibria using a linearized version of IBR. We show that the gap between the linear approximations returned by our linearized IBR and the true best response drops asymptotically. We have implemented the algorithm and our experiments show that it can find approximate Nash Equilibria for a handful of agents driving in realistic scenarios in less than 10 seconds.
  2. Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guaranteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at https://github.com/indylab/nxdo.
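PSRO and XDO both build on the classical double-oracle scheme, which alternates between solving a restricted game and adding exact best responses to the strategy pools. A minimal normal-form version for a zero-sum matrix game is sketched below; the LP-based restricted-game solver, the payoff matrix, and the exact best-response oracles are assumptions made for illustration, and the sketch does not capture XDO's key feature of mixing best responses at every infostate of the extensive form.

```python
# Minimal double-oracle loop for a two-player zero-sum matrix game.
# Illustrative only: PSRO replaces the exact best-response oracles with RL,
# and XDO mixes best responses at every infostate of an extensive-form game;
# this sketch only shows the restricted-game / best-response expansion idea.
import numpy as np
from scipy.optimize import linprog

def solve_zero_sum(A):
    """Nash equilibrium (x, y, value) of the zero-sum matrix game A via two LPs.
    The row player maximizes x^T A y; the column player minimizes it."""
    m, n = A.shape
    # Row player: maximize v subject to A^T x >= v, x in the simplex.
    res_x = linprog(c=np.r_[np.zeros(m), -1.0],
                    A_ub=np.hstack([-A.T, np.ones((n, 1))]), b_ub=np.zeros(n),
                    A_eq=np.r_[np.ones(m), 0.0].reshape(1, -1), b_eq=[1.0],
                    bounds=[(0, None)] * m + [(None, None)])
    # Column player: minimize w subject to A y <= w, y in the simplex.
    res_y = linprog(c=np.r_[np.zeros(n), 1.0],
                    A_ub=np.hstack([A, -np.ones((m, 1))]), b_ub=np.zeros(m),
                    A_eq=np.r_[np.ones(n), 0.0].reshape(1, -1), b_eq=[1.0],
                    bounds=[(0, None)] * n + [(None, None)])
    return res_x.x[:m], res_y.x[:n], -res_x.fun

def double_oracle(A):
    """Grow restricted strategy sets with exact best responses until closed."""
    rows, cols = [0], [0]                      # restricted strategy pools
    while True:
        x_r, y_r, _ = solve_zero_sum(A[np.ix_(rows, cols)])
        # Lift restricted-game strategies back to the full game.
        x_full = np.zeros(A.shape[0]); x_full[rows] = x_r
        y_full = np.zeros(A.shape[1]); y_full[cols] = y_r
        br_row = int(np.argmax(A @ y_full))    # best response to the column mix
        br_col = int(np.argmin(x_full @ A))    # best response to the row mix
        grew = False
        if br_row not in rows: rows.append(br_row); grew = True
        if br_col not in cols: cols.append(br_col); grew = True
        if not grew:                           # no new strategies: equilibrium found
            return x_full, y_full, float(x_full @ A @ y_full)

# Example: rock-paper-scissors payoffs for the row player.
rps = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
x, y, v = double_oracle(rps)
print("row mix:", x.round(3), "col mix:", y.round(3), "value:", round(v, 3))
```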
  3. We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning-type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that under certain conditions, by updating the regularized Q-function, the algorithm converges to a Nash equilibrium. We also demonstrate the proposed algorithm’s ability to transfer previous training experiences, enabling the agents to adapt quickly to new environments. We provide a dynamic hyper-parameter scheduling scheme to further expedite convergence. Empirical results on a number of stochastic games verify that the proposed algorithm converges to the Nash equilibrium while exhibiting a major speed-up over existing algorithms.
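The central ingredient is the entropy-regularized ("soft") policy and value used inside the Q-function updates. The toy sketch below shows that ingredient in its simplest single-agent tabular form; the temperature, learning rate, and the tiny transition are assumptions for illustration and do not reproduce the paper's two-player stochastic-game algorithm.

```python
# Entropy-regularized ("soft") policy and value, the basic smoothing ingredient
# used during Q-updates. Illustrative sketch only: a single-agent tabular soft
# Q-learning step; the cited work applies the idea to zero-sum stochastic
# games, which this toy example does not model.
import numpy as np
from scipy.special import logsumexp

TAU = 0.5     # entropy temperature: larger = softer, closer-to-uniform policies
ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor

def soft_policy(q_row, tau=TAU):
    """Softmax policy pi(a) proportional to exp(Q(s,a)/tau)."""
    z = q_row / tau
    return np.exp(z - logsumexp(z))

def soft_value(q_row, tau=TAU):
    """Entropy-regularized state value: tau * log sum_a exp(Q(s,a)/tau)."""
    return tau * logsumexp(q_row / tau)

def soft_q_update(Q, s, a, r, s_next):
    """One tabular soft Q-learning step using the regularized bootstrap value."""
    target = r + GAMMA * soft_value(Q[s_next])
    Q[s, a] += ALPHA * (target - Q[s, a])
    return Q

# Tiny demo: 2 states, 2 actions, a single observed transition.
Q = np.zeros((2, 2))
Q = soft_q_update(Q, s=0, a=1, r=1.0, s_next=1)
print("soft policy at state 0:", soft_policy(Q[0]).round(3))
```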
  4. We report on the formalization in Ssreflect/Coq of a number of concepts and results from algorithmic game theory, including potential games, smooth games, solution concepts such as Pure and Mixed Nash Equilibria, Coarse Correlated Equilibria, epsilon-approximate equilibria, and behavioral models of games such as best-response dynamics. We apply the formalization to prove Price of Stability bounds for, and convergence under best-response dynamics of, the Atomic Routing game, which has applications in computer networking. Our second application proves that Affine Congestion games are (5/3, 1/3)-smooth, and therefore have Price of Anarchy 5/2. Our formalization is available online. 
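For context, the step from (5/3, 1/3)-smoothness to the Price of Anarchy bound follows the standard smoothness argument; a brief sketch of that step is given below (standard material, not excerpted from the formalization itself).

```latex
% Smoothness => Price of Anarchy (standard argument, sketched for context).
A cost-minimization game is $(\lambda,\mu)$-smooth if for all outcomes $s, s^*$:
\[
  \sum_i C_i(s_i^*, s_{-i}) \;\le\; \lambda\, C(s^*) + \mu\, C(s).
\]
If $s$ is an equilibrium and $s^*$ an optimal outcome, then
\[
  C(s) \;=\; \sum_i C_i(s) \;\le\; \sum_i C_i(s_i^*, s_{-i})
       \;\le\; \lambda\, C(s^*) + \mu\, C(s),
\]
hence $C(s)/C(s^*) \le \lambda/(1-\mu)$. With $\lambda = 5/3$ and $\mu = 1/3$,
the Price of Anarchy is at most $(5/3)/(2/3) = 5/2$.
```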
  5. This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and show a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, the game-level feedback where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback. 
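The "one-unit deviation" in the coverage condition refers to an agent changing a single facility in its chosen bundle. The toy check below illustrates that notion in a congestion game with known affine facility costs; in the offline setting the costs would instead be estimated pessimistically from data, which this sketch does not attempt. The cost parameters and function names are illustrative assumptions.

```python
# Toy congestion game: check whether any agent can gain by a one-unit
# deviation, i.e. by swapping exactly one facility in its chosen bundle.
# Illustrative only: facility costs are known here, whereas the offline
# setting must estimate them from data under a coverage condition.

# Affine per-agent facility costs: cost(f, load) = a*load + b.
COST_PARAMS = {0: (1.0, 0.0), 1: (2.0, 1.0), 2: (0.5, 2.0)}
FACILITIES = list(COST_PARAMS)

def facility_cost(f, load):
    a, b = COST_PARAMS[f]
    return a * load + b

def loads(bundles):
    """Number of agents using each facility under the joint choice `bundles`."""
    counts = {f: 0 for f in FACILITIES}
    for bundle in bundles:
        for f in bundle:
            counts[f] += 1
    return counts

def agent_cost(i, bundles):
    counts = loads(bundles)
    return sum(facility_cost(f, counts[f]) for f in bundles[i])

def best_one_unit_deviation(i, bundles):
    """Largest cost reduction agent i can get by swapping exactly one facility."""
    base = agent_cost(i, bundles)
    best_gain = 0.0
    for f_out in bundles[i]:
        for f_in in FACILITIES:
            if f_in in bundles[i]:
                continue
            deviated = list(bundles)
            deviated[i] = frozenset((set(bundles[i]) - {f_out}) | {f_in})
            best_gain = max(best_gain, base - agent_cost(i, deviated))
    return best_gain

# Two agents, each using one facility: report each agent's best deviation gain.
joint = [frozenset({0}), frozenset({1})]
for i in range(len(joint)):
    print(f"agent {i}: best one-unit deviation gain = {best_one_unit_deviation(i, joint):.2f}")
```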