skip to main content


Title: Multi-agent motion planning using differential games with lexicographic preferences
Multi-player games with lexicographic cost functions can capture a variety of driving and racing scenarios and under certain conditions are known to have pure-strategy Nash Equilibria. The standard Iterated Best Response (IBR) procedure for finding such equilibria can be slow because, in general, computing the best response for each agent involves solving a non-convex optimization problem. In this paper, we introduce a type of game which uses a lexicographic cost function. We show that for this class of games, the best responses can be effectively computed through piece-wise linear approximations. This in turn enables us to approximate the Nash Equilibria using a linearized version of IBR. We show that the gap between the linear approximations returned by our linearized IBR and the true best response drops asymptotically. We have implemented the algorithm and our experiments show that it can find approximate Nash Equilibria for handful of agents driving in realistic scenarios in less than 10 seconds.  more » « less
Award ID(s):
1918531
NSF-PAR ID:
10366105
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE Conference on Decision and Control
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Multi-player games with lexicographic cost functions can capture a variety of driving and racing scenarios and are known to have pure-strategy Nash Equilibria (NE) under certain conditions. The standard Iterated Best Response (IBR) procedure for finding such equilibria can be slow because computing the best response for each agent generally involves solving a non-convex optimization problem. In this paper, we introduce a type of game which uses a lexicographic cost function. We show that for this class of games, the best responses can be effectively computed through piece-wise linear approximations. This enables us to approximate the NE using a linearized version of IBR. We show the gap between the linear approximations returned by our linearized IBR and the true best response drops asymptotically. We implement the algorithm and show that it can find approximate NE for a handful of agents driving in realistic scenarios in under 10 seconds. 
    more » « less
  2. Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algo- rithm for two-player zero-sum games that has been empirically shown to find approximate Nash equilibria in large games. Although PSRO is guaranteed to converge to an approximate Nash equilibrium and can handle continuous actions, it may take an exponential number of iterations as the number of information states (infostates) grows. We propose Extensive-Form Double Oracle (XDO), an extensive-form double oracle algorithm for two-player zero-sum games that is guar- anteed to converge to an approximate Nash equilibrium linearly in the number of infostates. Unlike PSRO, which mixes best responses at the root of the game, XDO mixes best responses at every infostate. We also introduce Neural XDO (NXDO), where the best response is learned through deep RL. In tabular experiments on Leduc poker, we find that XDO achieves an approximate Nash equilibrium in a number of iterations an order of magnitude smaller than PSRO. Experiments on a modified Leduc poker game and Oshi-Zumo show that tabular XDO achieves a lower exploitability than CFR with the same amount of computation. We also find that NXDO outperforms PSRO and NFSP on a sequential multidimensional continuous-action game. NXDO is the first deep RL method that can find an approximate Nash equilibrium in high-dimensional continuous-action sequential games. Experiment code is available at https://github.com/indylab/nxdo. 
    more » « less
  3. Guruswami, Venkatesan (Ed.)
    A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called σ-smooth Nash equilibrium, for a {smoothness parameter} σ. In a σ-smooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a σ-smooth strategy, which is a distribution that does not put too much mass (as parametrized by σ) on any fixed action. We distinguish two variants of σ-smooth Nash equilibria: strong σ-smooth Nash equilibria, in which players are required to play σ-smooth strategies under equilibrium play, and weak σ-smooth Nash equilibria, where there is no such requirement. We show that both weak and strong σ-smooth Nash equilibria have superior computational properties to Nash equilibria: when σ as well as an approximation parameter ϵ and the number of players are all constants, there is a {constant-time} randomized algorithm to find a weak ϵ-approximate σ-smooth Nash equilibrium in normal-form games. In the same parameter regime, there is a polynomial-time deterministic algorithm to find a strong ϵ-approximate σ-smooth Nash equilibrium in a normal-form game. These results stand in contrast to the optimal algorithm for computing ϵ-approximate Nash equilibria, which cannot run in faster than quasipolynomial-time, subject to complexity-theoretic assumptions. We complement our upper bounds by showing that when either σ or ϵ is an inverse polynomial, finding a weak ϵ-approximate σ-smooth Nash equilibria becomes computationally intractable. Our results are the first to propose a variant of Nash equilibrium which is computationally tractable, allows players to act independently, and which, as we discuss, is justified by an extensive line of work on individual choice behavior in the economics literature. 
    more » « less
  4. We report on the formalization in Ssreflect/Coq of a number of concepts and results from algorithmic game theory, including potential games, smooth games, solution concepts such as Pure and Mixed Nash Equilibria, Coarse Correlated Equilibria, epsilon-approximate equilibria, and behavioral models of games such as best-response dynamics. We apply the formalization to prove Price of Stability bounds for, and convergence under best-response dynamics of, the Atomic Routing game, which has applications in computer networking. Our second application proves that Affine Congestion games are (5/3, 1/3)-smooth, and therefore have Price of Anarchy 5/2. Our formalization is available online. 
    more » « less
  5. We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum stochastic games. We propose a new Q-learning type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy during the Q-function updates. We prove that under certain conditions, by updating the regularized Q-function, the algorithm converges to a Nash equilibrium. We also demonstrate the proposed algorithm’s ability to transfer previous training experiences, enabling the agents to adapt quickly to new environments. We provide a dynamic hyper-parameter scheduling scheme to further expedite convergence. Empirical results applied to a number of stochastic games verify that the proposed algorithm converges to the Nash equilibrium while exhibiting a major speed-up over existing algorithms. 
    more » « less