Multi-player games with lexicographic cost functions can capture a variety of driving and racing scenarios and are known to have pure-strategy Nash Equilibria (NE) under certain conditions. The standard Iterated Best Response (IBR) procedure for finding such equilibria can be slow because computing the best response for each agent generally involves solving a non-convex optimization problem. In this paper, we introduce a type of game which uses a lexicographic cost function. We show that for this class of games, the best responses can be effectively computed through piece-wise linear approximations. This enables us to approximate the NE using a linearized version of IBR. We show the gap between the linear approximations returned by our linearized IBR and the true best response drops asymptotically. We implement the algorithm and show that it can find approximate NE for a handful of agents driving in realistic scenarios in under 10 seconds.
more »
« less
Offline Congestion Games: How Feedback Type Affects Data Coverage Requirement
This paper investigates when one can efficiently recover an approximate Nash Equilibrium (NE) in offline congestion games. The existing dataset coverage assumption in offline general-sum games inevitably incurs a dependency on the number of actions, which can be exponentially large in congestion games. We consider three different types of feedback with decreasing revealed information. Starting from the facility-level (a.k.a., semi-bandit) feedback, we propose a novel one-unit deviation coverage condition and show a pessimism-type algorithm that can recover an approximate NE. For the agent-level (a.k.a., bandit) feedback setting, interestingly, we show the one-unit deviation coverage condition is not sufficient. On the other hand, we convert the game to multi-agent linear bandits and show that with a generalized data coverage assumption in offline linear bandits, we can efficiently recover the approximate NE. Lastly, we consider a novel type of feedback, the game-level feedback where only the total reward from all agents is revealed. Again, we show the coverage assumption for the agent-level feedback setting is insufficient in the game-level feedback setting, and with a stronger version of the data coverage assumption for linear bandits, we can recover an approximate NE. Together, our results constitute the first study of offline congestion games and imply formal separations between different types of feedback.
more »
« less
- PAR ID:
- 10443274
- Date Published:
- Journal Name:
- International Conference on Learning Representations
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
In this paper, we investigate Nash-regret minimization in congestion games, a class of games with benign theoretical structure and broad real-world applications. We first propose a centralized algorithm based on the optimism in the face of uncertainty principle for congestion games with (semi-)bandit feedback, and obtain finite-sample guarantees. Then we propose a decentralized algorithm via a novel combination of the Frank-Wolfe method and G-optimal design. By exploiting the structure of the congestion game, we show the sample complexity of both algorithms depends only polynomially on the number of players and the number of facilities, but not the size of the action set, which can be exponentially large in terms of the number of facilities. We further define a new problem class, Markov congestion games, which allows us to model the non-stationarity in congestion games. We propose a centralized algorithm for Markov congestion games, whose sample complexity again has only polynomial dependence on all relevant problem parameters, but not the size of the action set.more » « less
-
We study how representation learning can im- prove the learning efficiency of contextual bandit problems. We study the setting where we play T contextual linear bandits with dimension d si- multaneously, and these T bandit tasks collec- tively share a common linear representation with a dimensionality of r ≪ d. We present a new algorithm based on alternating projected gradi- ent descent (GD) and minimization estimator to recover a low-rank feature matrix. Using the pro- posed estimator, we present a multi-task learning algorithm for linear contextual bandits and prove the regret bound of our algorithm. We presented experiments and compared the performance of our algorithm against benchmark algorithmsmore » « less
-
ABSTRACT We investigate an infinite‐horizon time‐inconsistent mean‐field game (MFG) in a discrete time setting. We first present a classic equilibrium for the MFG and its associated existence result. This classic equilibrium aligns with the conventional equilibrium concept studied in MFG literature when the context is time‐consistent. Then we demonstrate that while this equilibrium produces an approximate optimal strategy when applied to the related ‐agent games, it does so solely in a precommitment sense. Therefore, it cannot function as a genuinely approximate equilibrium strategy from the perspective of a sophisticated agent within the ‐agent game. To address this limitation, we propose a newconsistentequilibrium concept in both the MFG and the ‐agent game. We show that a consistent equilibrium in the MFG can indeed function as an approximate consistent equilibrium in the ‐agent game. Additionally, we analyze the convergence of consistent equilibria for ‐agent games toward a consistent MFG equilibrium as tends to infinity.more » « less
-
Experience management (EM) agents in multiplayer serious games face unique challenges and responsibilities regarding the fair treatment of players. One such challenge is the Greedy Bandit Problem that arises when using traditional Multi-Armed Bandits (MABs) as EM agents, which results in some players routinely prioritized while others may be ignored. We will show that this problem can be a cause of player non-adherence in a multiplayer serious game played by human users. To mitigate this effect, we propose a new bandit strategy, the Shapley Bandit, which enforces fairness constraints in its treatment of players based on the Shapley Value. We evaluate our approach via simulation with virtual players, finding that the Shapley Bandit can be effective in providing more uniform treatment of players while incurring only a slight cost in overall performance to a typical greedy approach. Our findings highlight the importance of fair treatment among players as a goal of multiplayer EM agents and discuss how addressing this issue may lead to more effective agent operation overall. The study contributes to the understanding of player modeling and EM in serious games and provides a promising approach for balancing fairness and engagement in multiplayer environments.more » « less
An official website of the United States government

