skip to main content

This content will become publicly available on August 1, 2024

Title: Online Learning in Stackelberg Games with an Omniscient Follower
We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader’s move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader’s actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games.  more » « less
Award ID(s):
1909499 2211209 1901252
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Proceedings of Machine Learning Research
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In a Stackelberg game, a leader commits to a randomized strategy and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader’s rewards and available strategies and evolves in a Markovian manner depending on both the leader and follower’s selected trategies. Although standard Stackelberg games have been utilized to improve scheduling in security domains, their deployment is often limited by requiring complete information of the follower’s utility function. In contrast, we consider scenarios where the follower’s utility function is unknown to the leader; however, it can be linearly parameterized. Our objective is then to provide an algorithm that prescribes a randomized strategy to the leader at each step of the game based on observations of how the follower responded in previous steps. We design an online learning algorithm that, with high probability, is no-regret, i.e., achieves a regret bound (when compared to the best policy in hindsight), which is sublinear in the number of time steps; the degree of sublinearity depends on the number of features representing the follower’s utility function. The regret of the proposed learning algorithm is independent of the size of the state space and polynomial in the rest of the parameters of the game. We show that the proposed learning algorithm outperforms existing model-free reinforcement learning approaches. 
    more » « less
  2. We initiate the study of equilibrium refinements based on trembling-hand perfection in extensive-form games with commitment strategies, that is, where one player commits to a strategy first. We show that the standard strong (and weak) Stackelberg equilibria are not suitable for trembling-hand perfection, because the limit of a sequence of such strong (weak) Stackelberg commitment strategies of a perturbed game may not be a strong (weak) Stackelberg equilibrium itself. However, we show that the universal set of all Stackelberg equilibria (i.e., those that are optimal for at least some follower response function) is natural for trembling- hand perfection: it does not suffer from the problem above. We also prove that determining the existence of a Stackelberg equilibrium--refined or not--that gives the leader expected value at least v is NP-hard. This significantly extends prior complexity results that were specific to strong Stackelberg equilibrium. 
    more » « less
  3. null (Ed.)
    Multi-robot cooperation requires agents to make decisions that are consistent with the shared goal without disregarding action-specific preferences that might arise from asymmetry in capabilities and individual objectives. To accomplish this goal, we propose a method named SLiCC: Stackelberg Learning in Cooperative Control. SLiCC models the problem as a partially observable stochastic game composed of Stackelberg bimatrix games, and uses deep reinforcement learning to obtain the payoff matrices associated with these games. Appropriate cooperative actions are then selected with the derived Stackelberg equilibria. Using a bi-robot cooperative object transportation problem, we validate the performance of SLiCC against centralized multi-agent Q-learning and demonstrate that SLiCC achieves better combined utility. 
    more » « less
  4. Effectively predicting intent and behavior requires inferring leadership in multi-agent interactions. Dynamic games provide an expressive theoretical framework for modeling these interactions. Employing this framework, we propose a novel method to infer the leader in a two-agent game by observing the agents’ behavior in complex, long-horizon interactions. We make two con- tributions. First, we introduce an iterative algorithm that solves dynamic two-agent Stackelberg games with nonlinear dynamics and nonquadratic costs, and demonstrate that it consistently converges in repeated trials. Second, we propose the Stackelberg Leadership Filter (SLF), an online method for identifying the leading agent in interactive scenarios based on observations of the game interactions. We validate the leadership filter’s efficacy on simulated driving scenarios to demonstrate that the SLF can draw conclusions about leadership that match right-of-way expectations. 
    more » « less
  5. In this paper, a distributed swarm control problem is studied for large-scale multi-agent systems (LS-MASs). Different than classical multi-agent systems, an LS-MAS brings new challenges to control design due to its large number of agents. It might be more difficult for developing the appropriate control to achieve complicated missions such as collective swarming. To address these challenges, a novel mixed game theory is developed with a hierarchical learning algorithm. In the mixed game, the LS-MAS is represented as a multi-group, large-scale leader–follower system. Then, a cooperative game is used to formulate the distributed swarm control for multi-group leaders, and a Stackelberg game is utilized to couple the leaders and their large-scale followers effectively. Using the interaction between leaders and followers, the mean field game is used to continue the collective swarm behavior from leaders to followers smoothly without raising the computational complexity or communication traffic. Moreover, a hierarchical learning algorithm is designed to learn the intelligent optimal distributed swarm control for multi-group leader–follower systems. Specifically, a multi-agent actor–critic algorithm is developed for obtaining the distributed optimal swarm control for multi-group leaders first. Furthermore, an actor–critic–mass method is designed to find the decentralized swarm control for large-scale followers. Eventually, a series of numerical simulations and a Lyapunov stability proof of the closed-loop system are conducted to demonstrate the performance of the developed scheme. 
    more » « less