Title: No-Regret Learning in Dynamic Stackelberg Games
In a Stackelberg game, a leader commits to a randomized strategy and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space which affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader's and the follower's selected strategies. Although standard Stackelberg games have been utilized to improve scheduling in security domains, their deployment is often limited by the requirement of complete information about the follower's utility function. In contrast, we consider scenarios where the follower's utility function is unknown to the leader but can be linearly parameterized. Our objective is then to provide an algorithm that prescribes a randomized strategy to the leader at each step of the game based on observations of how the follower responded in previous steps. We design an online learning algorithm that, with high probability, is no-regret, i.e., achieves a regret bound (when compared to the best policy in hindsight) that is sublinear in the number of time steps; the degree of sublinearity depends on the number of features representing the follower's utility function. The regret of the proposed learning algorithm is independent of the size of the state space and polynomial in the rest of the parameters of the game. We show that the proposed learning algorithm outperforms existing model-free reinforcement learning approaches.
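As a point of reference for the no-regret criterion, the sketch below gives one standard way of writing the regret of the leader's learned strategies against the best policy in hindsight; the symbols (reward r, follower best response b, comparator class Π) are illustrative and not necessarily the paper's notation.

```latex
% Illustrative regret definition (notation assumed, not taken from the paper):
% \pi_t is the leader's strategy at step t, b(\cdot) the follower's best response,
% r the leader's reward, s_t the underlying state, and \Pi the comparator policy class.
\mathrm{Regret}(T) \;=\;
  \max_{\pi \in \Pi} \sum_{t=1}^{T} r\big(s_t,\, \pi(s_t),\, b(\pi(s_t))\big)
  \;-\; \sum_{t=1}^{T} r\big(s_t,\, \pi_t(s_t),\, b(\pi_t(s_t))\big).
```

The algorithm is no-regret if, with high probability, this quantity grows sublinearly in T; per the abstract, the degree of sublinearity depends on the number of features representing the follower's utility function.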
Award ID(s):
1652113
PAR ID:
10508410
Author(s) / Creator(s):
Publisher / Repository:
IEEE Transactions on Automatic Control
Date Published:
Journal Name:
IEEE Transactions on Automatic Control
Volume:
69
Issue:
3
ISSN:
0018-9286
Page Range / eLocation ID:
1418 - 1431
Subject(s) / Keyword(s):
Iterative learning control, no-regret learning, online learning, Stackelberg games, stochastic games
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, and the follower then takes their action after observing the leader's move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Unlike the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and always best-responds to the leader's actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that, depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games.
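To make the interaction protocol concrete, here is a minimal sketch of one round with an omniscient, best-responding follower; the bimatrix payoff representation and all function names are our own illustrative assumptions.

```python
# Minimal sketch: one round of a repeated Stackelberg game with an omniscient
# follower that observes the leader's action and best-responds to it.
import numpy as np

def follower_best_response(follower_reward: np.ndarray, leader_action: int) -> int:
    """Omniscient follower: maximizes its known reward given the leader's action."""
    return int(np.argmax(follower_reward[leader_action]))

def play_round(leader_action: int,
               leader_reward: np.ndarray,
               follower_reward: np.ndarray):
    """Leader moves first; the follower observes the move and best-responds."""
    follower_action = follower_best_response(follower_reward, leader_action)
    return leader_reward[leader_action, follower_action], follower_action

# In the fully cooperative case the two payoff matrices coincide.
payoff = np.array([[1.0, 0.0], [0.0, 2.0]])
leader_payoff, b = play_round(1, leader_reward=payoff, follower_reward=payoff)
```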
  2. When interacting with other non-competitive decision-making agents, it is critical for an autonomous agent to have inferable behavior: its actions must convey its intention and strategy. For example, an autonomous car's strategy must be inferable by the pedestrians interacting with the car. We model the inferability problem using a repeated bimatrix Stackelberg game with observations, in which a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed, potentially mixed strategy. The follower, on the other hand, does not know the leader's strategy and dynamically reacts based on observations of the leader's previous actions. In this setting with observations, the leader may suffer an inferability loss, i.e., a loss in performance compared to the setting where the follower has perfect information of the leader's strategy. We show that the inferability loss is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. As a converse result, we also provide a game where the required number of interactions is lower-bounded by a function of the desired inferability loss.
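The sketch below illustrates, under assumed bimatrix payoffs and with our own function names, how a follower can react to the empirical frequency of the leader's observed actions rather than to the leader's true mixed strategy; the gap between the two responses is what drives an inferability loss.

```python
# Illustrative sketch: the follower estimates the leader's mixed strategy from
# observed past actions and best-responds to that estimate.
import numpy as np

def empirical_strategy(observed_actions, num_leader_actions: int) -> np.ndarray:
    """Empirical frequency of the leader's past actions (uniform if no data yet)."""
    actions = np.asarray(observed_actions, dtype=int)
    counts = np.bincount(actions, minlength=num_leader_actions)
    total = counts.sum()
    if total == 0:
        return np.full(num_leader_actions, 1.0 / num_leader_actions)
    return counts / total

def follower_response(follower_payoff: np.ndarray, observed_actions) -> int:
    """follower_payoff[i, j]: follower's payoff when the leader plays i and the follower plays j."""
    x_hat = empirical_strategy(observed_actions, follower_payoff.shape[0])
    return int(np.argmax(x_hat @ follower_payoff))  # best response to the estimate
```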
  3. We initiate the study of equilibrium refinements based on trembling-hand perfection in extensive-form games with commitment strategies, that is, where one player commits to a strategy first. We show that the standard strong (and weak) Stackelberg equilibria are not suitable for trembling-hand perfection, because the limit of a sequence of such strong (weak) Stackelberg commitment strategies of a perturbed game may not be a strong (weak) Stackelberg equilibrium itself. However, we show that the universal set of all Stackelberg equilibria (i.e., those that are optimal for at least some follower response function) is natural for trembling-hand perfection: it does not suffer from the problem above. We also prove that determining the existence of a Stackelberg equilibrium (refined or not) that gives the leader expected value at least v is NP-hard. This significantly extends prior complexity results that were specific to strong Stackelberg equilibrium.
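As a hedged restatement of the commitment notion involved (the symbols below are our own, not the paper's), a leader strategy is in the universal set of Stackelberg equilibria when it is optimal against at least one follower response function that selects best responses:

```latex
% Illustrative formalization (our notation): u_L is the leader's utility and
% BR(x) the follower's set of best responses to the commitment x.
x^{\star} \in \arg\max_{x}\; u_L\big(x,\, g(x)\big)
\quad \text{for some response function } g \text{ with } g(x) \in \mathrm{BR}(x)\ \ \forall x.
% Strong (weak) Stackelberg equilibria fix g to break ties in favor of (against)
% the leader; the universal set quantifies over all such g.
```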
  4. In this paper, a distributed swarm control problem is studied for large-scale multi-agent systems (LS-MASs). Unlike classical multi-agent systems, an LS-MAS brings new challenges to control design due to its large number of agents, which makes it more difficult to develop appropriate controls for complicated missions such as collective swarming. To address these challenges, a novel mixed game theory is developed with a hierarchical learning algorithm. In the mixed game, the LS-MAS is represented as a multi-group, large-scale leader–follower system. A cooperative game is then used to formulate the distributed swarm control for the multi-group leaders, and a Stackelberg game is utilized to couple the leaders and their large-scale followers effectively. Using the interaction between leaders and followers, the mean field game is used to extend the collective swarm behavior from leaders to followers smoothly without raising the computational complexity or communication traffic. Moreover, a hierarchical learning algorithm is designed to learn the optimal distributed swarm control for multi-group leader–follower systems. Specifically, a multi-agent actor–critic algorithm is first developed to obtain the distributed optimal swarm control for the multi-group leaders. Furthermore, an actor–critic–mass method is designed to find the decentralized swarm control for the large-scale followers. Finally, a series of numerical simulations and a Lyapunov stability proof of the closed-loop system are conducted to demonstrate the performance of the developed scheme.
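As a loose illustration of the actor–critic building block underlying the hierarchical scheme, the following is a generic single-step update; the learning rates, feature-based parameterization, and function names are our assumptions and do not reproduce the paper's actor–critic–mass design.

```python
# Generic one-step actor-critic update with a linear value function and a
# softmax policy over feature vectors (illustrative only).
import numpy as np

def actor_critic_step(theta, w, phi_s, phi_s_next, action, reward,
                      gamma=0.99, alpha_actor=1e-3, alpha_critic=1e-2):
    """theta: policy parameters (num_actions x num_features); w: critic weights."""
    td_error = reward + gamma * (w @ phi_s_next) - (w @ phi_s)   # critic TD error
    w = w + alpha_critic * td_error * phi_s                      # critic update
    logits = theta @ phi_s
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                                         # softmax policy
    grad_log = -np.outer(probs, phi_s)                           # d log pi(a|s) / d theta
    grad_log[action] += phi_s
    theta = theta + alpha_actor * td_error * grad_log            # actor (policy gradient) update
    return theta, w
```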
  5. In this work, we consider an LTI system with a Kalman filter, a detector, and a Linear Quadratic Gaussian (LQG) controller under a false data injection attack. The interaction between the controller and the adversary is captured by a Stackelberg game in which the controller is the leader and the adversary is the follower. We propose a framework under which the system chooses time-varying detection thresholds to reduce the effectiveness of the attack and enhance the control performance. We model the impact of the detector as a switching signal, resulting in a switched linear system. A closed-form solution for the optimal attack is first computed using the proposed framework as the best response to any detection threshold. We then present a convex program to compute the optimal detection threshold. Our approach is evaluated using a numerical case study.
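To ground the role of the time-varying detection threshold, here is a minimal sketch of a residual-based detector whose alarm acts as a switching signal; the chi-square-style test statistic and all names are our own illustrative assumptions, not the paper's exact detector.

```python
# Illustrative residual-based detector with a time-varying threshold; the
# boolean alarm sequence plays the role of a switching signal.
import numpy as np

def detect(residual: np.ndarray, residual_cov: np.ndarray, threshold: float) -> bool:
    """Alarm when the normalized residual energy exceeds the current threshold."""
    statistic = residual @ np.linalg.solve(residual_cov, residual)  # chi-square-like statistic
    return bool(statistic > threshold)

def run_detector(residuals, residual_cov, thresholds):
    """Apply a time-varying threshold sequence to a sequence of innovation residuals."""
    return [detect(r, residual_cov, tau) for r, tau in zip(residuals, thresholds)]
```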