In a Stackelberg game, a leader commits to a randomized strategy and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader's and the follower's selected strategies. Although standard Stackelberg games have been utilized to improve scheduling in security domains, their deployment is often limited by the requirement of complete information about the follower's utility function. In contrast, we consider scenarios where the follower's utility function is unknown to the leader but can be linearly parameterized. Our objective is then to provide an algorithm that prescribes a randomized strategy to the leader at each step of the game based on observations of how the follower responded in previous steps. We design an online learning algorithm that, with high probability, is no-regret, i.e., achieves a regret bound (compared to the best policy in hindsight) that is sublinear in the number of time steps; the degree of sublinearity depends on the number of features representing the follower's utility function. The regret of the proposed learning algorithm is independent of the size of the state space and polynomial in the rest of the parameters of the game. We show that the proposed learning algorithm outperforms existing model-free reinforcement learning approaches.
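As a rough illustration of the learning problem (not the paper's algorithm), the sketch below drops the Markovian state and shows how a linearly parameterized follower utility can be refined from observed best responses. All names (`phi`, `theta_hat`, the epsilon-greedy leader rule, the perceptron-style update) are invented stand-ins for the paper's actual machinery.

```python
import numpy as np

# Minimal sketch: the follower's utility is linear in a known feature map,
# u_f(x, y) = phi[x, y] @ theta_star, with theta_star unknown to the leader.
rng = np.random.default_rng(0)
d, n_leader, n_follower, T, eta = 4, 3, 3, 500, 0.1
theta_star = rng.normal(size=d)                   # unknown to the leader
phi = rng.normal(size=(n_leader, n_follower, d))  # known feature map
R = rng.normal(size=(n_leader, n_follower))       # leader's reward matrix

theta_hat = np.zeros(d)
for t in range(T):
    # Epsilon-greedy stands in for the paper's randomized-strategy rule.
    if rng.random() < 0.1:
        x = int(rng.integers(n_leader))
    else:
        y_pred = np.argmax(phi @ theta_hat, axis=1)        # predicted responses
        x = int(np.argmax(R[np.arange(n_leader), y_pred])) # greedy leader pick

    y = int(np.argmax(phi[x] @ theta_star))                # true best response

    # The observed response implies phi[x, y] @ theta >= phi[x, y'] @ theta;
    # perceptron-style corrections enforce these constraints on the estimate.
    for yp in range(n_follower):
        diff = phi[x, y] - phi[x, yp]
        if diff @ theta_hat < 0:
            theta_hat += eta * diff
```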
Encouraging Inferable Behavior for Autonomy: Repeated Bimatrix Stackelberg Games with Observations
When interacting with other non-competitive decision-making agents, it is critical for an autonomous agent to have inferable behavior: its actions must convey its intention and strategy. For example, an autonomous car's strategy must be inferable by the pedestrians interacting with the car. We model the inferability problem using a repeated bimatrix Stackelberg game with observations, in which a leader and a follower repeatedly interact. During the interactions, the leader uses a fixed, potentially mixed strategy. The follower, on the other hand, does not know the leader's strategy and reacts dynamically based on observations of the leader's previous actions. In this setting, the leader may suffer an inferability loss, i.e., a performance loss compared to the setting where the follower has perfect information about the leader's strategy. We show that the inferability loss is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy, encouraging the use of inferable strategies with lower stochasticity levels. As a converse result, we also provide a game where the required number of interactions is lower-bounded by a function of the desired inferability loss.
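A small simulation can make the inferability loss concrete. In the hypothetical setup below, the follower best-responds to the empirical frequency of the leader's past actions rather than to the true mixed strategy `p`; the payoff matrices, sizes, and smoothing are invented for the example.

```python
import numpy as np

# Toy repeated bimatrix Stackelberg game with observations.
rng = np.random.default_rng(1)
m, n, T = 3, 3, 200
A = rng.random((m, n))            # leader payoff matrix (invented)
B = rng.random((m, n))            # follower payoff matrix (invented)
p = np.array([0.7, 0.2, 0.1])     # leader's fixed mixed strategy

# Perfect-information benchmark: follower best-responds to p itself.
y_star = int(np.argmax(p @ B))
benchmark = p @ A[:, y_star]

# With observations: follower best-responds to the empirical frequency
# of the leader's past actions, which only converges to p over time.
counts = np.ones(m)               # Laplace-smoothed initial belief
total = 0.0
for t in range(T):
    p_hat = counts / counts.sum()
    y = int(np.argmax(p_hat @ B))             # follower's reaction
    x = rng.choice(m, p=p)                    # leader samples its strategy
    counts[x] += 1
    total += A[x, y]

inferability_loss = benchmark - total / T     # shrinks as T grows
print(inferability_loss)
```

Less stochastic leader strategies make `p_hat` converge faster, which is the intuition behind the upper bound in the abstract.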
- PAR ID: 10585017
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: Proceedings of the American Control Conference
- ISSN: 2378-5861
- ISBN: 979-8-3503-8265-5
- Page Range / eLocation ID: 360 to 366
- Format(s): Medium: X
- Location: Toronto, ON, Canada
- Sponsoring Org: National Science Foundation
More Like this
We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower, who takes their action after observing the leader's move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader's actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that, depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games.
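The interaction protocol can be sketched as follows. This toy simulation only illustrates the setting (an omniscient follower best-responding to each leader action); it is not the paper's sample-complexity analysis, and the bandit-style leader update is an invented placeholder.

```python
import numpy as np

# Toy decentralized cooperative Stackelberg loop with an omniscient follower.
rng = np.random.default_rng(2)
m, n, T = 4, 4, 2000
R = rng.random((m, n))                    # shared (cooperative) reward, invented

est = np.zeros(m)                         # leader's per-action value estimates
cnt = np.zeros(m)
for t in range(T):
    # Epsilon-greedy leader: a placeholder, not the paper's learning rule.
    x = int(rng.integers(m)) if rng.random() < 0.1 else int(np.argmax(est))
    y = int(np.argmax(R[x]))              # omniscient follower best-responds
    r = R[x, y] + 0.1 * rng.normal()      # leader observes a noisy reward
    cnt[x] += 1
    est[x] += (r - est[x]) / cnt[x]       # incremental mean update
```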
Traditionally, in the bilevel optimization framework, a leader chooses her actions by solving an upper-level problem, assuming that a follower chooses an optimal reaction by solving a lower-level problem. However, in many settings the lower-level problems are nontrivial and require tailored algorithms for their solution. More importantly, in practice such problems may be solved only inexactly by heuristics and approximation algorithms. Motivated by this consideration, we study a broad class of bilevel optimization problems where the follower might not optimally react to the leader's actions. In particular, we present a modeling framework in which the leader considers that the follower might use one of a number of known algorithms to solve the lower-level problem, either approximately or heuristically. Thus, the leader can hedge against the follower's use of suboptimal solutions. We provide algorithmic implementations of the framework for a class of nonlinear bilevel knapsack problems (BKPs), and we illustrate the potential impact of incorporating this realistic feature through numerical experiments in the context of defender-attacker problems.
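A minimal sketch of the hedging idea, with invented data: the follower may solve the lower-level knapsack exactly or with a density-greedy heuristic, and an interdicting leader plans against the worse of the two outcomes. The `damage` vector and the one-item-blocking action set are hypothetical simplifications of the paper's BKP setting.

```python
from itertools import combinations

values  = [8, 11, 6, 4]   # follower's item values (invented)
weights = [5, 7, 4, 3]    # item weights (invented)
damage  = [9, 3, 7, 5]    # harm to the leader if the follower takes item i
CAP = 10

def exact(avail):
    # Brute-force optimal knapsack over the items the leader left available.
    best, best_set = 0, ()
    for r in range(len(avail) + 1):
        for S in combinations(avail, r):
            if sum(weights[i] for i in S) <= CAP:
                v = sum(values[i] for i in S)
                if v > best:
                    best, best_set = v, S
    return best_set

def greedy(avail):
    # Density-ordered greedy heuristic the follower might use instead.
    S, w = [], 0
    for i in sorted(avail, key=lambda i: -values[i] / weights[i]):
        if w + weights[i] <= CAP:
            S.append(i)
            w += weights[i]
    return S

def worst_damage(blocked):
    avail = [i for i in range(len(values)) if i != blocked]
    # Hedge over the follower's candidate algorithms: plan for whichever
    # resulting selection hurts the leader most.
    return max(sum(damage[i] for i in alg(avail)) for alg in (exact, greedy))

best_block = min(range(len(values)), key=worst_damage)  # leader's interdiction
```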
Addressing the urgent global challenge of man-made greenhouse gas emissions and climate change necessitates collaborative action between shipping lines and government regulatory agencies. Aligning with the International Maritime Organization's emissions reduction strategy, this paper presents a novel bi-level programming model that unifies these stakeholders. On the upper level of the proposed bi-level model, a number of shipping lines optimize retrofitting plans for their vessels to maximize economic benefits. On the lower level, the regulatory agency responds to the carbon reduction efforts by setting retrofitting subsidies and emission penalty rates. This framework represents a multi-leader-single-follower game involving shipping lines and the regulatory agency, and its equilibrium is determined through an equilibrium problem with equilibrium constraints (EPEC). The EPEC comprises multiple single-leader-follower problems, each of which can be formulated as a mathematical program with equilibrium constraints (MPEC). The diagonalization method (DM) is employed for its solution. Simulation studies performed based on a ten-year planning period show that the proposed approach can effectively promote vessel retrofitting and the use of green fuels, leading to an annual emission reduction of over 50%.
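The diagonalization loop itself is simple to sketch. In the hypothetical version below, each leader's MPEC is collapsed to a one-dimensional grid search over a retrofit level, and the regulator's response is collapsed to a closed-form rule; all functional forms and parameters are invented for illustration.

```python
import numpy as np

def regulator(total_retrofit):
    # Stand-in follower response: subsidy shrinks and the penalty rate
    # rises as the fleet-wide retrofit level grows (invented forms).
    subsidy = max(0.0, 1.0 - total_retrofit)
    penalty = 0.5 + total_retrofit
    return subsidy, penalty

def line_profit(x_i, x_others, cost=0.8):
    subsidy, penalty = regulator(x_i + x_others)
    # Retrofitting costs money, earns subsidy, and avoids emission penalties.
    return subsidy * x_i - cost * x_i**2 - penalty * (1.0 - x_i)

n_lines, grid = 3, np.linspace(0.0, 1.0, 101)
x = np.zeros(n_lines)                     # each line's retrofit level
for sweep in range(50):
    x_prev = x.copy()
    for i in range(n_lines):
        others = x.sum() - x[i]
        # Each leader's MPEC is reduced here to a 1-D grid search.
        x[i] = grid[np.argmax([line_profit(g, others) for g in grid])]
    if np.max(np.abs(x - x_prev)) < 1e-6:
        break    # Gauss-Seidel sweep converged to an equilibrium candidate
```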
Learning Environment Constraints in Collaborative Robotics: A Decentralized Leader-Follower Approach

In this paper, we propose a leader-follower hierarchical strategy for two robots collaboratively transporting an object in a partially known environment with obstacles. Both robots sense the local surrounding environment and react to obstacles in their proximity. We consider no explicit communication, so the local environment information and the control actions are not shared between the robots. At any given time step, the leader solves a model predictive control (MPC) problem with its known set of obstacles and plans a feasible trajectory to complete the task. The follower estimates the inputs of the leader and uses a policy to assist the leader while reacting to obstacles in its proximity. The leader infers obstacles in the follower's vicinity by using the difference between the predicted and the real-time estimated follower control action. A method to switch the leader-follower roles is used to improve the control performance in tight environments. The efficacy of our approach is demonstrated with detailed comparisons to two alternative strategies, where it achieves the highest success rate while completing the task fastest.
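As a highly simplified sketch of the communication-free estimation step, the follower below back-solves the leader's previous input from the carried object's observed displacement. Single-integrator dynamics, the goal-seeking leader rule, and the assist gain are invented stand-ins for the paper's MPC formulation.

```python
import numpy as np

dt = 0.1
pos = np.zeros(2)                 # object position
prev_pos = pos.copy()
goal = np.array([5.0, 0.0])
u_f_prev = np.zeros(2)            # follower's own previous input

for t in range(200):
    # Leader's input (unknown to the follower): head toward the goal.
    u_leader = 0.5 * (goal - pos) / max(np.linalg.norm(goal - pos), 1e-6)

    # Follower: recover the leader's previous input from the object's
    # displacement, after removing its own prior contribution.
    net = (pos - prev_pos) / dt            # net velocity of the object
    u_leader_est = 2.0 * net - u_f_prev    # = leader's input one step ago
    u_f = 0.8 * u_leader_est               # assist along the estimated intent

    prev_pos = pos.copy()
    pos = pos + dt * (u_leader + u_f) / 2.0   # object moves under both inputs
    u_f_prev = u_f
```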