We study the problem of online learning in a two-player decentralized cooperative Stackelberg game. In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader’s move. The goal of the leader is to learn to minimize the cumulative regret based on the history of interactions. Differing from the traditional formulation of repeated Stackelberg games, we assume the follower is omniscient, with full knowledge of the true reward, and that they always best-respond to the leader’s actions. We analyze the sample complexity of regret minimization in this repeated Stackelberg game. We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically, from constant to exponential, even for linear cooperative Stackelberg games.
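The interaction protocol described above can be sketched in a few lines. This is only an illustrative toy, not the paper's algorithm or reward model: the shared reward table `reward`, the helper names `follower_best_response` and `play_rounds`, and the fixed sequence of leader actions are all assumptions made for the example. It assumes a finite, noiseless, fully cooperative game in which both players receive the same reward.

```python
def follower_best_response(reward, a):
    """Omniscient follower: knows the true reward table and maximizes it
    given the leader's action a (ties broken by lowest index)."""
    row = reward[a]
    return max(range(len(row)), key=lambda b: row[b])


def play_rounds(reward, leader_actions):
    """Replay a sequence of leader actions and return (cumulative regret, history).

    Regret in each round is measured against the best joint action in the table.
    """
    best = max(max(row) for row in reward)
    cumulative_regret = 0.0
    history = []
    for a in leader_actions:
        b = follower_best_response(reward, a)  # follower moves after seeing a
        r = reward[a][b]                       # shared (cooperative) reward
        cumulative_regret += best - r
        history.append((a, b, r))
    return cumulative_regret, history


# Hypothetical 3x2 reward table: rows are leader actions, columns follower actions.
reward = [[0.2, 0.5], [0.9, 0.1], [0.4, 0.6]]
regret, history = play_rounds(reward, [0, 1, 2, 1])
```

In this toy setting the leader only needs to try each of its actions once, because the follower's best response reveals the best achievable reward for that action. The abstract's point is that richer reward structures can make the leader's learning problem far harder than this.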
Learning Strategy-Aware Linear Classifiers
We address the question of repeatedly learning linear classifiers against agents who are strategically trying to game the deployed classifiers, and we use the Stackelberg regret to measure the performance of our algorithms. First, we show that Stackelberg and external regret for the problem of strategic classification are strongly incompatible, i.e., there exist worst-case scenarios where any sequence of actions providing sublinear external regret might result in linear Stackelberg regret and vice versa. Second, we present a strategy-aware algorithm for minimizing the Stackelberg regret for which we prove nearly matching upper and lower regret bounds. Finally, we provide simulations to complement our theoretical analysis. Our results advance the growing literature of learning from revealed preferences, which has so far focused on "smoother" assumptions from the perspective of the learner and the agents respectively.
- Award ID(s): 2007951
- PAR ID: 10282453
- Date Published:
- Journal Name: Advances in Neural Information Processing Systems
- Volume: 33
- Page Range / eLocation ID: 15265--15276
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In a Stackelberg game, a leader commits to a randomized strategy and a follower chooses their best strategy in response. We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader’s rewards and available strategies and evolves in a Markovian manner depending on both the leader’s and the follower’s selected strategies. Although standard Stackelberg games have been utilized to improve scheduling in security domains, their deployment is often limited by requiring complete information about the follower’s utility function. In contrast, we consider scenarios where the follower’s utility function is unknown to the leader but can be linearly parameterized. Our objective is then to provide an algorithm that prescribes a randomized strategy to the leader at each step of the game based on observations of how the follower responded in previous steps. We design an online learning algorithm that, with high probability, is no-regret, i.e., achieves a regret bound (when compared to the best policy in hindsight) that is sublinear in the number of time steps; the degree of sublinearity depends on the number of features representing the follower’s utility function. The regret of the proposed learning algorithm is independent of the size of the state space and polynomial in the rest of the parameters of the game. We show that the proposed learning algorithm outperforms existing model-free reinforcement learning approaches.
-
We study the problem of online binary classification where strategic agents can manipulate their observable features in predefined ways, modeled by a manipulation graph, in order to receive a positive classification. We show this setting differs in fundamental ways from classic (non-strategic) online classification. For instance, whereas in the non-strategic case, a mistake bound of ln |H| is achievable via the halving algorithm when the target function belongs to a known class H, we show that no deterministic algorithm can achieve a mistake bound o(Δ) in the strategic setting, where Δ is the maximum degree of the manipulation graph (even when |H| = O(Δ)). We complement this with a general algorithm achieving mistake bound O(Δ ln |H|). We also extend this to the agnostic setting, and show that this algorithm achieves a Δ multiplicative regret (mistake bound of O(Δ · OPT + Δ · ln |H|)), and that no deterministic algorithm can achieve o(Δ) multiplicative regret. Next, we study two randomized models based on whether the random choices are made before or after agents respond, and show they exhibit fundamental differences. In the first, fractional model, at each round the learner deterministically chooses a probability distribution over classifiers inducing expected values on each vertex (probabilities of being classified as positive), which the strategic agents respond to. We show that any learner in this model has to suffer linear regret. On the other hand, in the second randomized algorithms model, while the adversary who selects the next agent must respond to the learner's probability distribution over classifiers, the agent then responds to the actual hypothesis classifier drawn from this distribution. Surprisingly, we show this model is more advantageous to the learner, and we design randomized algorithms that achieve sublinear regret bounds against both oblivious and adaptive adversaries.
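For reference, the classic halving algorithm mentioned above for the non-strategic, realizable case can be sketched as follows. This is a minimal illustration and not the strategic algorithm from the abstract. The `Threshold` hypothesis class, the stream of examples, and the function names are assumptions chosen for the example. The standard analysis gives at most log2 |H| mistakes, which is the same order as the ln |H| bound quoted above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Hypothetical hypothesis: label 1 iff x >= t."""
    t: int

    def __call__(self, x):
        return int(x >= self.t)


def halving_predict(version_space, x):
    """Predict with the majority vote of all still-consistent hypotheses."""
    votes_for_one = sum(h(x) for h in version_space)
    return 1 if 2 * votes_for_one >= len(version_space) else 0


def run_halving(hypotheses, stream):
    """Run halving on (x, y) pairs; returns (mistakes, surviving hypotheses)."""
    version_space = list(hypotheses)
    mistakes = 0
    for x, y in stream:
        if halving_predict(version_space, x) != y:
            mistakes += 1  # at least half of the version space was wrong
        # Discard every hypothesis inconsistent with the revealed label.
        version_space = [h for h in version_space if h(x) == y]
    return mistakes, version_space


hypotheses = [Threshold(t) for t in range(8)]   # |H| = 8
target = Threshold(5)                           # realizable: target is in H
stream = [(x, target(x)) for x in [4, 6, 5, 2, 7, 3]]
mistakes, survivors = run_halving(hypotheses, stream)
mistake_bound = math.floor(math.log2(len(hypotheses)))
```

Each mistake removes at least half of the remaining hypotheses, which is where the logarithmic bound comes from. In the strategic setting the learner sees only manipulated features, so this elimination argument no longer applies directly.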
-
The Multiplicative Weights Update method (MWU) is a simple yet powerful algorithm for learning linear classifiers, for ensemble learning à la boosting, for approximately solving linear and semidefinite systems, for computing approximate solutions to multicommodity flow problems, and for online convex optimization, among other applications. In this brief announcement, we apply techniques from interactive theorem proving to define and prove correct the first formally verified implementation of MWU (specifically, we show that our MWU is no regret). Our primary application -- and one justification of the relevance of our work to the PODC community -- is to verified multi-agent systems, such as distributed multi-agent network flow and load balancing games, for which verified MWU provides a convenient method for distributed computation of approximate Coarse Correlated Equilibria.
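To make the no-regret property concrete, here is a minimal, unverified sketch of the exponential-weights (Hedge) variant of MWU in plain Python. It is not the formally verified implementation the abstract describes. The function name `hedge`, the random loss sequence, and the learning-rate choice are assumptions for illustration. For losses in [0, 1] and learning rate eta = sqrt(8 ln n / T), the standard analysis bounds the expected regret by sqrt(T ln n / 2).

```python
import math
import random


def hedge(loss_rounds, eta):
    """Exponential-weights MWU over n experts with losses in [0, 1].

    Returns (learner's expected cumulative loss, best fixed expert's loss).
    """
    n = len(loss_rounds[0])
    weights = [1.0] * n
    expected_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]       # play proportionally to weights
        expected_loss += sum(p * l for p, l in zip(probs, losses))
        # Multiplicative update: shrink the weight of experts that incurred loss.
        weights = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    best_fixed = min(sum(r[i] for r in loss_rounds) for i in range(n))
    return expected_loss, best_fixed


T, n = 200, 3
rng = random.Random(0)  # fixed seed so the example is reproducible
loss_rounds = [[rng.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(8 * math.log(n) / T)
learner_loss, best_loss = hedge(loss_rounds, eta)
regret = learner_loss - best_loss
regret_bound = math.sqrt(T * math.log(n) / 2)
```

Because the bound grows like the square root of T, the average regret per round vanishes as T grows, which is the sense in which the method is "no regret".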
-
Representation learning is a powerful tool that enables learning over large multitudes of agents or domains by enforcing that all agents operate on a shared set of learned features. However, many robotics or controls applications that would benefit from collaboration operate in settings with changing environments and goals, whereas most guarantees for representation learning are stated for static settings. Toward rigorously establishing the benefit of representation learning in dynamic settings, we analyze the regret of multi-task representation learning for linear-quadratic control. This setting introduces unique challenges. Firstly, we must account for and balance the misspecification introduced by an approximate representation. Secondly, we cannot rely on the parameter update schemes of single-task online LQR, for which least-squares often suffices, and must devise a novel scheme to ensure sufficient improvement. We demonstrate that for settings where exploration is benign, the regret of any agent after T timesteps scales with the square root of T/H, where H is the number of agents. In settings with difficult exploration, the regret scales as the square root of the input dimension times the parameter dimension multiplied by T, plus a term which scales with T to the three quarters divided by H to the one fifth. In both cases, by comparing to the minimax single-task regret, we see a benefit of a large number of agents. Notably, in the difficult exploration case, by sharing a representation across tasks, the effective task-specific parameter count can often be small. Lastly, we validate the trends we predict.

