skip to main content

Title: Optimizing Sensor Allocation Against Attackers With Uncertain Intentions: A Worst-Case Regret Minimization Approach
This letter focuses on the optimal allocation of multi-stage attacks with the uncertainty in attacker’s intention. We model the attack planning problem using a Markov decision process and characterize the uncertainty in the attacker’s intention using a finite set of reward functions—each reward represents a type of attacker. Based on this modeling, we employ the paradigm of the worst-case absolute regret minimization from robust game theory and develop mixed-integer linear program (MILP) formulations for solving the worst-case regret minimizing sensor allocation strategies for two classes of attack-defend interactions: one where the defender and attacker engage in a zero-sum game and another where they engage in a non-zero-sum game. We demonstrate the effectiveness of our algorithm using a stochastic gridworld example.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE Control Systems Letters
Date Published:
Journal Name:
IEEE Control Systems Letters
Page Range / eLocation ID:
2863 to 2868
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. arXiv:2402.05300v2 (Ed.)
    This paper considers a multi-player resource-sharing game with a fair reward allocation model. Multiple players choose from a collection of resources. Each resource brings a random reward equally divided among the players who choose it. We consider two settings. The first setting is a one-slot game where the mean rewards of the resources are known to all the players, and the objective of player 1 is to maximize their worst-case expected utility. Certain special cases of this setting have explicit solutions. These cases provide interesting yet non-intuitive insights into the problem. The second setting is an online setting, where the game is played over a finite time horizon, where the mean rewards are unknown to the first player. Instead, the first player receives, as feedback, the rewards of the resources they chose after the action. We develop a novel Upper Confidence Bound (UCB) algorithm that minimizes the worst-case regret of the first player using the feedback received. 
    more » « less
  2. Deception is a crucial tool in the cyberdefence repertoire, enabling defenders to leverage their informational advantage to reduce the likelihood of successful attacks. One way deception can be employed is through obscuring, or masking, some of the information about how systems are configured, increasing attacker’s uncertainty about their tar-gets. We present a novel game-theoretic model of the resulting defender- attacker interaction, where the defender chooses a subset of attributes to mask, while the attacker responds by choosing an exploit to execute. The strategies of both players have combinatorial structure with complex informational dependencies, and therefore even representing these strategies is not trivial. First, we show that the problem of computing an equilibrium of the resulting zero-sum defender-attacker game can be represented as a linear program with a combinatorial number of system configuration variables and constraints, and develop a constraint generation approach for solving this problem. Next, we present a novel highly scalable approach for approximately solving such games by representing the strategies of both players as neural networks. The key idea is to represent the defender’s mixed strategy using a deep neural network generator, and then using alternating gradient-descent-ascent algorithm, analogous to the training of Generative Adversarial Networks. Our experiments, as well as a case study, demonstrate the efficacy of the proposed approach. 
    more » « less
  3. We characterize offline data poisoning attacks on Multi-Agent Reinforcement Learning (MARL), where an attacker may change a data set in an attempt to install a (potentially fictitious) unique Markov-perfect Nash equilibrium for a two-player zero-sum Markov game. We propose the unique Nash set, namely the set of games, specified by their Q functions, with a specific joint policy being the unique Nash equilibrium. The unique Nash set is central to poisoning attacks because the attack is successful if and only if data poisoning pushes all plausible games inside it. The unique Nash set generalizes the reward polytope commonly used in inverse reinforcement learning to MARL. For zero-sum Markov games, both the inverse Nash set and the set of plausible games induced by data are polytopes in the Q function space. We exhibit a linear program to efficiently compute the optimal poisoning attack. Our work sheds light on the structure of data poisoning attacks on offline MARL, a necessary step before one can design more robust MARL algorithms.

    more » « less
  4. Smart grid attacks can be applied on a single component or multiple components. The corresponding defense strategies are totally different. In this paper, we investigate the solutions (e.g., linear programming and reinforcement learning) for one-shot game between the attacker and defender in smart power systems. We designed one-shot game with multi-line- switching attack and solved it using linear programming. We also designed the game with single-line-switching attack and solved it using reinforcement learning. The pay-off and utility/reward of the game is calculated based on the generation loss due to initiated attack by the attacker. Defender's defense action is considered while evaluating the pay-off from attacker's and defender's action. The linear programming based solution gives the probability of choosing best attack actions against different defense actions. The reinforcement learning based solution gives the optimal action to take under selected defense action. The proposed game is demonstrated on 6 bus system and IEEE 30 bus system and optimal solutions are analyzed. 
    more » « less
  5. In offline multi-agent reinforcement learning (MARL), agents estimate policies from a given dataset. We study reward-poisoning attacks in this setting where an exogenous attacker modifies the rewards in the dataset before the agents see the dataset. The attacker wants to guide each agent into a nefarious target policy while minimizing the Lp norm of the reward modification. Unlike attacks on single-agent RL, we show that the attacker can install the target policy as a Markov Perfect Dominant Strategy Equilibrium (MPDSE), which rational agents are guaranteed to follow. This attack can be significantly cheaper than separate single-agent attacks. We show that the attack works on various MARL agents including uncertainty-aware learners, and we exhibit linear programs to efficiently solve the attack problem. We also study the relationship between the structure of the datasets and the minimal attack cost. Our work paves the way for studying defense in offline MARL. 
    more » « less