

Title: Optimizing Sensor Allocation Against Attackers With Uncertain Intentions: A Worst-Case Regret Minimization Approach
This letter focuses on the optimal allocation of sensors against multi-stage attacks when the attacker's intention is uncertain. We model the attack planning problem as a Markov decision process and characterize the uncertainty in the attacker's intention using a finite set of reward functions, each representing a type of attacker. Based on this model, we adopt the paradigm of worst-case absolute regret minimization from robust game theory and develop mixed-integer linear program (MILP) formulations that compute worst-case regret-minimizing sensor allocation strategies for two classes of attack-defend interactions: one where the defender and attacker engage in a zero-sum game, and another where they engage in a non-zero-sum game. We demonstrate the effectiveness of our algorithm using a stochastic gridworld example.
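The worst-case regret criterion in the abstract can be illustrated with a toy enumeration over a small set of candidate allocations and attacker types; the letter itself solves MILPs over MDP policies, and the payoff numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical payoff table: rows = candidate sensor allocations,
# columns = attacker types (reward functions). Each entry is the
# defender's detection payoff under that allocation against that type.
U = np.array([
    [0.9, 0.2, 0.5],
    [0.4, 0.8, 0.3],
    [0.6, 0.6, 0.6],
])

best_per_type = U.max(axis=0)      # ideal payoff if the type were known
regret = best_per_type - U         # regret of each allocation vs. each type
worst_case = regret.max(axis=1)    # worst-case regret of each allocation
best_alloc = int(worst_case.argmin())
```

Here the third allocation is chosen: it is never best against any single type, but its worst-case regret (0.3) beats the allocations tailored to one type.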
Award ID(s):
2144113
NSF-PAR ID:
10496980
Author(s) / Creator(s):
Publisher / Repository:
IEEE Control Systems Letters
Date Published:
Journal Name:
IEEE Control Systems Letters
Volume:
7
ISSN:
2475-1456
Page Range / eLocation ID:
2863 to 2868
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deception is a crucial tool in the cyberdefence repertoire, enabling defenders to leverage their informational advantage to reduce the likelihood of successful attacks. One way deception can be employed is by obscuring, or masking, some of the information about how systems are configured, increasing the attacker's uncertainty about their targets. We present a novel game-theoretic model of the resulting defender-attacker interaction, where the defender chooses a subset of attributes to mask, while the attacker responds by choosing an exploit to execute. The strategies of both players have combinatorial structure with complex informational dependencies, so even representing these strategies is nontrivial. First, we show that the problem of computing an equilibrium of the resulting zero-sum defender-attacker game can be represented as a linear program with a combinatorial number of system-configuration variables and constraints, and we develop a constraint-generation approach for solving this problem. Next, we present a novel, highly scalable approach for approximately solving such games by representing the strategies of both players as neural networks. The key idea is to represent the defender's mixed strategy using a deep neural network generator trained with an alternating gradient descent-ascent algorithm, analogous to the training of Generative Adversarial Networks. Our experiments, as well as a case study, demonstrate the efficacy of the proposed approach.
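The zero-sum equilibrium computation mentioned above can be sketched, at matrix-game scale, with the standard maximin linear program (the paper's LP has combinatorially many configuration variables and uses constraint generation; this toy solves a 2x2 game directly):

```python
import numpy as np
from scipy.optimize import linprog

# Toy zero-sum payoff matrix (rows: defender masking choices,
# columns: attacker exploits). Values are illustrative only.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
m, n = A.shape

# Variables: x (defender mixed strategy) and v (game value); maximize v.
c = np.concatenate([np.zeros(m), [-1.0]])
# For each exploit j:  v - sum_i x_i * A[i, j] <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)
A_eq = np.concatenate([np.ones(m), [0.0]]).reshape(1, -1)  # sum_i x_i = 1
b_eq = [1.0]
bounds = [(0, 1)] * m + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
x, v = res.x[:m], res.x[-1]
```

For this matching-pennies-style matrix the unique equilibrium is the uniform mixture with value 0, which the LP recovers.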
  2. arXiv:2402.05300v2
    This paper considers a multi-player resource-sharing game with a fair reward-allocation model. Multiple players choose from a collection of resources, and each resource yields a random reward divided equally among the players who choose it. We consider two settings. The first is a one-slot game in which the mean rewards of the resources are known to all players, and the objective of player 1 is to maximize their worst-case expected utility. Certain special cases of this setting have explicit solutions, and these cases provide interesting yet non-intuitive insights into the problem. The second is an online setting in which the game is played over a finite time horizon and the mean rewards are unknown to the first player; instead, the first player receives, as feedback after each action, the rewards of the resources they chose. We develop a novel Upper Confidence Bound (UCB) algorithm that minimizes the worst-case regret of the first player using this feedback.
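The feedback-driven exploration described above builds on the classic UCB1 index rule, which can be sketched as follows (this is the generic bandit algorithm, not the paper's game-specific variant; the reward distributions are made up):

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Generic UCB1: play each resource once, then repeatedly pick the
    resource with the highest empirical mean plus confidence bonus."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: try every resource once
        else:
            arm = max(range(k),
                      key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fns[arm](rng)
        counts[arm] += 1
        sums[arm] += r
    return counts

# Two resources with Bernoulli rewards of mean 0.8 and 0.3 (illustrative).
counts = ucb1([lambda rng: float(rng.random() < 0.8),
               lambda rng: float(rng.random() < 0.3)], horizon=2000)
```

Over 2000 rounds the index rule concentrates play on the higher-mean resource while still occasionally sampling the other.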
  3. We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games with simultaneous moves. To incorporate function approximation, we consider a family of Markov games where the reward function and transition kernel possess a linear structure. Both the offline and online settings of the problem are considered. In the offline setting, we control both players and aim to find the Nash equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret. For both settings, we propose an optimistic variant of the least-squares minimax value iteration algorithm. We show that our algorithm is computationally efficient and provably achieves an [Formula: see text] upper bound on the duality gap and regret, where d is the linear dimension, H the horizon, and T the total number of timesteps. Our results do not require additional assumptions on the sampling model. Our setting requires overcoming several new challenges that are absent in Markov decision processes or turn-based Markov games. In particular, to achieve optimism with simultaneous moves, we construct both upper and lower confidence bounds of the value function, and then compute the optimistic policy by solving a general-sum matrix game with these bounds as the payoff matrices. As finding the Nash equilibrium of a general-sum game is computationally hard, our algorithm instead solves for a coarse correlated equilibrium (CCE), which can be obtained efficiently. To the best of our knowledge, such a CCE-based scheme for optimism has not appeared in the literature and might be of interest in its own right.
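The CCE computation mentioned at the end of the abstract can be sketched as a linear-program feasibility problem on a toy 2x2 general-sum game: find a distribution over joint actions such that no player gains by unilaterally committing to a fixed action before the joint action is drawn. The payoff matrices below are illustrative, not from the paper:

```python
import numpy as np
from scipy.optimize import linprog

# Toy general-sum ("chicken"-style) payoffs for row (U1) and column (U2).
U1 = np.array([[6.0, 2.0], [7.0, 0.0]])
U2 = np.array([[6.0, 7.0], [2.0, 0.0]])

def cce(U1, U2):
    """Find a coarse correlated equilibrium via LP feasibility."""
    m, n = U1.shape
    idx = [(i, j) for i in range(m) for j in range(n)]
    rows = []
    for a in range(m):  # row-player deviations to fixed action a
        rows.append([U1[a, j] - U1[i, j] for (i, j) in idx])
    for b in range(n):  # column-player deviations to fixed action b
        rows.append([U2[i, b] - U2[i, j] for (i, j) in idx])
    res = linprog(np.zeros(len(idx)),
                  A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, len(idx))), b_eq=[1.0],
                  bounds=[(0, 1)] * len(idx))
    return res.x.reshape(m, n)

sigma = cce(U1, U2)
```

Because every Nash equilibrium is a CCE, the LP is always feasible, and unlike Nash computation it stays polynomial-time for general-sum games.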
  4. Considerable work has focused on optimal stopping problems where random IID offers arrive sequentially for a single available resource which is controlled by the decision-maker. After viewing the realization of the offer, the decision-maker irrevocably rejects it, or accepts it, collecting the reward and ending the game. We consider an important extension of this model to a dynamic setting where the resource is "renewable" (a rental, a work assignment, or a temporary position) and can be allocated again after a delay period d. In the case where the reward distribution is known a priori, we design an (asymptotically optimal) 1/2-competitive Prophet Inequality, namely, a policy that collects in expectation at least half of the expected reward collected by a prophet who a priori knows all the realizations. This policy has a particularly simple characterization as a thresholding rule which depends on the reward distribution and the blocking period d, and arises naturally from an LP-relaxation of the prophet's optimal solution. Moreover, it provides the key to extending to the case of unknown distributions; here, we construct a dynamic threshold rule using the reward samples collected when the resource is not blocked. We provide a regret guarantee for our algorithm against the best policy in hindsight, and prove a complementary minimax lower bound on the best achievable regret, establishing that our policy achieves, up to poly-logarithmic factors, the best possible regret in this setting.
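For intuition, the classical single-resource prophet-inequality threshold (accept the first offer of at least half the expected maximum) can be simulated; the paper's policy generalizes this thresholding idea to the renewable-resource setting. The offer distributions below are made up:

```python
import random

def prophet_threshold(dists, trials=20000, seed=1):
    """Classical half-of-E[max] threshold rule for a single resource:
    estimate E[max offer] by simulation, set tau = E[max]/2, and accept
    the first offer that meets tau. Returns (average reward, E[max])."""
    rng = random.Random(seed)
    e_max = sum(max(d(rng) for d in dists) for _ in range(trials)) / trials
    tau = e_max / 2
    got = 0.0
    for _ in range(trials):
        for d in dists:          # offers arrive in sequence
            x = d(rng)
            if x >= tau:         # accept the first offer >= tau, then stop
                got += x
                break
    return got / trials, e_max

# Two sequential offers: Uniform(0, 1) then Uniform(0, 2).
avg, e_max = prophet_threshold([lambda r: r.random(),
                                lambda r: 2 * r.random()])
```

The guarantee is that the threshold rule earns, in expectation, at least half of what the prophet who sees all realizations in advance earns.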
  5. Cyber-physical systems (CPS) have been increasingly attacked by hackers. CPS are especially vulnerable to attackers that have full knowledge of the system's configuration. Therefore, novel anomaly detection algorithms in the presence of a knowledgeable adversary need to be developed. However, this research is still in its infancy due to limited attack data availability and test beds. By proposing a holistic attack modeling framework, we aim to show the vulnerability of existing detection algorithms and provide a basis for novel sensor-based cyber-attack detection. Stealthy Attack GEneration (SAGE) for CPS serves as a tool for cyber-risk assessment of existing systems and detection algorithms for practitioners and researchers alike. Stealthy attacks are characterized by malicious injections into the CPS through input, output, or both, which produce bounded changes in the detection residue. Using the SAGE framework, we generate stealthy attacks to achieve three objectives: (i) maximize damage, (ii) avoid detection, and (iii) minimize the attack cost. Additionally, an attacker needs to adhere to the physical principles of the CPS (objective iv). The goal of SAGE is to model worst-case attacks, where we assume limited information asymmetries between attackers and defenders (e.g., insider knowledge held by the attacker). These worst-case attacks are the hardest to detect, yet common in practice, and they allow an understanding of the maximum conceivable damage. We propose an efficient solution procedure for the novel SAGE optimization problem. The SAGE framework is illustrated in three case studies, which serve as modeling guidelines for the development of novel attack detection algorithms and comprehensive cyber-physical risk assessment of CPS. The results show that SAGE attacks can cause severe damage to a CPS while changing the input control signals only minimally, which avoids detection and keeps the cost of an attack low. This highlights the need for more advanced detection algorithms and novel research in cyber-physical security.
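The stealthy-attack trade-off described above (maximize damage while keeping the change in the detection residue bounded) can be illustrated with a toy linear program; the damage weights, residue-sensitivity matrix, and bound below are invented for illustration and are not the SAGE formulation:

```python
import numpy as np
from scipy.optimize import linprog

# Toy stealthy-injection sketch: choose sensor biases `a` maximizing
# damage D @ a while each monitored residue |(G @ a)_r| stays within eps.
D = np.array([1.0, 2.0, 0.5])        # damage per unit bias on each sensor
G = np.array([[1.0, 0.0, 0.0],       # residues monitor sensors 1 and 2;
              [0.0, 1.0, 0.0]])      # sensor 3 is unmonitored
eps = 0.1                            # stealthiness bound on each residue

res = linprog(-D,                    # maximize damage = minimize -D @ a
              A_ub=np.vstack([G, -G]),
              b_ub=np.full(2 * G.shape[0], eps),
              bounds=[(-1.0, 1.0)] * 3)
a = res.x
```

The optimizer pushes the monitored sensor biases only up to the stealth bound (0.1 each) while driving the unmonitored sensor to its physical limit, mirroring how a stealthy attack concentrates effort where the detector is blind.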