Security of cyber-physical systems (CPS) continues to pose new challenges due to the tight integration and operational complexity of the cyber and physical components. To address these challenges, this article presents a domain-aware, optimization-based approach to determine an effective defense strategy for CPS in an automated fashion—by emulating a strategic adversary in the loop that exploits system vulnerabilities, interconnection of the CPS, and the dynamics of the physical components. Our approach builds on an adversarial decision-making model based on a Markov Decision Process (MDP) that determines the optimal cyber (discrete) and physical (continuous) attack actions over a CPS attack graph. The defense planning problem is modeled as a non-zero-sum game between the adversary and defender. We use a model-free reinforcement learning method to solve the adversary’s problem as a function of the defense strategy. We then employ Bayesian optimization (BO) to find an approximatebest-responsefor the defender to harden the network against the resulting adversary policy. This process is iterated multiple times to improve the strategy for both players. We demonstrate the effectiveness of our approach on a ransomware-inspired graph with a smart building system as the physical process. Numerical studies show that our method converges to a Nash equilibrium for various defender-specific costs of network hardening.
more »
« less
Adversarial Deep Reinforcement Learning Based Adaptive Moving Target Defense
Moving target defense (MTD) is a proactive defense approach that aims to thwart attacks by continuously changing the attack surface of a system (e.g., changing host or network configurations), thereby increasing the adversary’s uncertainty and attack cost. To maximize the impact of MTD, a defender must strategically choose when and what changes to make, taking into account both the characteristics of its system as well as the adversary’s observed activities. Finding an optimal strategy for MTD presents a significant challenge, especially when facing a resourceful and determined adversary who may respond to the defender’s actions. In this paper, we propose a multi-agent partially-observable Markov Decision Process model of MTD and formulate a two-player general-sum game between the adversary and the defender. To solve this game, we propose a multi-agent reinforcement learning framework based on the double oracle algorithm. Finally, we provide experimental results to demonstrate the effectiveness of our framework in finding optimal policies.
more »
« less
- Award ID(s):
- 1905558
- PAR ID:
- 10213633
- Date Published:
- Journal Name:
- International Conference on Decision and Game Theory for Security
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Smart grid attacks can be applied on a single component or multiple components. The corresponding defense strategies are totally different. In this paper, we investigate the solutions (e.g., linear programming and reinforcement learning) for one-shot game between the attacker and defender in smart power systems. We designed one-shot game with multi-line- switching attack and solved it using linear programming. We also designed the game with single-line-switching attack and solved it using reinforcement learning. The pay-off and utility/reward of the game is calculated based on the generation loss due to initiated attack by the attacker. Defender's defense action is considered while evaluating the pay-off from attacker's and defender's action. The linear programming based solution gives the probability of choosing best attack actions against different defense actions. The reinforcement learning based solution gives the optimal action to take under selected defense action. The proposed game is demonstrated on 6 bus system and IEEE 30 bus system and optimal solutions are analyzed.more » « less
-
This work proposes a moving target defense (MTD) strategy to detect coordinated cyber-physical attacks (CCPAs) against power grids. A CCPA consists of a physical attack, such as disconnecting a transmission line, followed by a coordinated cyber attack that injects false data into the sensor measurements to mask the effects of the physical attack. Such attacks can lead to undetectable line outages and cause significant damage to the grid. The main idea of the proposed approach is to invalidate the knowledge that the attackers use to mask the effects of the physical attack by actively perturbing the grid’s transmission line reactances using distributed flexible AC transmission system (D-FACTS) devices. We identify the MTD design criteria in this context to thwart CCPAs. The proposed MTD design consists of two parts. First, we identify the subset of links for D-FACTS device deployment that enables the defender to detect CCPAs against any link in the system. Then, in order to minimize the defense cost during the system’s operational time, we use a game-theoretic approach to identify the best subset of links (within the D-FACTS deployment set) to perturb which will provide adequate protection. Extensive simulations performed using the MATPOWER simulator on IEEE bus systems verify the effectiveness of our approach in detecting CCPAs and reducing the operator’s defense cost.more » « less
-
Most of the cybersecurity research focus on either presenting a specific vulnerability %or hacking technique, or proposing a specific defense algorithm to defend against a well-defined attack scheme. Although such cybersecurity research is important, few have paid attention to the dynamic interactions between attackers and defenders, where both sides are intelligent and will dynamically change their attack or defense strategies in order to gain the upper hand over their opponents. This 'cyberwar' phenomenon exists among most cybersecurity incidents in the real world, which warrants special research and analysis. In this paper, we propose a dynamic game theoretic framework (i.e., hyper defense) to analyze the interactions between the attacker and the defender as a non-cooperative security game. The key idea is to model attackers/defenders to have multiple levels of attack/defense strategies that are different in terms of effectiveness, strategy costs, and attack gains/damages. Each player adjusts his strategy based on the strategy's cost, potential attack gain/damage, and effectiveness in anticipating of the opponent's strategy. We study the achievable Nash equilibrium for the attacker-defender security game where the players employ an efficient strategy according to the obtained equilibrium. Furthermore, we present case studies of three different types of network attacks and put forth how our hyper defense system can successfully model them. Simulation results show that the proposed game theoretical system achieves a better performance compared to two other fixed-strategy defense systems.more » « less
-
Detection of malicious behavior is a fundamental problem in security. One of the major challenges in using detection systems in practice is in dealing with an overwhelming number of alerts that are triggered by normal behavior (the so-called false positives), obscuring alerts resulting from actual malicious activity. While numerous methods for reducing the scope of this issue have been proposed, ultimately one must still decide how to prioritize which alerts to investigate, and most existing prioritization methods are heuristic, for example, based on suspiciousness or priority scores. We introduce a novel approach for computing a policy for prioritizing alerts using adversarial reinforcement learning. Our approach assumes that the attackers know the full state of the detection system and dynamically choose an optimal attack as a function of this state, as well as of the alert prioritization policy. The first step of our approach is to capture the interaction between the defender and attacker in a game theoretic model. To tackle the computational complexity of solving this game to obtain a dynamic stochastic alert prioritization policy, we propose an adversarial reinforcement learning framework. In this framework, we use neural reinforcement learning to compute best response policies for both the defender and the adversary to an arbitrary stochastic policy of the other. We then use these in a double-oracle framework to obtain an approximate equilibrium of the game, which in turn yields a robust stochastic policy for the defender. Extensive experiments using case studies in fraud and intrusion detection demonstrate that our approach is effective in creating robust alert prioritization policies.more » « less
An official website of the United States government

