Smart grid attacks can target a single component or multiple components, and the corresponding defense strategies differ substantially. In this paper, we investigate solutions (e.g., linear programming and reinforcement learning) for a one-shot game between an attacker and a defender in smart power systems. We design a one-shot game with a multi-line-switching attack and solve it using linear programming, and we design a game with a single-line-switching attack and solve it using reinforcement learning. The payoff and utility/reward of the game are calculated from the generation loss caused by the attacker's action, and the defender's defense action is taken into account when evaluating the payoff of each attacker-defender action pair. The linear-programming-based solution gives the probability of choosing the best attack actions against different defense actions, while the reinforcement-learning-based solution gives the optimal action to take under a selected defense action. The proposed game is demonstrated on a 6-bus system and the IEEE 30-bus system, and the optimal solutions are analyzed.
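As a rough illustration of the linear-programming step, the sketch below computes an attacker's optimal mixed strategy for a small zero-sum game whose payoff matrix holds generation losses. The matrix values, dimensions, and SciPy formulation are illustrative assumptions, not the paper's actual game data or implementation.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical payoff matrix: entry [i, j] is the generation loss (MW)
# when the attacker plays line-switching action i and the defender
# plays defense action j. Values are illustrative only.
A = np.array([
    [30.0, 12.0,  5.0],
    [18.0, 25.0,  9.0],
    [ 7.0, 14.0, 22.0],
])
m, n = A.shape

# Variables: x = [p_1, ..., p_m, v]; the attacker maximizes the game
# value v, so we minimize -v in scipy's standard form.
c = np.zeros(m + 1)
c[-1] = -1.0

# For every defense column j: v - sum_i p_i * A[i, j] <= 0
A_ub = np.hstack([-A.T, np.ones((n, 1))])
b_ub = np.zeros(n)

# Attack probabilities sum to one.
A_eq = np.hstack([np.ones((1, m)), np.zeros((1, 1))])
b_eq = np.array([1.0])

bounds = [(0.0, 1.0)] * m + [(None, None)]   # v is a free variable
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)

p_attack, game_value = res.x[:m], res.x[-1]
print("mixed attack strategy:", p_attack.round(3))
print("guaranteed expected generation loss:", round(game_value, 2))
```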
Heterogeneous reinforcement learning for defending power grids against attacks
Reinforcement learning (RL) has been employed to devise the best course of actions in defending critical infrastructures, such as power networks, against cyberattacks. Nonetheless, even for the smallest power grids, the action space of RL grows exponentially, rendering efficient exploration by the RL agent practically unattainable. Current RL algorithms tailored to power grids are generally ill-suited to large state-action spaces, despite the trade-offs they make. We address the large-action-space problem for power grid security by exploiting temporal graph convolutional neural networks (TGCNs) to develop a parallel but heterogeneous RL framework. In particular, we divide the action space into smaller subspaces, each explored by its own RL agent. How to efficiently organize the spatiotemporal action sequences then becomes a key challenge. We invoke TGCNs to meet this challenge by accurately predicting the performance of each individual RL agent in the event of an attack; the top-performing agent is selected, yielding the optimal sequence of actions. First, we compare action-space sizes for the IEEE 5-bus and 14-bus systems. Furthermore, we use the IEEE 14-bus and IEEE 118-bus systems coupled with the Grid2Op platform to illustrate the performance and the influence of action division on training times and grid survival rates, using agents trained with both deep Q-learning and Soft Actor-Critic as well as Grid2Op's default greedy agents. Our TGCN framework provides a computationally reasonable approach for generating the best course of actions to defend cyber-physical systems against attacks.
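The sketch below illustrates the action-space division and agent-selection pattern described above. The partitioning scheme, the placeholder policies, and the random stand-in for the TGCN scorer are all assumptions for illustration, not the paper's implementation.

```python
import numpy as np

# Minimal sketch of the parallel, heterogeneous agent-selection idea: the
# global action space is split into K subspaces, one RL agent per subspace,
# and a learned scorer (the TGCN in the paper; a random stub here) predicts
# each agent's performance for the current grid state.

rng = np.random.default_rng(0)

def split_action_space(actions, k):
    """Partition the full action list into k smaller subspaces."""
    return [actions[i::k] for i in range(k)]

class SubspaceAgent:
    """Stand-in for a DQN/SAC agent trained on one action subspace."""
    def __init__(self, subspace):
        self.subspace = subspace

    def act(self, state):
        # Placeholder policy: a trained agent would pick via its Q-network.
        return self.subspace[rng.integers(len(self.subspace))]

def tgcn_score(state, agent):
    """Stub for the TGCN predictor of per-agent performance under attack."""
    return float(rng.random())            # replace with a trained TGCN output

full_actions = list(range(1000))          # e.g., enumerated topology actions
agents = [SubspaceAgent(s) for s in split_action_space(full_actions, 10)]

state = np.zeros(14)                      # placeholder grid observation
best = max(agents, key=lambda a: tgcn_score(state, a))   # top-scored agent
print("action from selected agent:", best.act(state))
```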
- Award ID(s): 2048288
- PAR ID: 10572726
- Publisher / Repository: APL
- Date Published:
- Journal Name: APL Machine Learning
- Volume: 2
- Issue: 2
- ISSN: 2770-9019
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Vaccaro, Alfredo (Ed.) A modern challenge in power engineering is to perform dynamic security assessment (DSA) of grids that are 100% powered by inverter-based resources (IBRs). Addressing this challenge is difficult because (i) the dispatch of IBRs can be uncertain as a result of the variability of renewable resources, and (ii) unlike synchronous machines, IBRs have hard current-control limits that cannot be neglected. To address this problem, this paper sets forth a framework for conducting DSA of bulk power systems that are 100% powered by grid-forming IBRs. The framework considers IBR operational conditions that are unknown but bounded by a zonotope, which is also expressed as a polynomial vector for uncertainty propagation via Dormand–Prince integration. The framework is applied to modified versions of the WSCC 9-bus and IEEE 39-bus grids.
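As a toy illustration of a sampling-based view of this uncertainty propagation, the sketch below integrates points drawn from a zonotope using SciPy's Dormand–Prince pair (method "RK45"). The two-state dynamics and zonotope generators are placeholders, not an IBR model or the paper's polynomial-vector formulation.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Operating points bounded by a zonotope Z = {c + G @ xi : xi in [-1, 1]^p}
# are sampled and integrated with the Dormand-Prince pair (scipy's "RK45").
c = np.array([1.0, 0.0])                  # zonotope center
G = np.array([[0.1, 0.05],                # zonotope generators
              [0.0, 0.08]])

def dynamics(t, x):
    # Placeholder stable oscillator standing in for closed-loop dynamics.
    return np.array([x[1], -x[0] - 0.5 * x[1]])

rng = np.random.default_rng(0)
finals = []
for _ in range(200):
    xi = rng.uniform(-1.0, 1.0, size=G.shape[1])   # point inside the zonotope
    x0 = c + G @ xi
    sol = solve_ivp(dynamics, (0.0, 5.0), x0, method="RK45", rtol=1e-8)
    finals.append(sol.y[:, -1])

finals = np.array(finals)
print("bounding box of propagated states:",
      finals.min(axis=0).round(4), finals.max(axis=0).round(4))
```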
- In modern power system networks, grid observability has greatly increased due to the deployment of various metering technologies, which have enhanced real-time monitoring of the grid. The collected measurements are processed by the state estimator, on which many applications rely. Traditionally, state estimation in power grids has been performed with a centralized architecture. With grid deregulation and growing awareness of information privacy and security, much attention has been given to multi-area state estimation. State-of-the-art solutions rely on a weighted norm of the measurement residual, which can mask gross errors contained in the null space of the Jacobian matrix. To address this, a distributed innovation-based model is presented. Measurement innovation is used for error decomposition: the measurement error is an independent random variable, whereas the residual is not, so the masked component is recovered through measurement innovation. The model is solved with the Alternating Direction Method of Multipliers (ADMM), which requires minimal communication of information. The presented framework is validated using the IEEE 14- and IEEE 118-bus systems. An easy-to-implement model, built on the classical weighted residual-norm solution and free of hard-to-design parameters, highlights its potential for real-life implementation.
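A minimal consensus-ADMM sketch of multi-area least-squares state estimation follows. It illustrates only the limited-communication update pattern, with toy data, and does not implement the paper's innovation-based error decomposition.

```python
import numpy as np

# Each area a holds local measurements z_a = H_a @ x + noise and exchanges
# only its current estimate with the coordinator. Toy dimensions throughout.
rng = np.random.default_rng(1)
n = 4                                   # state dimension (toy)
x_true = rng.normal(size=n)

areas = []
for _ in range(3):                      # three areas, 6 measurements each
    H = rng.normal(size=(6, n))
    z = H @ H.T @ np.linalg.pinv(H.T) @ x_true if False else H @ x_true
    z = z + 0.01 * rng.normal(size=6)
    areas.append((H, z))

rho = 1.0
x_loc = [np.zeros(n) for _ in areas]    # local estimates
u = [np.zeros(n) for _ in areas]        # scaled dual variables
x_bar = np.zeros(n)                     # consensus (global) estimate

for _ in range(100):
    for a, (H, z) in enumerate(areas):
        # x_a-update: local regularized least squares
        lhs = 2.0 * H.T @ H + rho * np.eye(n)
        rhs = 2.0 * H.T @ z + rho * (x_bar - u[a])
        x_loc[a] = np.linalg.solve(lhs, rhs)
    x_bar = np.mean([x_loc[a] + u[a] for a in range(len(areas))], axis=0)
    for a in range(len(areas)):
        u[a] += x_loc[a] - x_bar        # dual ascent on the consensus gap

print("estimation error:", np.linalg.norm(x_bar - x_true).round(6))
```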
- We study reinforcement learning (RL) in a setting with a network of agents whose states and actions interact in a local manner, where the objective is to find localized policies such that the (discounted) global reward is maximized. A fundamental challenge in this setting is that the state-action space size scales exponentially in the number of agents, rendering the problem intractable for large networks. In this paper, we propose a scalable actor critic (SAC) framework that exploits the network structure and finds a localized policy that is an O(ρ^(κ+1))-approximation of a stationary point of the objective for some ρ ∈ (0, 1), with complexity that scales with the local state-action space size of the largest κ-hop neighborhood of the network. We illustrate our model and approach using examples from wireless communication, epidemics, and traffic.
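The back-of-the-envelope sketch below contrasts the exponential global state-action space with the size of the largest κ-hop neighborhood on a sparse agent network. The graph topology, per-agent space sizes, and κ are illustrative assumptions.

```python
import networkx as nx

# With per-agent state size s and action size a, the global state-action
# space grows as (s*a)**N, while the scalable actor-critic pays only for
# the largest kappa-hop neighborhood. Numbers are illustrative.
N, s, a, kappa = 30, 5, 3, 2
G = nx.random_regular_graph(d=3, n=N, seed=0)     # sparse agent network

# Size of the largest kappa-hop neighborhood (nodes within distance kappa).
largest = max(
    len(nx.single_source_shortest_path_length(G, v, cutoff=kappa))
    for v in G.nodes
)

print("global state-action space:", (s * a) ** N)          # exponential in N
print(f"largest {kappa}-hop neighborhood:", largest)
print("local state-action space:", (s * a) ** largest)     # far smaller
```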
- This paper describes how the domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well due to the large search/optimization space. Here, we propose an actor-critic-based agent to address the problem's combinatorial nature and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure, modifying the environment using network physics for enhanced agent learning. Further, a parallel training approach on multiple scenarios is employed to avoid biasing the agent toward a few scenarios and to make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed on most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems into real-world RL training. The agent was tested by RTE in the 2019 Learning to Run a Power Network (L2RPN) challenge and was awarded 2nd place in accuracy and 1st place in speed. The developed code is open-sourced for public use. Analysis of a simple system demonstrates the training enhancement obtained with the curriculum.
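A schematic of the curriculum-with-reward-tuning idea is sketched below using a toy environment and agent. The stage parameters, shaping penalty, and environment dynamics are invented stand-ins; the paper's setup uses RTE's Grid2Op environment and a trained actor-critic agent.

```python
import numpy as np

# Training proceeds through stages whose reward shaping increasingly
# penalizes line overloads (network physics) and whose scenarios grow harder.

class ToyGridEnv:
    """Placeholder environment: the observation is a max line-loading fraction."""
    def __init__(self, difficulty, seed):
        self.rng = np.random.default_rng(seed)
        self.stress = {"easy": 0.2, "medium": 0.5, "hard": 0.8}[difficulty]
        self.t = 0

    def reset(self):
        self.t = 0
        return np.array([self.stress])

    def step(self, action):
        # Action 1 ("switch topology") relieves loading; action 0 does nothing.
        loading = self.stress + 0.3 * self.rng.random() - 0.2 * action
        self.t += 1
        reward = 1.0 if loading < 1.0 else -10.0      # survive vs. cascade
        return np.array([loading]), reward, self.t >= 20, {"loading": loading}

class ToyAgent:
    """Placeholder for the actor-critic agent; acts greedily on loading."""
    def act(self, obs):
        return 1 if obs[0] > 0.7 else 0

    def learn(self, obs, action, shaped_r):
        pass                                          # network updates go here

def shaped_reward(base_r, loading, weight):
    """Physics-informed shaping: penalize loading near the thermal limit."""
    return base_r - weight * max(0.0, loading - 0.9)

agent = ToyAgent()
stages = [("easy", 0.1, 5), ("medium", 0.5, 5), ("hard", 1.0, 5)]
for difficulty, weight, episodes in stages:
    for ep in range(episodes):                        # parallelizable scenarios
        env = ToyGridEnv(difficulty, seed=ep)
        obs, done = env.reset(), False
        while not done:
            a = agent.act(obs)
            obs, r, done, info = env.step(a)
            agent.learn(obs, a, shaped_reward(r, info["loading"], weight))
print("curriculum training loop completed")
```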