skip to main content

Title: Curriculum Based Reinforcement Learning of Grid Topology Controllers to Prevent Thermal Cascading
This paper describes how domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well due to the large search/optimization space. Here, we propose an actor-critic-based agent to address the problem's combinatorial nature and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure by modifying the environment using network physics for enhanced agent learning. Further, a parallel training approach on multiple scenarios is employed to avoid biasing the agent to a few scenarios and make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed for most test scenarios, illustrating the importance of properly integrating domain knowledge of physical systems for real-world RL learning. The agent was tested by RTE for the 2019 learning to run the power network challenge and was awarded the 2nd place in accuracy and 1st place in speed. The developed code is open-sourced for public use. Analysis of a simple system proves the enhancement in training RL-agents using the curriculum.  more » « less
Award ID(s):
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Transactions on Power Systems
Page Range / eLocation ID:
1 to 15
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we consider multi-agent deep reinforcement learning (deep RL) based network slicing agents in a dynamic environment with multiple base stations and multiple users. We develop a deep RL based jammer with limited prior information and limited power budget. The goal of the jammer is to minimize the transmission rates achieved with network slicing and thus degrade the network slicing agents' performance. We design a jammer with both listening and jamming phases and address jamming location optimization as well as jamming channel optimization via deep RL. We evaluate the jammer at the optimized location, generating interference attacks in the optimized set of channels by switching between the jamming phase and listening phase. We show that the proposed jammer can significantly reduce the victims' performance without direct feedback or prior knowledge on the network slicing policies. 
    more » « less
  2. Perimeter metering control has long been an active research topic since well-defined relationships between network productivity and usage, that is, network macroscopic fundamental diagrams (MFDs), were shown to be capable of describing regional traffic dynamics. Numerous methods have been proposed to solve perimeter metering control problems, but these generally require knowledge of the MFDs or detailed equations that govern traffic dynamics. Recently, a study applied model-free deep reinforcement learning (Deep-RL) methods to two-region perimeter control and found comparable performances to the model predictive control scheme, particularly when uncertainty exists. However, the proposed methods therein provide very low initial performances during the learning process, which limits their applicability to real life scenarios. Furthermore, the methods may not be scalable to more complicated networks with larger state and action spaces. To combat these issues, this paper proposes to integrate the domain control knowledge (DCK) of congestion dynamics into the agent designs for improved learning and control performances. A novel agent is also developed that builds on the Bang-Bang control policy. Two types of DCK are then presented to provide knowledge-guided exploration strategies for the agents such that they can explore around the most rewarding part of the action spaces. The results from extensive numerical experiments on two- and three-region urban networks show that integrating DCK can (a) effectively improve learning and control performances for Deep-RL agents, (b) enhance the agents’ resilience against various types of environment uncertainties, and (c) mitigate the scalability issue for the agents. 
    more » « less
  3. Network planning is critical to the performance, reliability and cost of web services. This problem is typically formulated as an Integer Linear Programming (ILP) problem. Today's practice relies on hand-tuned heuristics from human experts to address the scalability challenge of ILP solvers. In this paper, we propose NeuroPlan, a deep reinforcement learning (RL) approach to solve the network planning problem. This problem involves multi-step decision making and cost minimization, which can be naturally cast as a deep RL problem. We develop two important domain-specific techniques. First, we use a graph neural network (GNN) and a novel domain-specific node-link transformation for state encoding, in order to handle the dynamic nature of the evolving network topology during planning decision making. Second, we leverage a two-stage hybrid approach that first uses deep RL to prune the search space and then uses an ILP solver to find the optimal solution. This approach resembles today's practice, but avoids human experts with an RL agent in the first stage. Evaluation on real topologies and setups from large production networks demonstrates that NeuroPlan scales to large topologies beyond the capability of ILP solvers, and reduces the cost by up to 17% compared to hand-tuned heuristics. 
    more » « less
  4. Modern deep learning systems rely on (a) a handtuned neural network topology, (b) massive amounts of labelled training data, and (c) extensive training over large-scale compute resources to build a system that can perform efficient image classification or speech recognition. Unfortunately, we are still far away from implementing adaptive general purpose intelligent systems which would need to learn autonomously in unknown environments and may not have access to some or any of these three components. Reinforcement learning and evolutionary algorithm (EA) based methods circumvent this problem by continuously interacting with the environment and updating the models based on obtained rewards. However, deploying these algorithms on ubiquitous autonomous agents at the edge (robots/drones) demands extremely high energy-efficiency due to (i) tight power and energy budgets, (ii) continuous / lifelong interaction with the environment, (iii) intermittent or no connectivity to the cloud to offloadheavy-weight processing. To address this need, we present GENESYS, a HW-SW prototype of a EA-based learning system, that comprises of a closed loop learning engine called EvE and an inference engine called ADAM. EvE can evolve the topology and weights of neural networks completely in hardware for the task at hand, without requiring hand-optimization or backpropogation training. ADAM continuously interacts with the environment and is optimized for efficiently running the irregular neural networks generated by EvE. GENESYS identifies and leverages multiple unique avenues of parallelism unique to EAs that we term “gene”- level parallelism, and “population”-level parallelism. We ran GENESYS with a suite of environments from OpenAI gym and observed 2-5 orders of magnitude higher energy-efficiency over state-of-the-art embedded and desktop CPU and GPU systems. 
    more » « less
  5. Abstract

    Shared control of mobile robots integrates manual input with auxiliary autonomous controllers to improve the overall system performance. However, prior work that seeks to find the optimal shared control ratio needs an accurate human model, which is usually challenging to obtain. In this study, the authors develop an extended Twin Delayed Deep Deterministic Policy Gradient (DDPG) (TD3X)‐based shared control framework that learns to assist a human operator in teleoperating mobile robots optimally. The robot's states, shared control ratio in the previous time step, and human's control input is used as inputs to the reinforcement learning (RL) agent, which then outputs the optimal shared control ratio between human input and autonomous controllers without knowing the human model. Noisy softmax policies are developed to make the TD3X algorithm feasible under the constraint of a shared control ratio. Furthermore, to accelerate the training process and protect the robot, a navigation demonstration policy and a safety guard are developed. A neural network (NN) structure is developed to maintain the correlation of sensor readings among heterogeneous input data and improve the learning speed. In addition, an extended DAGGER (DAGGERX) human agent is developed for training the RL agent to reduce human workload. Robot simulations and experiments with humans in the loop are conducted. The results show that the DAGGERX human agent can simulate real human inputs in the worst‐case scenarios with a mean square error of 0.0039. Compared to the original TD3 agent, the TD3X‐based shared control system decreased the average collision number from 387.3 to 44.4 in a simplistic environment and 394.2 to 171.2 in a more complex environment. The maximum average return increased from 1043 to 1187 with a faster converge speed in the simplistic environment, while the performance is equally good in the complex environment because of the use of an advanced human agent. In the human subject tests, participants' average perceived workload was significantly lower in shared control than that in exclusively manual control (26.90 vs. 40.07,p = 0.013).

    more » « less