
This content will become publicly available on June 26, 2024

Title: Bridging Transient and Steady-State Performance in Voltage Control: A Reinforcement Learning Approach With Safe Gradient Flow
Deep reinforcement learning approaches are becoming appealing for the design of nonlinear controllers for voltage control problems, but the lack of stability guarantees hinders their real-world deployment. This letter constructs a decentralized RL-based controller for inverter-based real-time voltage control in distribution systems. It features two components: a transient control policy and a steady-state performance optimizer. The transient policy is parameterized as a neural network, and the steady-state optimizer represents the gradient of the long-term operating cost function. The two parts are synthesized through a safe gradient flow framework, which prevents the violation of reactive power capacity constraints. We prove that if the output of the transient controller is bounded and monotonically decreasing with respect to its input, then the closed-loop system is asymptotically stable and converges to the optimal steady-state solution. We demonstrate the effectiveness of our method by conducting experiments with IEEE 13-bus and 123-bus distribution system test feeders.
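The structure described in the abstract can be illustrated with a toy single-bus sketch. This is not the authors' implementation: the linear voltage model `v = v0 + x * q`, the `tanh` stand-in for the neural-network policy, the quadratic operating cost, and every name and parameter value below are assumptions chosen for illustration. The only part taken from the general safe gradient flow idea is the projection step: for box constraints, the direction closest to the desired one that satisfies the barrier-type condition reduces to a per-coordinate clip.

```python
import numpy as np


def transient_policy(dv, gain=0.5):
    # Hypothetical stand-in for the neural network: bounded by `gain`
    # and monotonically decreasing in the voltage deviation dv.
    return -gain * np.tanh(10.0 * dv)


def safe_direction(desired, q, q_min, q_max, alpha):
    # Closest direction to `desired` satisfying
    # alpha * (q_min - q) <= dq/dt <= alpha * (q_max - q),
    # so the flow slows down as q approaches a capacity limit.
    return np.clip(desired, alpha * (q_min - q), alpha * (q_max - q))


def simulate(v0=1.08, x=0.2, q_min=-0.3, q_max=0.3,
             cost=0.05, alpha=5.0, dt=0.05, steps=400):
    # Forward-Euler simulation of the toy closed loop (assumed model).
    # With alpha * dt <= 1, each step keeps q inside [q_min, q_max].
    q = 0.0
    qs, vs = [], []
    for _ in range(steps):
        v = v0 + x * q                        # assumed linear voltage model
        steady_state_grad = cost * q          # gradient of 0.5 * cost * q**2
        desired = transient_policy(v - 1.0) - steady_state_grad
        q = q + dt * safe_direction(desired, q, q_min, q_max, alpha)
        qs.append(q)
        vs.append(v)
    return np.array(qs), np.array(vs)


if __name__ == "__main__":
    qs, vs = simulate()
    print("final q:", qs[-1], "final v:", vs[-1])
```

In this toy setting the overvoltage pushes the reactive power injection toward its lower limit, and the clip prevents it from crossing that limit. How the letter actually combines the two components, and under which conditions stability holds, is specified in the paper itself rather than in this sketch.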
Journal Name:
IEEE Control Systems Letters
Page Range / eLocation ID:
2845 to 2850
Sponsoring Org:
National Science Foundation
More Like this
  1. In this work, we investigate grid-forming control for power systems containing three-phase and single-phase converters connected to unbalanced distribution and transmission networks, investigate self-balancing between single-phase converters, and propose a novel balancing feedback for grid-forming control that explicitly allows trading off unbalances in voltage and power. We develop a quasi-steady-state power network model that allows analyzing the interactions between three-phase and single-phase power converters across transmission, distribution, and standard transformer interconnections. We first investigate conditions under which this general network admits a well-posed Kron-reduced quasi-steady-state network model. Our main contribution leverages this reduced-order model to develop analytical conditions for stability of the overall network with grid-forming three-phase and single-phase converters connected through standard transformer interconnections. Specifically, we provide conditions on the network topology under which (i) single-phase converters autonomously self-synchronize to a phase-balanced operating point and (ii) single-phase converters phase-balance through synchronization with three-phase converters. Moreover, we establish that the conditions can be relaxed if a phase-balancing feedback control is used. Finally, case studies combining detailed models of transmission systems (i.e., IEEE 9-bus) and distribution systems (i.e., IEEE 13-bus) are used to illustrate the results for (i) a power system containing a mix of transmission- and distribution-connected converters and (ii) a power system solely using distribution-connected converters at the grid edge.
  2. Pronounced variability due to the growth of renewable energy sources, flexible loads, and distributed generation is challenging residential distribution systems. This context motivates fast, efficient, and robust reactive power control. Optimal reactive power control is possible in theory by solving a non-convex optimization problem based on the exact model of distribution flow. However, the lack of high-precision instrumentation and reliable communications, together with the heavy computational burden of non-convex optimization solvers, makes computing and implementing the optimal control challenging in practice. Taking a statistical learning viewpoint, the input-output relationship between each grid state and the corresponding optimal reactive power control (a.k.a. the policy) is parameterized in the present work by a deep neural network, whose unknown weights are updated by minimizing the accumulated power loss over a number of historical and simulated training pairs, using the policy gradient method. In the inference phase, one just feeds the real-time state vector into the learned neural network to obtain the ‘optimal’ reactive power control decision with only several matrix-vector multiplications. The merits of this novel deep policy gradient approach include its computational efficiency as well as robustness to random input perturbations. Numerical tests on a 47-bus distribution network using real solar and consumption data corroborate these practical merits.
  3. Learning a dynamical system requires stabilizing the unknown dynamics to avoid state blow-ups. However, the standard reinforcement learning (RL) methods lack formal stabilization guarantees, which limits their applicability for the control of real-world dynamical systems. We propose a novel policy optimization method that adopts Krasovskii's family of Lyapunov functions as a stability constraint. We show that solving this stability-constrained optimization problem using a primal-dual approach recovers a stabilizing policy for the underlying system even under modeling error. Combining this method with model learning, we propose a model-based RL framework with formal stability guarantees, Krasovskii-Constrained Reinforcement Learning (KCRL). We theoretically study KCRL with kernel-based feature representation in model learning and provide a sample complexity guarantee to learn a stabilizing controller for the underlying system. Further, we empirically demonstrate the effectiveness of KCRL in learning stabilizing policies in online voltage control of a distributed power system. We show that KCRL stabilizes the system under various real-world solar and electricity demand profiles, whereas standard RL methods often fail to stabilize. 
  4. A control system for bipedal walking in the sagittal plane was developed in simulation. The biped model was built based on anthropometric data for a 1.8 m tall male of average build. At the core of the controller is a deep deterministic policy gradient (DDPG) neural network that was trained in GAZEBO, a physics simulator, to predict the ideal foot placement to maintain stable walking despite external disturbances. The complexity of the DDPG network was decreased through carefully selected state variables and a distributed control system. Additional controllers for the hip joints during their stance phases and the ankle joint during toe-off phase help to stabilize the biped during walking. The simulated biped can walk at a steady pace of approximately 1 m/s, and during locomotion it can maintain stability with a 30 kg·m/s impulse applied forward on the torso or a 40 kg·m/s impulse applied rearward. It also maintains stable walking with a 10 kg backpack or a 25 kg front pack. The controller was trained on a 1.8 m tall model, but also stabilizes models 1.4–2.3 m tall with no changes. 
  5. This paper assesses the stability improvements that can be achieved through the coordinated wide-area control of power system stabilizers (PSSs), static VAr compensators (SVCs), and supplementary damping controllers (SDCs) for damping low-frequency oscillations (LFOs) in a power system embedded with multiple high-voltage DC (HVDC) lines. The improved damping is achieved by designing a coordinated wide-area damping controller (CWADC) that employs partial state feedback. The design methodology uses a linear matrix inequality (LMI)-based mixed H2/H∞ robust control for multiple operating scenarios. To reduce the high computational burden, an enhanced version of selective modal analysis (SMA) is employed that not only reduces the number of required wide-area feedback signals, but also identifies alternate feedback signals. Additionally, the impact of delays on the performance of the control design is investigated. The studies are performed on a 29-machine, 127-bus equivalent model of the Western Electricity Coordinating Council (WECC) system embedded with three HVDC lines and two wind farms.