Title: Reinforcement Learning-Based Home Energy Management System for Resiliency
With an increase in the frequency of natural disasters such as hurricanes that disrupt supply from the grid, there is a greater need for resiliency in electric supply. Rooftop solar photovoltaic (PV) panels along with batteries can provide resiliency to a house during a blackout caused by a natural disaster. Our previous work showed that intelligent control can reduce the size of a PV+battery system needed for a given level of post-blackout service compared to a conventional system that does not employ intelligent control. The intelligent controller proposed there is based on model predictive control (MPC), which has two main challenges. First, it requires simple yet accurate models, since it involves real-time optimization. Second, the discrete (on/off) actuation of residential loads makes the underlying optimization problem a mixed-integer program (MIP), which is challenging to solve. An attractive alternative to MPC is reinforcement learning (RL), since its real-time control computation is both model-free and simple. These advantages come with trade-offs: RL requires computationally expensive offline learning, and its performance is sensitive to various design choices. In this work, we propose an RL-based controller. We compare its performance with the MPC controller from our prior work and with a non-intelligent baseline controller. The RL controller, by commanding critical loads and batteries, is found to provide resiliency performance similar to MPC with a significant reduction in computational effort.
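To make the contrast with MPC concrete, the sketch below shows the kind of real-time computation an RL controller needs: a cheap argmax over a learned Q-function, with no online optimization. It is a minimal tabular Q-learning toy; the state discretization, dynamics, and rewards are invented stand-ins, not the paper's actual models or training procedure.

```python
# Hypothetical sketch: tabular Q-learning for post-blackout load/battery control.
# States, rewards, and dynamics are illustrative stand-ins, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)

N_SOC = 21                 # discretized battery state of charge: 0.0 ... 1.0
ACTIONS = [0, 1]           # 0 = critical load off, 1 = critical load on
SOLAR = 0.04               # per-step PV charging (fraction of capacity), assumed constant
LOAD_DRAW = 0.06           # per-step battery draw when the load is on

Q = np.zeros((N_SOC, len(ACTIONS)))

def step(soc_idx, action):
    """Toy dynamics: PV charges the battery; serving the load discharges it."""
    soc = soc_idx / (N_SOC - 1)
    soc = np.clip(soc + SOLAR - action * LOAD_DRAW, 0.0, 1.0)
    # Reward serving the load, but penalize running the battery flat.
    reward = 1.0 * action - 5.0 * (soc <= 0.0)
    return int(round(soc * (N_SOC - 1))), reward

alpha, gamma, eps = 0.1, 0.95, 0.1
for episode in range(2000):
    s = N_SOC // 2                       # start half charged
    for t in range(48):                  # a 48-step blackout horizon
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(Q[s]))
        s2, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2

# The greedy policy maps state of charge to an on/off command; no
# mixed-integer program is solved online, in contrast to MPC.
print([int(np.argmax(Q[s])) for s in range(N_SOC)])
```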
Award ID(s):
1934322 1646229
PAR ID:
10281416
Author(s) / Creator(s):
Date Published:
Journal Name:
American Control Conference
Page Range / eLocation ID:
1358 to 1364
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity prices. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off actuation of the chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. But designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm based on least-squares policy iteration. We describe the design choices for the RL controller, including the choice of state space and basis functions, that are found to be effective. The proposed MPC controller does not need a mixed-integer solver for implementation, only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid in comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%.
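The least-squares policy iteration (LSPI) family named in the abstract above can be sketched compactly: fit a linear Q-function to a batch of transitions, re-deriving the greedy policy at each iteration. The plant, basis functions, and cost below are invented stand-ins, not the paper's DCEP model or chosen basis.

```python
# Hypothetical sketch of least-squares policy iteration (LSPI) with a linear
# Q-function. Dynamics, basis, and costs are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
ACTIONS = np.array([0.0, 0.5, 1.0])      # e.g., coarse chiller loading levels

def phi(s, a):
    """Basis features in state (storage level, price) and action."""
    level, price = s
    return np.array([1.0, level, price, a, a * price, a * level])

def simulate(s, a):
    """Toy dynamics: charging raises storage; cost is price-weighted power."""
    level, price = s
    level2 = np.clip(level + 0.1 * a - 0.05, 0.0, 1.0)
    price2 = np.clip(price + 0.1 * rng.standard_normal(), 0.5, 2.0)
    reward = -price * a                   # electricity cost to minimize
    return (level2, price2), reward

# Collect a batch of transitions with a random behavior policy.
data = []
s = (0.5, 1.0)
for _ in range(5000):
    a = rng.choice(ACTIONS)
    s2, r = simulate(s, a)
    data.append((s, a, r, s2))
    s = s2

# LSPI: repeatedly solve a least-squares fixed point for the greedy policy.
k, gamma = len(phi(s, 0.0)), 0.95
w = np.zeros(k)
for _ in range(20):
    A = np.zeros((k, k)); b = np.zeros(k)
    for (s, a, r, s2) in data:
        a2 = ACTIONS[np.argmax([phi(s2, u) @ w for u in ACTIONS])]
        f = phi(s, a)
        A += np.outer(f, f - gamma * phi(s2, a2))
        b += f * r
    w = np.linalg.lstsq(A, b, rcond=None)[0]

print("fitted Q-function weights:", w)
```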
  2. Flooding in coastal cities is increasing due to climate change and sea-level rise, stressing the traditional stormwater systems these communities rely on. Automated real-time control (RTC) of these systems can improve performance, and creating control policies for smart stormwater systems is an active area of study. This research explores reinforcement learning (RL) to create control policies that mitigate flood risk. The RL agent is trained using a model of hypothetical urban catchments with a tidal boundary and two retention ponds with controllable valves. RL's performance is compared to the passive system, a model predictive control (MPC) strategy, and a rule-based control strategy (RBC). RL learns to proactively manage pond levels using current and forecast conditions, reducing flooding by 32% over the passive system. Compared to the MPC approach, which uses a physics-based model and a genetic algorithm, RL achieves nearly the same flood reduction, just 3% less, with a significant 88× speedup in runtime. Compared to RBC, RL quickly learns similar control strategies and reduces flooding by an additional 19%. This research demonstrates that RL can effectively control a simple system and offers a computationally efficient method that could scale to RTC of more complex stormwater systems.
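A rough sketch of the control setting described above, under invented dynamics: two ponds with controllable outlet valves discharging against a tidal boundary, plus a threshold rule-based controller (RBC) of the kind the abstract uses as a baseline. An RL agent would be trained against an environment of this shape, learning to pre-release before the storm pulse using forecasts.

```python
# Hypothetical two-pond stormwater environment with a tidal boundary.
# All dynamics, thresholds, and units are illustrative assumptions.
import math

def tide(t):
    """Downstream water level: a simple 12-hour tidal cycle (assumed)."""
    return 0.5 + 0.4 * math.sin(2 * math.pi * t / 12.0)

def step(levels, valves, inflow, t):
    """One timestep of toy pond dynamics; valves are 0..1 open fractions."""
    new_levels, flood = [], 0.0
    for level, valve in zip(levels, valves):
        # Gravity outflow only when the pond is above the tidal boundary.
        head = max(level - tide(t), 0.0)
        level = level + inflow - valve * 0.3 * math.sqrt(head)
        flood += max(level - 1.0, 0.0)    # overflow above capacity 1.0
        new_levels.append(min(max(level, 0.0), 1.0))
    return new_levels, flood

def rbc(levels):
    """Rule-based control: open a valve when a pond passes a setpoint."""
    return [1.0 if level > 0.6 else 0.0 for level in levels]

levels, total_flood = [0.5, 0.5], 0.0
for t in range(48):
    inflow = 0.15 if 10 <= t < 20 else 0.02   # a storm pulse
    levels, flood = step(levels, rbc(levels), inflow, t)
    total_flood += flood
print("flood volume under RBC:", round(total_flood, 3))
```

A trained RL policy would replace `rbc`, using the inflow forecast to draw the ponds down before the pulse arrives.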
  3. A traffic signal controller (TSC) plays a crucial role in managing traffic flow in urban areas. Recently, reinforcement learning (RL) models have received great attention for TSC, with promising results. However, these RL-TSC models still need improvement for real-world deployment due to limited exploration of different performance metrics, such as fair traffic scheduling and air-quality impact. In this work, we introduce a constrained multi-objective RL model that minimizes multiple constrained objectives while achieving a higher expected reward. Furthermore, our proposed RL strategy integrates peak and average constraint models into the RL problem formulation with maximum-entropy off-policy models. We apply this strategy to a single TSC and to a network of TSCs. As part of this constrained RL-TSC formulation, we discuss fairness and air-quality parameters as constraints for the closed-loop control system optimization model at TSCs, called FAirLight. Our experimental analysis shows that the proposed FAirLight achieves good traffic-flow performance in terms of average waiting time while being fair and environmentally friendly. Our method outperforms the baseline models and allows a more comprehensive view of RL-TSC regarding its applicability to the real world.
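One standard way to handle the average constraints mentioned above is Lagrangian relaxation: maximize reward minus a multiplier times the constraint violation, and adapt the multiplier by dual ascent. The sketch below illustrates only that mechanism, with a one-parameter stub standing in for the RL policy update; FAirLight's actual formulation and maximum-entropy off-policy learner are not reproduced here.

```python
# Hypothetical sketch of average-constrained RL via Lagrangian relaxation.
# evaluate() is a stub for an RL rollout; all numbers are assumptions.
import numpy as np

rng = np.random.default_rng(2)

def evaluate(pressure):
    """Stub: a higher 'pressure' parameter raises throughput reward but also
    the constrained cost (e.g., unfairness or emissions)."""
    reward = 1.0 - (pressure - 0.8) ** 2 + 0.01 * rng.standard_normal()
    cost = 0.5 * pressure + 0.01 * rng.standard_normal()
    return reward, cost

COST_LIMIT = 0.3        # average-constraint budget d: E[cost] <= d
lam, eta = 0.0, 0.5     # Lagrange multiplier and its step size
pressure = 0.8          # one-parameter stand-in for the policy

for it in range(200):
    reward, cost = evaluate(pressure)
    # Primal step: ascend the Lagrangian L = reward - lam * (cost - d).
    grad = -2 * (pressure - 0.8) - lam * 0.5
    pressure = float(np.clip(pressure + 0.05 * grad, 0.0, 1.0))
    # Dual step: tighten lam while the constraint is violated.
    lam = max(0.0, lam + eta * (cost - COST_LIMIT))

print(f"policy parameter {pressure:.2f}, multiplier {lam:.2f}")
```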
  4. This work proposes a two-degree-of-freedom (2DOF) controller for motion tracking of nanopositioning devices, such as piezoelectric actuators (PEAs), with broad bandwidth and high precision. The proposed 2DOF controller consists of an inversion feedforward controller and a real-time feedback controller. The feedforward controller, a sequence-to-sequence LSTM-based inversion model (invLSTMs2s), compensates for the nonlinearity of the PEA, especially at high frequencies, and is collaboratively integrated with a linear MPC feedback controller, which ensures PEA position tracking performance at low frequencies. The proposed 2DOF controller, invLSTMs2s+MPC, is therefore able to achieve high precision over a broad bandwidth. To validate the proposed controller, the uncertainty of invLSTMs2s is checked to confirm that integrating the inversion-model-based feedforward controller improves trajectory tracking compared to feedback control alone. Experimental validation on a commercial PEA and comparison with existing approaches demonstrate that invLSTMs2s+MPC achieves high tracking accuracy for various reference trajectories. Moreover, invLSTMs2s+MPC is further demonstrated on a multi-dimensional PEA platform for simultaneous multi-direction positioning control.
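The 2DOF structure is easy to illustrate: a feedforward input from an (approximate) plant inversion plus a feedback correction. In the sketch below, a simple algebraic inversion stands in for the sequence-to-sequence LSTM (invLSTMs2s) and a proportional term stands in for the linear MPC; the toy plant and gains are assumptions, not the paper's identified PEA model.

```python
# Hypothetical sketch of a 2DOF tracking controller: u = u_ff + u_fb.
# The plant is a toy first-order actuator with a weak cubic nonlinearity.
import numpy as np

def plant(y, u):
    """Toy PEA-like dynamics: first-order lag plus a small cubic term."""
    return 0.8 * y + 0.2 * u + 0.02 * u ** 3

def inverse_model(r_next, y):
    """Approximate inversion (the role played by invLSTMs2s): solve
    plant(y, u) = r_next for u, ignoring the small cubic term."""
    return (r_next - 0.8 * y) / 0.2

t = np.arange(200)
ref = np.sin(2 * np.pi * t / 50)          # reference trajectory

y, log = 0.0, []
for k in range(len(t) - 1):
    u_ff = inverse_model(ref[k + 1], y)   # feedforward: broad-bandwidth tracking
    u_fb = 2.0 * (ref[k] - y)             # feedback: corrects residual error
    y = plant(y, u_ff + u_fb)
    log.append(abs(ref[k + 1] - y))

print("mean tracking error:", np.mean(log))
```

The design choice mirrors the abstract: the inversion carries the high-frequency tracking burden, while feedback mops up model mismatch at low frequencies.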
  5. Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of attitude control systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with a reaction wheel-based ACS is considered, which places more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL-based controller to a quaternion rate feedback (QRF) attitude controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in the system dynamics. Our RL-based attitude control agent adapts to any spacecraft mass without needing to be retrained. Over the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF controller tuned for the same mass range, and similar performance to a QRF controller tuned specifically for a given mass. For the reaction wheel-based ACS, the trained RL agent achieved a 10× higher reward than that of a tuned QRF controller.
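The quaternion rate feedback (QRF) baseline named above has a simple closed form: torque proportional to the vector part of the attitude-error quaternion plus a rate-damping term. The sketch below simulates it on a rigid body with illustrative inertia and gains; the PPO training setup itself is not shown.

```python
# Hypothetical sketch of a quaternion rate feedback (QRF) attitude controller:
# tau = -Kq * q_err_vec - Kw * omega, with Euler-integrated rigid-body dynamics.
# Inertia and gains are illustrative assumptions.
import numpy as np

J = np.diag([10.0, 12.0, 8.0])            # inertia matrix (kg m^2), assumed
Kq, Kw = 5.0, 20.0                        # QRF gains, hand-tuned for this J
dt = 0.05

def quat_mul(a, b):
    """Hamilton product of two quaternions [w, x, y, z]."""
    w1, x1, y1, z1 = a; w2, x2, y2, z2 = b
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

# Initial attitude error (rotation about the x-axis) and rest rates;
# the target attitude is the identity quaternion.
q = np.array([np.cos(1.0), np.sin(1.0), 0.0, 0.0])
w = np.zeros(3)

for _ in range(2000):
    q_err = q if q[0] >= 0 else -q        # take the short rotation
    tau = -Kq * q_err[1:] - Kw * w        # QRF control law
    # Euler's rotational dynamics: J w_dot = tau - w x (J w)
    w = w + dt * np.linalg.solve(J, tau - np.cross(w, J @ w))
    # Quaternion kinematics: q_dot = 0.5 * q * [0, w]
    q = q + dt * 0.5 * quat_mul(q, np.concatenate(([0.0], w)))
    q = q / np.linalg.norm(q)

print("final attitude error angle (rad):",
      2 * np.arccos(np.clip(abs(q[0]), -1.0, 1.0)))
```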