Building heating, ventilation, and air conditioning (HVAC) systems account for nearly half of building energy consumption and 20% of total energy consumption in the US. Their operation is also crucial for ensuring the physical and mental health of building occupants. Compared with traditional model-based HVAC control methods, recent model-free deep reinforcement learning (DRL) based methods have shown good performance while not requiring the development of detailed and costly physical models. However, these model-free DRL approaches often suffer from long training times to reach good performance, which is a major obstacle for their practical deployment. In this work, we present a systematic approach to accelerate online reinforcement learning for HVAC control by taking full advantage of knowledge from domain experts in various forms. Specifically, the algorithm stages include learning expert functions from existing abstract physical models and from historical data via offline reinforcement learning, integrating the expert functions with rule-based guidelines, conducting training guided by the integrated expert function, and performing policy initialization from the distilled expert function. Moreover, to ensure that the learned DRL-based HVAC controller can effectively keep room temperature within the comfortable range for occupants, we design a runtime shielding framework that reduces the temperature violation rate and incorporate the learned controller into it. Experimental results demonstrate up to 8.8X speedup in DRL training from our approach over previous methods, with a low temperature violation rate.
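The runtime shielding idea in the abstract lends itself to a compact illustration: a wrapper that overrides the learned policy whenever room temperature approaches the comfort bounds. The sketch below is a minimal, hypothetical rendering; the comfort band, safety margin, and action convention are placeholder assumptions, not the paper's actual shield design.

```python
# Minimal sketch of a runtime temperature shield wrapped around a learned
# HVAC policy. All thresholds and the action convention (0 = no cooling,
# 1 = max cooling) are hypothetical; the paper's shield may differ.

def shielded_action(policy_action: float, room_temp: float,
                    comfort_low: float = 20.0, comfort_high: float = 24.0,
                    margin: float = 0.5) -> float:
    """Return the DRL action unless temperature nears a comfort bound."""
    if room_temp >= comfort_high - margin:
        return 1.0  # force maximum cooling before the upper bound is violated
    if room_temp <= comfort_low + margin:
        return 0.0  # cut cooling before the lower bound is violated
    return policy_action  # safe region: trust the learned controller
```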
- Award ID(s):
- 2243931
- PAR ID:
- 10536472
- Publisher / Repository:
- 2024 ASHRAE Winter Conference
- Date Published:
- Format(s): Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
In recent years, the focus has been on enhancing user comfort in commercial buildings while cutting energy costs. Efforts have mainly centered on improving HVAC systems, the central building control system. However, HVAC alone cannot ensure occupant comfort: lighting, blinds, and windows, often overlooked, also affect energy use and comfort. This paper introduces a holistic approach to managing the delicate balance between energy efficiency and occupant comfort in commercial buildings. We present OCTOPUS, a system employing a deep reinforcement learning (DRL) framework with data-driven techniques to optimize control sequences for all building subsystems, including HVAC, lighting, blinds, and windows. OCTOPUS's DRL architecture features a unique reward function facilitating the exploration of tradeoffs between energy usage and user comfort, effectively addressing the high-dimensional control problem arising from interactions among these four building subsystems. To meet data training requirements, we emphasize the importance of calibrated simulations that closely replicate target-building operational conditions. We train OCTOPUS using 10-year weather data and a calibrated building model in the EnergyPlus simulator. Extensive simulations demonstrate that OCTOPUS achieves substantial energy savings, outperforming state-of-the-art rule-based and DRL-based methods by 14.26% and 8.1%, respectively, in a LEED Gold Certified building while maintaining desired human comfort levels.
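The reward described for OCTOPUS suggests a weighted tradeoff between energy use and several comfort terms. The following sketch shows one plausible shape for such a function; the weights and the three comfort deviation terms (thermal, visual, air quality) are illustrative assumptions, not the reward OCTOPUS actually uses.

```python
# Hypothetical reward of the energy-vs-comfort form described for OCTOPUS.
# Weights and comfort terms are placeholders, not the paper's definition.

def reward(energy_kwh: float, thermal_dev: float, visual_dev: float,
           iaq_dev: float, w_energy: float = 0.5, w_comfort: float = 0.5) -> float:
    """Negative cost: penalize energy use and squared comfort deviations."""
    comfort_penalty = thermal_dev**2 + visual_dev**2 + iaq_dev**2
    return -(w_energy * energy_kwh + w_comfort * comfort_penalty)
```

A single scalar reward of this shape lets the agent trade energy against comfort across all four subsystems by adjusting the two weights.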
Safe operation of autonomous mobile robots in close proximity to humans creates a need for enhanced trajectory tracking (with low tracking errors). Linear optimal control techniques such as the Linear Quadratic Regulator (LQR) and Model Predictive Control (MPC) have been used successfully for low-speed applications, leveraging their model-based methodology with manageable computational demands. However, model and parameter uncertainties or other unmodeled nonlinearities may cause poor control actions and constraint violations. Nonlinear MPC has emerged as an alternative optimal-control approach but must overcome real-time deployment challenges (including fast sampling times, design complexity, and limited computational resources). In recent years, optimal control-based deployments have benefited enormously from the ability of Deep Neural Networks (DNNs) to serve as universal function approximators, enabling deployments in a plethora of previously inaccessible applications; however, open questions around generalizability, benchmarking, and systematic verification and validation have emerged. This paper presents a novel approach to fusing Deep Reinforcement Learning (DRL)-based longitudinal control with a traditional PID lateral controller for autonomous navigation. Our approach follows: (i) generation of an adequate-fidelity simulation scenario via a Real2Sim approach; (ii) training a DRL agent within this framework; (iii) testing performance and generalizability on alternate scenarios. We use an initially tuned set of lateral PID controller gains to observe the vehicle response over a range of velocities. We then use a DRL framework to generate policies for an optimal longitudinal controller that successfully complements the lateral PID controller to give the best tracking performance for the vehicle.
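The fusion of a learned longitudinal controller with a classical lateral controller can be sketched as below. The `PID` class, the `drl_policy` interface, and all gains are hypothetical stand-ins for the paper's tuned components.

```python
# Sketch of DRL longitudinal control fused with a PID lateral controller.
# Gains, timestep, and the drl_policy interface are hypothetical.

class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float = 0.05):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def control_step(drl_policy, obs, cross_track_error: float, pid: PID):
    """One control cycle: the DRL agent picks throttle, the PID picks steering."""
    throttle = drl_policy(obs)               # learned longitudinal command
    steering = pid.step(cross_track_error)   # classical lateral correction
    return throttle, steering
```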
Reinforcement learning (RL) methods can be used to develop a controller for heating, ventilation, and air conditioning (HVAC) systems that both saves energy and ensures high occupant thermal comfort. However, existing works typically require on-policy data to train an RL agent and do not consider occupants' personalized thermal preferences, which limits their applicability in real-world scenarios. This paper designs a high-performance model-based offline RL algorithm for personalized HVAC systems. The proposed algorithm can quickly adapt to different occupants' thermal preferences with a few thermal feedback samples, efficiently guaranteeing high personalized thermal comfort. First, we use a meta-supervised learning algorithm to train an occupant's thermal preference model. Then, we train an ensemble of neural networks to predict the thermal states of the considered zone. In addition, the obtained ensemble networks can indicate the regions of the state and action spaces covered by the offline dataset. With the personalized thermal preference model updated via meta-testing, model-based RL is used to derive the optimal HVAC controller. Since the proposed algorithm only requires offline datasets and a few online thermal feedback samples for training, it contributes to a more practical deployment of RL algorithms in HVAC systems. We use the ASHRAE database II to verify the effectiveness and advantage of the meta-learning algorithm for modeling different occupants' thermal preferences. Numerical simulations in the EnergyPlus environment demonstrate that the proposed algorithm can guarantee personalized thermal preferences with only a slight (1.91%) increase in power consumption compared with a model-based RL algorithm using on-policy data aggregation.
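Two ingredients named in this abstract, an ensemble dynamics model and a coverage check based on ensemble disagreement, can be sketched as follows. The model interface and the disagreement threshold are assumptions for illustration, not the paper's implementation.

```python
# Sketch: an ensemble of learned dynamics models whose disagreement flags
# state-action pairs outside the offline dataset's coverage.
# The model interface and threshold are hypothetical.

import numpy as np

def ensemble_predict(models, state, action):
    """Mean next-state prediction and worst-case per-dimension disagreement."""
    preds = np.stack([m(state, action) for m in models])  # (n_models, state_dim)
    return preds.mean(axis=0), float(preds.std(axis=0).max())

def in_offline_support(models, state, action, max_std: float = 0.1) -> bool:
    """Treat high ensemble disagreement as 'outside the offline data'."""
    _, disagreement = ensemble_predict(models, state, action)
    return disagreement <= max_std
```

Restricting model-based rollouts to state-action pairs that pass such a check is a common way to keep offline RL from exploiting model error.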
Stop-and-go traffic poses significant challenges to the efficiency and safety of traffic operations, and its impacts and working mechanism have attracted much attention. Recent studies have shown that Connected and Automated Vehicles (CAVs) with carefully designed longitudinal control have the potential to dampen the stop-and-go wave, based on simulated vehicle trajectories. In this study, Deep Reinforcement Learning (DRL) is adopted to control the longitudinal behavior of CAVs, and real-world vehicle trajectory data is utilized to train the DRL controller. The setup considers a Human-Driven (HD) vehicle followed by a CAV, which is in turn followed by a platoon of HD vehicles. This experimental design tests how the CAV can help dampen the stop-and-go wave generated by the lead HD vehicle and contribute to smoothing the following HD vehicles' speed profiles. The DRL controller is trained using real-world vehicle trajectories and evaluated using SUMO simulation. The results show that the DRL control decreases the speed oscillation of the CAV by 54% and that of the following HD vehicles by 8%-28%. Significant fuel consumption savings are also observed. Additionally, the results suggest that CAVs may act as traffic stabilizers if they choose to behave slightly altruistically.
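A reward of the kind such a controller might be trained with, penalizing the CAV's own oscillation, its gap error, and (slightly altruistically) the speed variance of the vehicles behind it, is sketched below. All weights and terms are hypothetical, not the study's actual reward.

```python
# Hypothetical reward for a wave-damping CAV follower: penalize harsh
# acceleration (oscillation), gap error, and followers' speed variance.

def cav_reward(accel: float, gap: float, desired_gap: float,
               follower_speed_var: float = 0.0,
               w_smooth: float = 1.0, w_gap: float = 0.1,
               w_altruism: float = 0.05) -> float:
    """Negative cost; the small altruism weight nudges the CAV to smooth
    the platoon behind it, not just its own trajectory."""
    return -(w_smooth * accel**2
             + w_gap * (gap - desired_gap)**2
             + w_altruism * follower_speed_var)
```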