

Title: Reinforcement Learning-Based Event-Triggered Active-Battery-Cell-Balancing Control for Electric Vehicle Range Extension
Abstract: Optimal control techniques such as model predictive control (MPC) have been widely studied and successfully applied across a diverse range of applications. However, their large computational requirements pose a significant challenge for embedded applications. Event-triggered MPC (eMPC) can address this issue by exploiting the prediction horizon, but the event-trigger policy is difficult to design so that it satisfies both throughput and control-performance requirements. To address this challenge, this paper proposes to design the event trigger by training a deep Q-network reinforcement learning agent (RLeMPC) to learn the optimal event-trigger policy. This control technique was applied to an active-cell-balancing controller for the range extension of an electric-vehicle battery. Simulation results with MPC, eMPC, and RLeMPC control policies are presented, along with a discussion of the challenges of implementing RLeMPC.
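To make the event-trigger idea concrete, the sketch below shows a minimal control loop in which a Q-learning agent decides, at each step, whether to re-solve the MPC problem or keep applying the buffered input sequence from the last solve. The toy plant, reward weighting, and all names (solve_mpc, run_episode, etc.) are illustrative assumptions rather than the paper's implementation, and the linear Q-function stands in for the deep Q-network described in the abstract.

```python
# Illustrative sketch of an RL-driven event-triggered MPC loop (RLeMPC).
# Names, models, and rewards are assumptions for exposition, not the paper's code.
import numpy as np

N_HORIZON = 10          # MPC prediction horizon (steps)
ACTIONS = (0, 1)        # 0 = reuse buffered MPC plan, 1 = re-solve MPC

def solve_mpc(x):
    """Placeholder MPC: returns an open-loop input sequence over the horizon."""
    # A real implementation would solve a constrained optimal control problem.
    return np.zeros(N_HORIZON)

def features(x, plan_age):
    """Simple state features: plant-state magnitude plus age of the buffered plan."""
    return np.array([1.0, float(np.linalg.norm(x)), plan_age / N_HORIZON])

def q_values(theta, phi):
    """Linear Q-function approximation Q(s, a) = theta[a] . phi(s)."""
    return theta @ phi

def run_episode(theta, x0, steps=50, eps=0.1, rng=np.random.default_rng(0)):
    x, plan, age = x0, solve_mpc(x0), 0
    total_reward = 0.0
    for _ in range(steps):
        phi = features(x, age)
        # epsilon-greedy event-trigger policy learned by the Q-agent
        a = rng.integers(2) if rng.random() < eps else int(np.argmax(q_values(theta, phi)))
        if a == 1 or age >= N_HORIZON - 1:
            plan, age = solve_mpc(x), 0        # trigger: re-solve the MPC problem
        u = plan[age]                          # otherwise consume the buffered plan
        age += 1
        x = 0.95 * x + 0.1 * u                 # toy plant update
        # reward trades off control error against the computational cost of a trigger
        total_reward += -(np.linalg.norm(x) ** 2) - 0.05 * (a == 1)
    return total_reward

theta = np.zeros((2, 3))                       # Q-weights for the two trigger actions
print(run_episode(theta, x0=np.array([1.0])))
```

In a full implementation, run_episode would be wrapped in a training loop that updates the Q-function from the observed rewards, so the agent learns when a re-solve is worth its computational cost.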
Award ID(s): 2237317
PAR ID: 10557937
Author(s) / Creator(s): ; ;
Publisher / Repository: MDPI
Date Published:
Journal Name: Electronics
Volume: 13
Issue: 5
ISSN: 2079-9292
Page Range / eLocation ID: 990
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Integrating distributed energy resources (DERs) with microgrids has been proposed as a way to enhance grid resilience. Because DERs are diverse in nature, their optimal combined operation within a microgrid needs to be explored. As such, this paper presents the design, implementation, and validation of a Model Predictive Control (MPC)-based secondary control scheme that tackles two challenges: optimal islanded operation and optimal re-synchronization of a microgrid. The MPC optimization algorithm dynamically adjusts input signals, termed manipulated variables, for each DER within the microgrid, including a gas turbine, an aggregate photovoltaic (PV) unit, and a battery energy storage (BESS) unit. To attain optimal islanded operation, the MPC-based secondary-level controller was configured to uphold microgrid functionality promptly following the islanding event and then to balance power within the microgrid and ensure the reliability of the overall system. For optimal re-synchronization, the controller was set to adjust the manipulated variables to synchronize voltage and angle with the point of common coupling of the system. All stages of microgrid operation were achieved through one MPC-driven control system, in which the controller can guide the system to different goals simply by updating the MPC's target reference. More importantly, the results show that the MPC-based control scheme is capable of controlling different DERs simultaneously, mitigating potentially harmful transient rotor torques during re-synchronization, and maintaining the microgrid within system performance requirements.
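As a rough illustration of the reference-tracking idea in the entry above, the sketch below poses a toy secondary-control MPC that chooses setpoint increments for three DERs so that total power follows a target, and it switches goals simply by changing the reference. The power model, gains, and the unconstrained least-squares solution are simplifying assumptions for exposition, not the paper's controller.

```python
# Minimal sketch of a reference-tracking MPC over multiple DER setpoints.
# Model, gains, and the absence of constraints are illustrative assumptions.
import numpy as np

DERS = ["gas_turbine", "pv_unit", "bess"]
H = 6                                  # prediction horizon (steps)
GAINS = np.array([1.0, 0.6, 0.9])      # toy steady-state gain per DER setpoint

def mpc_step(p_now, reference, weight_effort=0.05):
    """Choose DER setpoint increments that steer total power toward `reference`.

    Decision vector: du with shape (H, n_ders); the predicted total power at
    step k is p_now plus the cumulative effect of earlier setpoint moves.
    Solved here as a regularized least-squares problem for brevity.
    """
    n = len(DERS)
    rows = []
    for k in range(H):
        row = np.zeros(H * n)
        for j in range(k + 1):
            row[j * n:(j + 1) * n] = GAINS     # cumulative effect of earlier moves
        rows.append(row)
    A = np.vstack(rows)
    b = np.full(H, reference - p_now)
    # Tikhonov term penalizes movement of the manipulated variables
    A_reg = np.vstack([A, np.sqrt(weight_effort) * np.eye(H * n)])
    b_reg = np.concatenate([b, np.zeros(H * n)])
    du = np.linalg.lstsq(A_reg, b_reg, rcond=None)[0].reshape(H, n)
    return du[0]                               # receding horizon: apply only the first move

# Islanded operation: track the local load; re-synchronization: track the PCC target.
print(mpc_step(p_now=2.0, reference=3.5))
```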
  2. Mobile platforms must satisfy the contradictory requirements of fast response time and minimum energy consumption as a function of dynamically changing applications. To address this need, systems-on-chip (SoC) that are at the heart of these devices provide a variety of control knobs, such as the number of active cores and their voltage/frequency levels. Controlling these knobs optimally at runtime is challenging for two reasons. First, the large configuration space prohibits exhaustive solutions. Second, control policies designed offline are at best sub-optimal, since many potential new applications are unknown at design-time. We address these challenges by proposing an online imitation learning approach. Our key idea is to construct an offline policy and adapt it online to new applications to optimize a given metric (e.g., energy). The proposed methodology leverages the supervision enabled by power-performance models learned at runtime. We demonstrate its effectiveness on a commercial mobile platform with 16 diverse benchmarks. Our approach successfully adapts the control policy to an unknown application after executing less than 25% of its instructions.
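A minimal sketch of the online imitation-learning loop described in the entry above: an oracle built from (here, toy) power-performance models labels the best configuration for the current workload features, and a linear policy is updated toward those labels at runtime. The knob settings, feature choices, and models are assumptions for illustration, not the platform's actual interface.

```python
# Sketch of online imitation learning for runtime power management.
# Knob settings, features, and models are illustrative assumptions.
import numpy as np

CONFIGS = [(1, 0.6), (2, 1.0), (4, 1.4), (8, 2.0)]   # assumed (active cores, GHz) knobs

def oracle_best_config(feat, power_model, perf_model, energy_weight=1.0):
    """Label the best knob setting under the learned models (supervision signal)."""
    costs = [energy_weight * power_model(feat, c) / max(perf_model(feat, c), 1e-9)
             for c in CONFIGS]
    return int(np.argmin(costs))

class ImitationPolicy:
    """Linear scoring policy over (feature, config) pairs, adapted online."""
    def __init__(self, n_features):
        self.W = np.zeros((len(CONFIGS), n_features))   # an offline policy would seed this

    def act(self, feat):
        return int(np.argmax(self.W @ feat))

    def update(self, feat, expert_action, lr=0.1):
        """Perceptron-style imitation update toward the oracle's label."""
        pred = self.act(feat)
        if pred != expert_action:
            self.W[expert_action] += lr * feat
            self.W[pred] -= lr * feat

# Toy models standing in for power/performance models learned at runtime.
power_model = lambda f, c: c[0] * c[1] * (1.0 + f[0])
perf_model = lambda f, c: min(c[0], 1 + 4 * f[1]) * c[1]

policy, rng = ImitationPolicy(n_features=2), np.random.default_rng(1)
for _ in range(200):                       # online adaptation to a new application
    feat = rng.random(2)                   # e.g., normalized memory intensity and parallelism
    policy.update(feat, oracle_best_config(feat, power_model, perf_model))
print(policy.act(np.array([0.5, 0.9])))
```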
  3. We consider the problem of optimal control of district cooling energy plants (DCEPs) consisting of multiple chillers, a cooling tower, and a thermal energy storage (TES), in the presence of time-varying electricity prices. A straightforward application of model predictive control (MPC) requires solving a challenging mixed-integer nonlinear program (MINLP) because of the on/off operation of the chillers and the complexity of the DCEP model. Reinforcement learning (RL) is an attractive alternative since its real-time control computation is much simpler. But designing an RL controller is challenging due to myriad design choices and computationally intensive training. In this paper, we propose an RL controller and an MPC controller for minimizing the electricity cost of a DCEP and compare them via simulations. The two controllers are designed to be comparable in terms of objective and information requirements. The RL controller uses a novel Q-learning algorithm that is based on least-squares policy iteration. We describe the design choices for the RL controller, including the choice of state space and basis functions, that are found to be effective. The proposed MPC controller does not need a mixed-integer solver for implementation, but only a nonlinear program (NLP) solver. A rule-based baseline controller is also proposed to aid in comparison. Simulation results show that the proposed RL and MPC controllers achieve similar savings over the baseline controller, about 17%.
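The entry above mentions a Q-learning algorithm based on least-squares policy iteration (LSPI). The sketch below shows the core LSTDQ/LSPI computation on a batch of transitions with a linear state-action basis; the state, action set, and data are illustrative stand-ins rather than the paper's DCEP model.

```python
# Minimal sketch of least-squares policy iteration (LSPI) with an LSTDQ inner step.
# Basis functions, actions, and the toy batch are illustrative assumptions.
import numpy as np

GAMMA = 0.99

def phi(s, a, n_actions):
    """State-action basis: state features stacked into the slot of action `a`."""
    out = np.zeros(n_actions * len(s))
    out[a * len(s):(a + 1) * len(s)] = s
    return out

def greedy(w, s, n_actions):
    return int(np.argmax([w @ phi(s, a, n_actions) for a in range(n_actions)]))

def lstdq(transitions, w, n_actions, ridge=1e-3):
    """One LSTDQ evaluation step: solve A w_new = b for the greedy policy of w."""
    k = n_actions * len(transitions[0][0])
    A, b = ridge * np.eye(k), np.zeros(k)
    for s, a, r, s_next in transitions:
        f = phi(s, a, n_actions)
        f_next = phi(s_next, greedy(w, s_next, n_actions), n_actions)
        A += np.outer(f, f - GAMMA * f_next)
        b += r * f
    return np.linalg.solve(A, b)

def lspi(transitions, n_actions, iters=10):
    w = np.zeros(n_actions * len(transitions[0][0]))
    for _ in range(iters):                      # policy iteration until the weights settle
        w = lstdq(transitions, w, n_actions)
    return w

# Toy batch: state = (storage level, electricity price), action = chiller mode.
rng = np.random.default_rng(0)
batch = [(rng.random(2), rng.integers(3), -rng.random(), rng.random(2)) for _ in range(500)]
w = lspi(batch, n_actions=3)
print(greedy(w, np.array([0.5, 0.2]), n_actions=3))
```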
  4. Abstract: Topology optimization (TO) has rapidly evolved from an academic exercise into an exciting discipline with numerous industrial applications. Various TO algorithms have been established, and several commercial TO software packages are now available. However, a major challenge in TO is the post-processing of the optimized models for downstream applications. Typically, optimal topologies generated by TO are faceted (triangulated) models, extracted from an underlying finite element mesh. These triangulated models are dense, poor quality, and lack feature/parametric control. This poses serious challenges to downstream applications such as prototyping/testing, design validation, and design exploration. One strategy to address this issue is to directly impose downstream requirements as constraints in the TO algorithm. However, this not only restricts the design space but may even cause TO to fail. Separation of post-processing from TO is more robust and flexible. The objective of this paper is to provide a critical review of various post-processing methods and categorize them based on both targeted applications and underlying strategies. The paper concludes with unresolved challenges and future work.
  5. The physical design of a robot and the policy that controls its motion are inherently coupled, and should be determined according to the task and environment. In an increasing number of applications, data-driven and learning-based approaches, such as deep reinforcement learning, have proven effective at designing control policies. For most tasks, the only way to evaluate a physical design with respect to such control policies is empirical, i.e., by picking a design and training a control policy for it. Since training these policies is time-consuming, it is computationally infeasible to train separate policies for all possible designs as a means to identify the best one. In this work, we address this limitation by introducing a method that jointly optimizes over the physical design and control network. Our approach maintains a distribution over designs and uses reinforcement learning to optimize a control policy to maximize expected reward over the design distribution. We give the controller access to design parameters to allow it to tailor its policy to each design in the distribution. Throughout training, we shift the distribution towards higher-performing designs, eventually converging to a design and control policy that are jointly optimal. We evaluate our approach in the context of legged locomotion, and demonstrate that it discovers novel designs and walking gaits, outperforming baselines across different settings.
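To illustrate the design-distribution idea in the entry above, the sketch below maintains a Gaussian over design parameters, scores sampled designs with a stub design-conditioned controller, and shifts the distribution toward higher-performing designs using a simple cross-entropy-style update; the paper's actual method trains the controller with reinforcement learning, which this stub omits.

```python
# Illustrative sketch of the joint design/control loop: sample candidate designs,
# evaluate a design-conditioned controller on them, and shift the design
# distribution toward the best performers. Evaluation and update are simplified
# stand-ins, not the paper's reinforcement-learning procedure.
import numpy as np

def evaluate(design, controller):
    """Stub for 'train/evaluate a control policy on this design' -> expected reward."""
    return controller(design)

def joint_optimize(controller, dim=3, iters=50, pop=32, elite_frac=0.25, rng=None):
    rng = rng or np.random.default_rng(0)
    mu, sigma = np.zeros(dim), np.ones(dim)              # design-distribution parameters
    n_elite = max(1, int(elite_frac * pop))
    for _ in range(iters):
        designs = rng.normal(mu, sigma, size=(pop, dim)) # sample candidate designs
        rewards = np.array([evaluate(d, controller) for d in designs])
        elite = designs[np.argsort(rewards)[-n_elite:]]  # best designs this round
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-3  # shift the distribution
    return mu

# Toy design-conditioned "controller": reward peaks at a particular geometry.
target = np.array([0.5, -1.0, 2.0])
controller = lambda d: -np.sum((d - target) ** 2)
print(joint_optimize(controller))
```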