Hybrid electric vehicles can achieve better fuel economy than conventional vehicles by utilizing multiple power sources. While these power sources have traditionally been managed by rule-based or optimization-based control algorithms, recent studies have shown that machine-learning-based algorithms such as online Deep Reinforcement Learning (DRL) can control them effectively as well. However, the optimization and training processes of an online DRL-based powertrain control strategy can be very time- and resource-intensive. In this paper, a new offline–online hybrid DRL strategy is presented in which offline vehicle data are exploited to build an initial model and an online learning algorithm explores a new control policy to further improve the fuel economy. In this manner, the agent is expected to learn the environment, consisting of the vehicle dynamics under a given driving condition, more quickly than online-only algorithms, which must learn the optimal control policy by interacting with the vehicle model from zero initial knowledge. Simulation results show that, by incorporating a priori offline knowledge, the proposed approach not only accelerates and stabilizes the learning process but also achieves better fuel economy than online-only learning algorithms.
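To make the offline-then-online pattern concrete, below is a minimal sketch of the two-phase training loop for a small DQN-style agent. The state and action dimensions, network shape, hyperparameters, and the `load_logged_transitions` and `env` helpers are illustrative assumptions, not the paper's implementation.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical problem size: state = (SOC, vehicle speed, power demand),
# action = one of 5 discrete engine/motor power splits.
STATE_DIM, N_ACTIONS, GAMMA = 3, 5, 0.99

qnet = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def td_step(batch):
    """One Q-learning update on a batch of (s, a, r, s_next) transitions."""
    s, a, r, s_next = map(lambda t: torch.tensor(t, dtype=torch.float32), zip(*batch))
    q = qnet(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + GAMMA * qnet(s_next).max(1).values
    loss = nn.functional.mse_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 1 -- offline: pretrain on logged vehicle transitions (the a priori knowledge).
offline_data = load_logged_transitions()      # assumed helper: list of (s, a, r, s_next)
for _ in range(5_000):
    td_step(random.sample(offline_data, 64))

# Phase 2 -- online: keep learning against the vehicle-model environment, starting
# exploration from the pretrained policy instead of from zero knowledge.
replay = deque(offline_data, maxlen=100_000)
s = env.reset()                               # `env` is an assumed vehicle simulator
for t in range(20_000):
    eps = max(0.05, 0.5 * (1 - t / 20_000))   # smaller initial epsilon thanks to pretraining
    a = random.randrange(N_ACTIONS) if random.random() < eps else \
        int(qnet(torch.tensor(s, dtype=torch.float32)).argmax())
    s_next, r, done, _ = env.step(a)
    replay.append((s, a, r, s_next))
    td_step(random.sample(replay, 64))
    s = env.reset() if done else s_next
```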
This content will become publicly available on January 25, 2025
Reinforcement learning-based optimal control of uncertain nonlinear systems
This paper considers the optimal control of a second-order nonlinear system with unknown dynamics. A new reinforcement-learning-based approach is proposed with the aid of direct adaptive control. Within this framework, actor-critic reinforcement learning algorithms are developed using three neural network approximators. Simulation results are presented to show the effectiveness of the proposed algorithms.
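The abstract does not spell out the three approximators, but a common construction pairs an identifier network (for the unknown dynamics) with a critic and an actor. The sketch below is a generic version of that scheme under an assumed quadratic running cost; the dimensions, learning rates, and update rules are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

def mlp(n_in, n_out):
    return nn.Sequential(nn.Linear(n_in, 32), nn.Tanh(), nn.Linear(32, n_out))

# Three neural network approximators for a 2nd-order (2-state, 1-input) plant:
identifier = mlp(3, 2)   # estimates the unknown dynamics x_dot = f(x, u)
critic     = mlp(2, 1)   # estimates the value (cost-to-go) function V(x)
actor      = mlp(2, 1)   # estimates the control policy u = pi(x)

opt_id = torch.optim.Adam(identifier.parameters(), lr=1e-3)
opt_c  = torch.optim.Adam(critic.parameters(), lr=1e-3)
opt_a  = torch.optim.Adam(actor.parameters(), lr=1e-4)

def running_cost(x, u, dt):
    # Assumed quadratic stage cost; the paper's cost is not given in the abstract.
    return (x.pow(2).sum(-1) + u.pow(2).sum(-1)) * dt

def train_step(x, u, x_dot, dt=0.01, gamma=0.99):
    """One joint update from measured batches x, u, x_dot of the unknown plant."""
    # (1) Identifier: fit the dynamics estimate to the observed state derivative.
    id_loss = nn.functional.mse_loss(identifier(torch.cat([x, u], -1)), x_dot)
    opt_id.zero_grad(); id_loss.backward(); opt_id.step()

    # (2) Critic: minimize the one-step temporal-difference error.
    with torch.no_grad():
        x_next = x + identifier(torch.cat([x, u], -1)) * dt
    td = running_cost(x, u, dt) + gamma * critic(x_next).squeeze(-1).detach() \
         - critic(x).squeeze(-1)
    c_loss = td.pow(2).mean()
    opt_c.zero_grad(); c_loss.backward(); opt_c.step()

    # (3) Actor: descend the estimated cost-to-go of its own action; only the
    # actor's parameters are stepped here, so identifier and critic are unchanged.
    u_pi = actor(x)
    x_pi = x + identifier(torch.cat([x, u_pi], -1)) * dt
    a_loss = (running_cost(x, u_pi, dt) + gamma * critic(x_pi).squeeze(-1)).mean()
    opt_a.zero_grad(); a_loss.backward(); opt_a.step()
```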
- Award ID(s): 2037649
- NSF-PAR ID: 10538706
- Publisher / Repository: Taylor & Francis
- Date Published:
- Journal Name: International Journal of Control
- ISSN: 0020-7179
- Page Range / eLocation ID: 1 to 12
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper investigates data-efficient methods for learning robust control policies. Reinforcement learning has emerged as an effective approach for learning control policies by interacting directly with the plant, but it requires a significant number of example trajectories to converge to the optimal policy. Combining model-free reinforcement learning with model-based control methods achieves better data efficiency via simultaneous system identification and controller synthesis. We study a novel approach that exploits the existence of approximate physics models to accelerate the learning of control policies. The proposed approach iterates through three key steps: evaluating a selected policy on the real-world plant and recording trajectories; building a Gaussian process model that predicts the reality gap of a parametric physics model in the neighborhood of the selected policy; and synthesizing a new policy, using reinforcement learning, on the refined physics model that most likely approximates the real plant. The approach converges to an optimal policy as well as an approximate physics model. Real-world experiments are limited to evaluating only promising candidate policies, and the use of Gaussian processes minimizes the number of required real-world trajectories. We demonstrate the effectiveness of our techniques on a set of simulation case studies using OpenAI Gym environments.
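A minimal sketch of that three-step loop follows, with scikit-learn's `GaussianProcessRegressor` standing in for the reality-gap model. The `real_step`, `sim_step`, `train_rl`, `initial_policy`, and `x0` names are assumed interfaces, not the authors' code.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def rollout(step_fn, policy, x0, horizon):
    """Collect (state, action, next_state) arrays from a one-step transition function."""
    xs, us, xs_next = [], [], []
    x = np.asarray(x0, dtype=float)
    for _ in range(horizon):
        u = np.atleast_1d(policy(x))   # assume a 1-D action vector
        x_next = step_fn(x, u)
        xs.append(x); us.append(u); xs_next.append(x_next)
        x = x_next
    return np.array(xs), np.array(us), np.array(xs_next)

# Assumed interfaces: real_step() queries the physical plant, sim_step() the
# parametric physics model, train_rl() is any model-free RL routine, and
# initial_policy comes from RL on the uncorrected physics model.
gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
policy = initial_policy

for iteration in range(10):
    # Step 1: evaluate the candidate policy on the real plant (the expensive step).
    xs, us, xs_real = rollout(real_step, policy, x0, horizon=200)

    # Step 2: fit a GP to the reality gap -- the residual between what the
    # physics model predicts and what the plant actually did.
    xs_sim = np.array([sim_step(x, u) for x, u in zip(xs, us)])
    gp.fit(np.hstack([xs, us]), xs_real - xs_sim)

    # Step 3: synthesize a new policy by RL on the GP-corrected simulator.
    def refined_step(x, u):
        gap = gp.predict(np.hstack([x, u]).reshape(1, -1))[0]
        return sim_step(x, u) + gap

    policy = train_rl(refined_step, warm_start=policy)
```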
- Distributed feedback design and complexity-constrained control are examples of problems posed within the domain of structured optimal feedback synthesis. The optimal feedback gain is typically a non-convex function of system primitives. However, in recent years, algorithms have been proposed to obtain locally optimal solutions. In applications to large-scale distributed control, the major obstacle is computational complexity. This paper addresses complexity through a combination of linear-algebraic techniques and computational methods adapted from both machine learning and reinforcement learning. It is shown that, for general classes of optimal control problems, the objective function and its gradient can be computed from data. Transformations borrowed from the theory of reinforcement learning are adapted to obtain simulation-based algorithms for computing the structured optimal H2 feedback gain. Customized proximal algorithms based on gradient descent and incremental gradient are tested in computational experiments, and their relative merits are discussed.
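As a concrete (model-based, not data-driven) illustration of structured H2 synthesis by gradient descent, the sketch below computes the H2 cost and its gradient from two Lyapunov equations and projects each step onto a sparsity pattern. The plant matrices and the simple mask projection are illustrative assumptions; the paper's simulation-based and proximal variants are not reproduced here.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov as lyap

def h2_cost_and_grad(K, A, B, Bd, Q, R):
    """H2 cost J(K) = trace((Q + K'RK) X) of x_dot = (A - BK)x + Bd*w, and its gradient."""
    Acl = A - B @ K
    X = lyap(Acl, -Bd @ Bd.T)               # Acl X + X Acl' + Bd Bd' = 0
    P = lyap(Acl.T, -(Q + K.T @ R @ K))     # Acl' P + P Acl + Q + K'RK = 0
    J = np.trace((Q + K.T @ R @ K) @ X)
    grad = 2.0 * (R @ K - B.T @ P) @ X      # classical gradient formula for J(K)
    return J, grad

def structured_gradient_descent(K0, mask, sys, step=1e-3, iters=500):
    """Projected gradient descent: the gain is restricted to a sparsity pattern.

    K0 must stabilize A - B K0; a line search would guard stability in practice.
    """
    K = K0 * mask
    for _ in range(iters):
        J, g = h2_cost_and_grad(K, *sys)
        K = (K - step * g) * mask           # gradient step, then project onto structure
    return K, J

# Example: 2-state plant with a decentralized (diagonal) feedback structure.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
B, Bd, Q, R = np.eye(2), np.eye(2), np.eye(2), np.eye(2)
mask = np.eye(2)                            # only diagonal gains allowed
K, J = structured_gradient_descent(2 * np.eye(2), mask, (A, B, Bd, Q, R))
```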
- In this paper, an Unmanned Aerial Vehicle (UAV)-enabled human Internet of Things (IoT) architecture is introduced to enable rescue operations in public safety systems (PSSs). Initially, the first responders autonomously select the disaster area that they will support, considering the dynamic socio-physical changes of the surrounding environment and following a set of gradient-ascent reinforcement learning algorithms. Then, the victims form coalitions with each other and with the first responders at each disaster area based on an expectation-maximization approach. Finally, the first responders select the UAVs that communicate with the Emergency Control Center (ECC), to which they report the data collected from the disaster areas, by adopting a set of log-linear reinforcement learning algorithms. The overall distributed UAV-enabled human IoT architecture is evaluated via detailed numerical results that highlight its key operational features and the performance benefits of the proposed framework.
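Log-linear learning itself is simple to sketch: each agent samples an option with probability proportional to exp(utility / temperature), with the temperature cooled over time so choices become near-greedy. The toy example below applies it to a responder choosing among four hypothetical UAV relays; the payoff model and cooling schedule are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_linear_choice(utilities, temperature):
    """Sample an option with probability proportional to exp(utility / temperature)."""
    z = np.asarray(utilities, dtype=float) / temperature
    p = np.exp(z - z.max())              # subtract the max for numerical stability
    p /= p.sum()
    return rng.choice(len(p), p=p)

# Toy run: 4 hypothetical UAV relays with unknown true reporting payoffs.
true_payoff = np.array([0.2, 0.6, 0.4, 0.3])
q, n = np.zeros(4), np.zeros(4)          # running payoff estimate and pick count per UAV
for t in range(1, 1001):
    uav = log_linear_choice(q, temperature=1.0 / np.log(t + 1.0))  # cooling schedule
    reward = rng.normal(true_payoff[uav], 0.1)   # stand-in for ECC reporting payoff
    n[uav] += 1
    q[uav] += (reward - q[uav]) / n[uav]         # per-UAV running average
print(q.round(2))                        # estimates concentrate on the best relay
```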
- Offline policy optimization could have a large impact on many real-world decision-making problems, as online learning may be infeasible in many applications. Importance sampling and its variants are a commonly used type of estimator in offline policy evaluation, and such estimators typically do not require assumptions on the properties or representational capabilities of value-function or decision-process model function classes. In this paper, we identify an important overfitting phenomenon in optimizing the importance-weighted return, in which the learned policy may essentially avoid making aligned decisions for part of the initial state space. We propose an algorithm that avoids this overfitting through a new per-state-neighborhood normalization constraint, provide a theoretical justification for it, and show the limitations of previous attempts at this approach. We test our algorithm on a healthcare-inspired simulator, a logged dataset collected from real hospitals, and continuous control tasks. These experiments show that the proposed method yields less overfitting and better test performance than state-of-the-art batch reinforcement learning algorithms.
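For reference, here is a minimal sketch of the importance-sampling estimators this line of work builds on: the cumulative action-probability ratio reweights each logged trajectory's return, and the self-normalized (weighted) variant trades bias for variance. The `pi_e` and `pi_b` probability interfaces are assumptions; the paper's per-state-neighborhood normalization constraint is not reproduced here.

```python
import numpy as np

def is_return(trajectory, pi_e, pi_b, gamma=0.99):
    """Importance-weighted return of one logged trajectory.

    trajectory: iterable of (state, action, reward) steps; pi_e(s, a) and
    pi_b(s, a) return action probabilities under the evaluation and behavior
    policies (assumed interfaces).
    """
    w, g, discount = 1.0, 0.0, 1.0
    for s, a, r in trajectory:
        w *= pi_e(s, a) / pi_b(s, a)     # cumulative likelihood ratio
        g += discount * r
        discount *= gamma
    return w * g, w

def ope_estimate(trajectories, pi_e, pi_b, weighted=True):
    """Ordinary vs. self-normalized (weighted) importance sampling."""
    vals, ws = zip(*(is_return(tau, pi_e, pi_b) for tau in trajectories))
    if weighted:
        # WIS: biased but lower variance. When the weighted return is optimized
        # directly, the policy can drive w toward 0 on unfavorable initial
        # states -- the overfitting the paper's constraint is designed to curb.
        return float(np.sum(vals) / max(np.sum(ws), 1e-12))
    return float(np.mean(vals))          # OIS: unbiased but high variance
```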