Title: Hybrid Reinforcement Learning based controller for autonomous navigation
Safe operation of autonomous mobile robots in close proximity to humans creates a need for enhanced trajectory tracking with low tracking errors. Linear optimal control techniques such as the Linear Quadratic Regulator (LQR) and Model Predictive Control (MPC) have been used successfully for low-speed applications, leveraging their model-based methodology with manageable computational demands. However, model and parameter uncertainties or other unmodeled nonlinearities may cause poor control actions and constraint violations. Nonlinear MPC has emerged as an alternative optimal-control approach but must overcome real-time deployment challenges, including fast sampling times, design complexity, and limited computational resources. In recent years, optimal control-based deployments have benefited enormously from the ability of Deep Neural Networks (DNNs) to serve as universal function approximators. This has led to deployments in a plethora of previously inaccessible applications, but many open questions about generalizability, benchmarking, and systematic verification and validation have emerged. This paper presents a novel approach to fusing Deep Reinforcement Learning (DRL)-based longitudinal control with a traditional PID lateral controller for autonomous navigation. Our approach consists of (i) generating an adequate-fidelity simulation scenario via a Real2Sim approach; (ii) training a DRL agent within this framework; and (iii) testing performance and generalizability on alternate scenarios. We use an initial tuned set of lateral PID controller gains to observe the vehicle response over a range of velocities. We then use a DRL framework to generate policies for an optimal longitudinal controller that complements the lateral PID controller to give the best tracking performance for the vehicle.
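The split between a lateral PID controller and a learned longitudinal controller described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `PID`, `longitudinal_policy`, and `hybrid_step`, the gains, the sign convention of the steering command, and above all the hand-written speed rule, which only stands in for a trained DRL policy.

```python
from dataclasses import dataclass


@dataclass
class PID:
    """Textbook discrete PID controller (illustrative gains, not the paper's)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float, dt: float) -> float:
        # Accumulate the integral term and take a finite-difference derivative.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def longitudinal_policy(cross_track_error: float, speed: float,
                        v_max: float = 2.0, gain: float = 1.5) -> float:
    """Hypothetical stand-in for a trained DRL policy.

    Returns an acceleration-like command that lowers the target speed
    as the lateral tracking error grows.
    """
    target_speed = v_max / (1.0 + gain * abs(cross_track_error))
    return target_speed - speed


def hybrid_step(pid: PID, cross_track_error: float, speed: float, dt: float):
    """One control step: PID handles steering, the policy handles speed."""
    steer = pid.step(cross_track_error, dt)  # sign convention is an assumption
    accel = longitudinal_policy(cross_track_error, speed)
    return steer, accel
```

In a real setup the body of `longitudinal_policy` would be replaced by inference on the trained agent, while the PID gains stay fixed at their initially tuned values, as the abstract describes.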
Award ID(s):
1925500 1939058
NSF-PAR ID:
10357358
Journal Name:
2022 IEEE 95th Vehicular Technology Conference, VTC2022-Spring
Page Range / eLocation ID:
1-6
Sponsoring Org:
National Science Foundation
More Like this
  1. Model predictive control (MPC) has become more relevant to vehicle dynamics control due to its inherent capacity to handle system constraints. However, the online optimization in MPC imposes an extensive computational burden on today’s onboard microprocessors. Several methods have been proposed to alleviate this computational load. Among them, online successive system linearization and the resulting linear time-varying model predictive controller (LTVMPC) is one of the most popular options. Nevertheless, such online successive linearization approximates the original (nonlinear) system by a linear one, which inevitably introduces extra modeling errors and therefore reduces MPC performance. If the controlled system possesses the “differential flatness” property, however, it can be exactly linearized, yielding an equivalent linear model. This linear model maintains all the nonlinear features of the original system and can be used to design a flatness-based model predictive controller (FMPC). CarSim-Simulink joint simulations demonstrate that the proposed FMPC substantially outperforms a classical LTVMPC in terms of path-tracking performance for autonomous vehicles.
  2. Autonomous vehicle trajectory tracking control is challenged by situations of varying road surface friction, especially in the scenario where there is a sudden decrease in friction in an area with high road curvature. If the situation is unknown to the control law, vehicles with high speed are more likely to lose tracking performance and/or stability, resulting in loss of control or the vehicle departing the lane unexpectedly. However, with connectivity either to other vehicles, infrastructure, or cloud services, vehicles may have access to upcoming roadway information, particularly the friction and curvature in the road path ahead. This paper introduces a model-based predictive trajectory-tracking control structure using the previewed knowledge of path curvature and road friction. In the structure, path following and vehicle stabilization are incorporated through a model predictive controller. Meanwhile, long-range vehicle speed planning and tracking control are integrated to ensure the vehicle can slow down appropriately before encountering hazardous road conditions. This approach has two major advantages. First, the prior knowledge of the desired path is explicitly incorporated into the computation of control inputs. Second, the combined transmission of longitudinal and lateral tire forces is considered in the controller to avoid violation of tire force limits while keeping performance and stability guarantees. The efficacy of the algorithm is demonstrated through an application case where a vehicle navigates a sharply curving road with varying friction conditions, with results showing that the controller can drive a vehicle up to the handling limits and track the desired trajectory accurately. 
  3. Current practice of air-fuel ratio control relies on empirical models and traditional PID controllers, which require extensive calibration to maintain the post-catalyst air-fuel ratio close to stoichiometry. In contrast, this work utilizes a physics-based Three-Way Catalyst (TWC) model to develop a model predictive control (MPC) strategy for air-fuel ratio control based on internal TWC oxygen storage dynamics. In this paper, parameters of the physics-based temperature and oxygen storage models of the TWC are identified using vehicle test data for a catalyst aged to 150,000 miles. A linearized oxygen storage model is then developed from the identified nonlinear model and is shown via simulation to follow the nonlinear model with minimal error during nominal operation. This motivates the development of a Linear MPC (LMPC) framework using the linearized TWC oxygen storage model, reducing the requisite computational effort relative to a nonlinear MPC strategy. In this work, the LMPC utilizing a linearized physics-based TWC model is shown to be suitable for tracking a desired oxygen storage level by controlling the commanded engine air-fuel ratio, which is also a novel contribution. The offline simulation results show successful tracking performance of the developed LMPC framework.
  4. This paper presents four data-driven system models for a magnetically controlled swimmer. The models were derived directly from experimental data, and their accuracy was experimentally demonstrated. Our previous study successfully implemented two non-model-based control algorithms for 3D path-following using PID and a model reference adaptive controller (MRAC). This paper focuses on system identification using only experimental data and on a model-based control strategy. Four system models were derived: (1) a physical estimation model, (2, 3) Sparse Identification of Nonlinear Dynamics (SINDY) models, one linear and one nonlinear, and (4) a multilayer perceptron (MLP). All four system models were implemented as the estimator of a multi-step Kalman filter. The maximum required sensing interval was increased from 180 ms to 420 ms and the respective tracking error decreased from 9 mm to 4.6 mm. Finally, a Model Predictive Controller (MPC) implementing the linear SINDY model was tested for 3D path-following and shown to be computationally efficient and to offer performance comparable to other control methods.
  5. Stop-and-go traffic poses significant challenges to the efficiency and safety of traffic operations, and its impacts and working mechanism have attracted much attention. Recent studies based on simulated vehicle trajectories have shown that Connected and Automated Vehicles (CAVs) with carefully designed longitudinal control have the potential to dampen the stop-and-go wave. In this study, Deep Reinforcement Learning (DRL) is adopted to control the longitudinal behavior of CAVs, and real-world vehicle trajectory data is used to train the DRL controller. The study considers a Human-Driven (HD) vehicle tailed by a CAV, which is in turn followed by a platoon of HD vehicles. This experimental design tests how the CAV can help dampen the stop-and-go wave generated by the lead HD vehicle and smooth the speed profiles of the following HD vehicles. The DRL controller is trained on real-world vehicle trajectories and evaluated in SUMO simulation. The results show that the DRL controller decreases the speed oscillation of the CAV by 54% and that of the following HD vehicles by 8%-28%. Significant fuel consumption savings are also observed. Additionally, the results suggest that CAVs may act as a traffic stabilizer if they choose to behave slightly altruistically.