Title: Adaptive Optimal Control of Linear Periodic Systems: An Off-Policy Value Iteration Approach
This paper studies the infinite-horizon adaptive optimal control of continuous-time linear periodic (CTLP) systems. A novel value iteration (VI) based off-policy adaptive dynamic programming (ADP) algorithm is proposed for a general class of CTLP systems, so that approximate optimal solutions can be obtained directly from collected data, without exact knowledge of the system dynamics. Under mild conditions, proofs of uniform convergence of the proposed algorithm to the optimal solutions are given for both the model-based and model-free cases. The VI-based ADP algorithm is able to find suboptimal controllers without assuming knowledge of an initial stabilizing controller. Application to the optimal control of a triple inverted pendulum subjected to a periodically varying load demonstrates the feasibility and effectiveness of the proposed method.
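As a rough illustration of the value iteration idea in the abstract, the sketch below treats a much simpler setting: a time-invariant system with a known model, not the periodic, data-driven algorithm of the paper. It assumes the commonly used continuous-time VI update P_{k+1} = P_k + h (A^T P_k + P_k A + Q - P_k B R^{-1} B^T P_k) with a small constant step h, started from P_0 = 0 so that no stabilizing controller is needed. The function names, the step size, and the double-integrator example are hypothetical choices made only for this sketch.

```python
import numpy as np

def riccati_residual(P, A, B, Q, R):
    """Left-hand side of the algebraic Riccati equation evaluated at P."""
    return A.T @ P + P @ A + Q - P @ B @ np.linalg.solve(R, B.T @ P)

def value_iteration(A, B, Q, R, step=0.01, tol=1e-10, max_iter=200_000):
    """Model-based VI sketch for a time-invariant LQR problem (assumed setting).

    Starts from P = 0, so no initial stabilizing gain is required.
    """
    P = np.zeros_like(Q, dtype=float)
    for _ in range(max_iter):
        res = riccati_residual(P, A, B, Q, R)
        if np.linalg.norm(res) < tol:
            break
        P = P + step * res  # small step along the Riccati residual
    K = np.linalg.solve(R, B.T @ P)  # feedback gain u = -K x
    return P, K

# Hypothetical example: double integrator with identity weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])

P, K = value_iteration(A, B, Q, R)
print(P)
print(K)
```

For this example the Riccati equation can be solved by hand, giving P = [[√3, 1], [1, √3]] and K = [1, √3], which the iteration should approach. The periodic, model-free version described in the abstract replaces the model-based residual with quantities estimated from trajectory data.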
Award ID(s):
1903781
PAR ID:
10158684
Author(s) / Creator(s):
;
Date Published:
Journal Name:
IEEE Transactions on Automatic Control
ISSN:
0018-9286
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper proposes a novel learning-based adaptive optimal controller design method for a class of continuous-time linear time-delay systems. A key strategy is to exploit state-of-the-art reinforcement learning (RL) and adaptive dynamic programming (ADP) techniques to propose a data-driven method that learns the near-optimal controller without precise knowledge of the system dynamics. Specifically, a value iteration (VI) algorithm is proposed to solve the infinite-dimensional Riccati equation for the linear quadratic optimal control problem of time-delay systems using finite samples of input-state trajectory data. It is rigorously proved that the proposed VI algorithm converges to the near-optimal solution. Compared with the previous literature, the proposed VI algorithm has two advantages: it is developed directly for continuous-time systems without discretization, and it does not require an initial admissible controller. The efficacy of the proposed methodology is demonstrated by two practical examples, metal cutting and autonomous driving.
  2. This paper presents a first solution to the problem of adaptive LQR for continuous-time linear periodic systems. Specifically, reinforcement learning and adaptive dynamic programming (ADP) techniques are used to develop two algorithms to obtain near-optimal controllers. Firstly, the policy iteration (PI) and value iteration (VI) methods are proposed when the model is known. Then, PI-based and VI-based off-policy ADP algorithms are derived to find near-optimal solutions directly from input/state data collected along the system trajectories, without the exact knowledge of system dynamics. The effectiveness of the derived algorithms is validated using the well-known lossy Mathieu equation. 
  3. This paper studies learning-based optimal control for a class of infinite-dimensional linear time-delay systems. The aim is to fill a gap in adaptive dynamic programming (ADP), where adaptive optimal control of infinite-dimensional systems has not been addressed. A key strategy is to combine the classical model-based linear quadratic (LQ) optimal control of time-delay systems with state-of-the-art reinforcement learning (RL) techniques. Both model-based and data-driven policy iteration (PI) approaches are proposed to solve the corresponding algebraic Riccati equation (ARE) with guaranteed convergence. The proposed PI algorithm can be considered a generalization of ADP to infinite-dimensional time-delay systems. The efficiency of the proposed algorithm is demonstrated by a practical application arising from autonomous driving in mixed traffic environments, where human drivers’ reaction delay is considered.
  4. This paper studies the adaptive optimal control problem for a class of linear time-delay systems described by delay differential equations (DDEs). A crucial strategy is to take advantage of recent developments in reinforcement learning (RL) and adaptive dynamic programming (ADP) and develop novel methods to learn adaptive optimal controllers from finite samples of input and state data. In this paper, data-driven policy iteration (PI) is proposed to solve the infinite-dimensional algebraic Riccati equation (ARE) iteratively in the absence of exact model knowledge. Interestingly, the proposed recursive PI algorithm is new in the present context of continuous-time time-delay systems, even when the model is assumed known. The efficacy of the proposed learning-based control methods is validated by means of practical applications arising from metal cutting and autonomous driving.
  5. This paper presents a unified approach to the problem of learning-based optimal control of connected human-driven and autonomous vehicles in mixed-traffic environments, including both the freeway and ring road settings. The stabilizability of a string of connected vehicles including multiple autonomous vehicles (AVs) and heterogeneous human-driven vehicles (HDVs) is studied by a model reduction technique and the Popov-Belevitch-Hautus (PBH) test. For this problem setup, a linear quadratic regulator (LQR) problem is formulated and a solution based on adaptive dynamic programming (ADP) techniques is proposed without a priori knowledge of the model parameters. To start the learning process, an initial stabilizing control law is obtained using the small-gain theorem for the ring road case. It is shown that the obtained stabilizing control law can achieve general Lp string stability under appropriate conditions. In addition, to minimize the impact of external disturbances, a linear quadratic zero-sum game is introduced and solved by an iterative learning-based algorithm. Finally, simulation results verify the theoretical analysis and show that the proposed methods achieve desirable performance for control of a mixed-vehicular network.
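Several of the abstracts above rely on policy iteration (PI), which, unlike value iteration, must be started from a stabilizing controller. The sketch below shows the classical model-based form (Kleinman's iteration) for a time-invariant LQR problem only; it is not the time-delay, periodic, or data-driven variant from any of the listed papers. The function names, the initial gain `K0`, and the double-integrator example are hypothetical choices made for illustration.

```python
import numpy as np

def solve_lyapunov(Ac, M):
    """Solve Ac.T @ P + P @ Ac + M = 0 for P using Kronecker products."""
    n = Ac.shape[0]
    I = np.eye(n)
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    return np.linalg.solve(L, -M.reshape(-1)).reshape(n, n)

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Model-based PI sketch; K0 is assumed to make A - B @ K0 stable."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        # Policy evaluation: cost matrix of the current gain K.
        P = solve_lyapunov(Ac, Q + K.T @ R @ K)
        # Policy improvement: updated gain from the evaluated cost.
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Hypothetical example: double integrator with identity weights.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # assumed stabilizing initial gain

P_pi, K_pi = policy_iteration(A, B, Q, R, K0)
print(P_pi)
print(K_pi)
```

The need to supply `K0` is the practical difference from value iteration: if the system is unknown, finding such a gain can itself be difficult, which is why some of the abstracts highlight methods that avoid it.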