This paper presents a first solution to the problem of adaptive LQR for continuous-time linear periodic systems. Specifically, reinforcement learning and adaptive dynamic programming (ADP) techniques are used to develop two algorithms to obtain near-optimal controllers. First, policy iteration (PI) and value iteration (VI) methods are proposed for the case where the model is known. Then, PI-based and VI-based off-policy ADP algorithms are derived to find near-optimal solutions directly from input/state data collected along the system trajectories, without exact knowledge of the system dynamics. The effectiveness of the derived algorithms is validated using the well-known lossy Mathieu equation.
Adaptive Optimal Control of Linear Periodic Systems: An Off-Policy Value Iteration Approach
This paper studies the infinite-horizon adaptive optimal
control of continuous-time linear periodic (CTLP) systems.
A novel value iteration (VI) based off-policy ADP algorithm
is proposed for a general class of CTLP systems, so that
approximate optimal solutions can be obtained directly from the
collected data, without exact knowledge of the system dynamics.
Under mild conditions, uniform convergence of the proposed
algorithm to the optimal solutions is proved for
both the model-based and model-free cases. The VI-based ADP
algorithm is able to find suboptimal controllers without requiring
knowledge of an initial stabilizing controller. Application
to the optimal control of a triple inverted pendulum subjected
to a periodically varying load demonstrates the feasibility and
effectiveness of the proposed method.
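To make the idea of value iteration without an initial stabilizing controller concrete, the following is a minimal sketch for a scalar, time-invariant LQR problem. It is not the periodic, data-driven algorithm of the paper: the model-based update `p += step * residual`, the constant step size, the stopping tolerance, and all names (`riccati_residual`, `value_iteration`, `a`, `b`, `q`, `r`) are assumptions chosen for illustration only.

```python
import math


def riccati_residual(p, a, b, q, r):
    """Residual of the scalar algebraic Riccati equation 2*a*p - (b*p)^2/r + q."""
    return 2.0 * a * p - (b * p) ** 2 / r + q


def value_iteration(a, b, q, r, step=0.01, tol=1e-12, max_iter=100_000):
    """Illustrative model-based value iteration for dx/dt = a*x + b*u.

    Starts from p = 0, so no stabilizing gain is needed to initialize it.
    The constant step size is a simplifying assumption of this sketch.
    """
    p = 0.0
    for _ in range(max_iter):
        delta = step * riccati_residual(p, a, b, q, r)
        p += delta
        if abs(delta) < tol:
            break
    gain = b * p / r  # feedback law u = -gain * x
    return p, gain


# Open-loop unstable example (a > 0); values are arbitrary.
a, b, q, r = 1.0, 1.0, 1.0, 1.0
p, k = value_iteration(a, b, q, r)

# Closed-form stabilizing solution of the scalar Riccati equation, for comparison.
p_star = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
print(p, p_star, a - b * k)
```

In this sketch the iterate approaches the closed-form Riccati solution even though the open-loop system is unstable, which mirrors the property highlighted in the abstract. A policy-iteration scheme would instead need a stabilizing gain to start from.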
- Award ID(s): 1903781
- PAR ID: 10158684
- Date Published:
- Journal Name: IEEE Transactions on Automatic Control
- ISSN: 0018-9286
- Page Range / eLocation ID: 1 to 1
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This paper studies learning-based optimal control for a class of infinite-dimensional linear time-delay systems. The aim is to fill a gap in adaptive dynamic programming (ADP), where adaptive optimal control of infinite-dimensional systems has not been addressed. A key strategy is to combine the classical model-based linear quadratic (LQ) optimal control of time-delay systems with state-of-the-art reinforcement learning (RL) techniques. Both model-based and data-driven policy iteration (PI) approaches are proposed to solve the corresponding algebraic Riccati equation (ARE) with guaranteed convergence. The proposed PI algorithm can be considered a generalization of ADP to infinite-dimensional time-delay systems. The efficiency of the proposed algorithm is demonstrated by a practical application arising from autonomous driving in mixed traffic environments, where human drivers’ reaction delay is considered.
-
This paper studies the adaptive optimal control problem for a class of linear time-delay systems described by delay differential equations (DDEs). A crucial strategy is to take advantage of recent developments in reinforcement learning (RL) and adaptive dynamic programming (ADP) and develop novel methods to learn adaptive optimal controllers from finite samples of input and state data. In this paper, data-driven policy iteration (PI) is proposed to solve the infinite-dimensional algebraic Riccati equation (ARE) iteratively in the absence of exact model knowledge. Interestingly, the proposed recursive PI algorithm is new in the present context of continuous-time time-delay systems, even when the model is assumed known. The efficacy of the proposed learning-based control methods is validated by means of practical applications arising from metal cutting and autonomous driving.
-
This paper presents a novel learning-based adaptive optimal controller design for linear time-delay systems described by delay differential equations (DDEs). A key strategy is to exploit the value iteration (VI) approach to solve the linear quadratic optimal control problem for time-delay systems. However, previous learning-based control methods are devoted exclusively to discrete-time time-delay systems. In this article, we aim to fill this gap by developing a learning-based VI approach to solve the infinite-dimensional algebraic Riccati equation (ARE) for continuous-time time-delay systems. One nice feature of the proposed VI approach is that an initial admissible controller is not required to start the algorithm. The efficacy of the proposed methodology is demonstrated by the example of autonomous driving.
-
This paper presents a unified approach to the problem of learning-based optimal control of connected human-driven and autonomous vehicles in mixed-traffic environments, including both the freeway and ring road settings. The stabilizability of a string of connected vehicles including multiple autonomous vehicles (AVs) and heterogeneous human-driven vehicles (HDVs) is studied using a model reduction technique and the Popov-Belevitch-Hautus (PBH) test. For this problem setup, a linear quadratic regulator (LQR) problem is formulated and a solution based on adaptive dynamic programming (ADP) techniques is proposed without a priori knowledge of the model parameters. To start the learning process, an initial stabilizing control law is obtained using the small-gain theorem for the ring road case. It is shown that the obtained stabilizing control law can achieve general Lp string stability under appropriate conditions. In addition, to minimize the impact of external disturbances, a linear quadratic zero-sum game is introduced and solved by an iterative learning-based algorithm. Finally, simulation results verify the theoretical analysis and show that the proposed methods achieve desirable performance for control of a mixed-vehicular network.