Adaptive Optimal Output Regulation of Discrete-Time Linear Systems: A Reinforcement Learning Approach
In this paper, we solve the optimal output regulation problem for discrete-time systems without precise knowledge of the system model. Drawing on reinforcement learning and adaptive dynamic programming, a data-driven solution is developed that achieves asymptotic tracking and disturbance rejection. Notably, the discrete-time output regulation problem is found to differ from its continuous-time counterpart in the persistent excitation condition required for the policy iteration to be unique and convergent. To address this issue, a new persistent excitation condition is introduced that ensures both uniqueness and convergence of the data-driven policy iteration. The efficacy of the proposed methodology is validated on an inverted pendulum on a cart example.
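The paper's method is data-driven, but its backbone is the classical policy iteration for the discrete-time LQR subproblem of output regulation. As a point of reference only, here is a minimal model-based sketch of that iteration (Hewer's algorithm); the matrices A, B, Q, R and the initial stabilizing gain K0 below are illustrative placeholders, not the paper's inverted-pendulum example.

```python
import numpy as np

def policy_iteration_dlqr(A, B, Q, R, K0, iters=30):
    """Model-based policy iteration (Hewer's algorithm) for the
    discrete-time LQR subproblem of optimal output regulation.

    K0 must stabilize A - B @ K0; every iterate then remains
    stabilizing and K converges to the optimal feedback gain."""
    n = A.shape[0]
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Policy evaluation: solve P = Ak' P Ak + Q + K' R K via
        # vectorization, (I - kron(Ak', Ak')) vec(P) = vec(Q + K' R K).
        M = Q + K.T @ R @ K
        vecP = np.linalg.solve(np.eye(n * n) - np.kron(Ak.T, Ak.T),
                               M.flatten())
        P = vecP.reshape(n, n)
        # Policy improvement: K <- (R + B' P B)^{-1} B' P A.
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Illustrative 2-state example (placeholder numbers, not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K0 = np.array([[1.0, 1.0]])  # assumed stabilizing initial gain
K, P = policy_iteration_dlqr(A, B, Q, R, K0)
```

Each policy evaluation solves a discrete Lyapunov equation for the current gain; the data-driven version in the paper approximates the same fixed point from measured trajectories instead of (A, B), which is exactly where the persistent excitation condition enters.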
- Award ID(s): 2210320
- PAR ID: 10513244
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-0124-3
- Page Range / eLocation ID: 7950 to 7955
- Subject(s) / Keyword(s): learning-based control; adaptive optimal control; output regulation
- Format(s): Medium: X
- Location: Singapore, Singapore
- Sponsoring Org: National Science Foundation
More Like this
-
In this paper, we study the robustness of policy optimization (in particular, the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and utilizing Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions for the upper bound on the noise and the size of the neighborhood to which the policies ultimately converge are provided. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed; the persistent excitation condition can be readily guaranteed by checking the rank of the Hankel matrix built from an exploration signal. The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is theoretically demonstrated via the input-to-state stability of the policy iteration. Several numerical simulations demonstrate the efficacy of the proposed method.
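To make the Hankel-matrix rank test concrete, the following is a minimal sketch (not code from the paper): per Willems' fundamental lemma, an input sequence is persistently exciting of order L exactly when its depth-L block-Hankel matrix has full row rank. The signal length, depth L, and rank tolerance below are illustrative assumptions.

```python
import numpy as np

def block_hankel(u, L):
    """Depth-L block-Hankel matrix of an input record u of shape
    (T, m): each column stacks one length-L window of u."""
    T, m = u.shape
    cols = T - L + 1
    H = np.zeros((L * m, cols))
    for i in range(L):
        H[i * m:(i + 1) * m, :] = u[i:i + cols, :].T
    return H

def is_persistently_exciting(u, L, tol=1e-9):
    """u is persistently exciting of order L iff its depth-L
    block-Hankel matrix has full row rank (Willems et al.)."""
    H = block_hankel(u, L)
    return np.linalg.matrix_rank(H, tol=tol) == H.shape[0]

# A random exploration signal is PE of modest order with high
# probability (illustrative check, not from the paper).
rng = np.random.default_rng(0)
u = rng.standard_normal((200, 1))
print(is_persistently_exciting(u, L=10))  # expected: True
```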
-
Prior work on automatic control synthesis for cyber-physical systems under logical constraints has primarily focused on environmental disturbances or modeling uncertainties; the impact of deliberate, malicious attacks has been less studied. In this paper, we consider a discrete-time dynamical system with a linear temporal logic (LTL) constraint in the presence of an adversary, modeled as a stochastic game. We assume that the adversary observes the control policy before choosing an attack strategy. We investigate two problems. In the first, we synthesize a robust control policy for the stochastic game that maximizes the probability of satisfying the LTL constraint, and propose a value iteration based algorithm to compute it. In the second, we focus on a subclass of LTL constraints consisting of an arbitrary LTL formula and an invariant constraint, and compute a control policy that minimizes the expected number of invariant-constraint violations while maximizing the probability of satisfying the arbitrary LTL formula. We characterize the optimality condition for the desired control policy and propose a policy iteration based algorithm to compute it. The proposed approaches are illustrated with two numerical case studies.
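In such settings the LTL constraint is typically reduced, via a product with an automaton, to a reachability objective on a finite stochastic game; a minimal max-min value iteration on such a game might look as follows. The state/action encoding and transition probabilities here are hypothetical, chosen only to illustrate the recursion.

```python
import numpy as np

def game_value_iteration(P, target, iters=200):
    """Max-min value iteration on a finite stochastic game.

    P[s][a][b] is the probability vector over next states when the
    controller plays a and the adversary plays b in state s; target
    holds the accepting states of the reachability-reduced objective.
    Returns V[s]: the best worst-case probability of reaching target."""
    n = len(P)
    V = np.array([1.0 if s in target else 0.0 for s in range(n)])
    for _ in range(iters):
        Vnew = V.copy()
        for s in range(n):
            if s in target:
                continue
            # Controller maximizes; the adversary (who has seen the
            # policy) minimizes the expected continuation value.
            Vnew[s] = max(
                min(p @ V for p in P[s][a]) for a in range(len(P[s]))
            )
        V = Vnew
    return V

# Tiny example: from state 0, the single control action reaches the
# target state 1 w.p. 0.8 and an absorbing failure state 2 w.p. 0.2.
P = [
    [[np.array([0.0, 0.8, 0.2])]],  # state 0
    [[np.array([0.0, 1.0, 0.0])]],  # state 1 (target, absorbing)
    [[np.array([0.0, 0.0, 1.0])]],  # state 2 (failure, absorbing)
]
print(game_value_iteration(P, target={1}))  # -> [0.8, 1.0, 0.0]
```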
-
This paper studies the problem of data-driven combined longitudinal and lateral control of autonomous vehicles (AVs), such that the AV stays within a safe but minimal distance from its leading vehicle while remaining in its lane. Most existing methods for combined longitudinal and lateral control are either model-based or purely data-driven (e.g., reinforcement learning). Traditional model-based control approaches are insufficient for adaptive optimal control design for AVs in dynamically changing environments and under model uncertainty, while conventional reinforcement learning approaches require a large volume of data and cannot guarantee the stability of the vehicle. These limitations are addressed by integrating advanced control theory with reinforcement learning techniques. Specifically, using adaptive dynamic programming techniques and motion data collected from the vehicles, a policy iteration algorithm is proposed that iteratively optimizes the control policy in the absence of precise knowledge of the AV's dynamical model, and the stability of the AV is guaranteed with the control policy generated at each iteration. The efficacy of the proposed approach is validated in SUMO, a microscopic traffic simulation platform, for different traffic scenarios.
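The abstract does not give the algorithm's details; as a generic, hedged sketch of what one step of such a data-driven (Q-learning-flavored) policy iteration can look like for a linear-quadratic subproblem, consider the following. It assumes a stabilizing current gain K, an exploratory input actually applied during data collection, and data rich enough for the least-squares problem to identify the Q-function; none of the variable names come from the paper.

```python
import numpy as np

def offpolicy_pi_step(X, U, Xnext, K, Q, R):
    """One data-driven policy iteration step: evaluate the Q-function
    of the current gain K (policy u = -K x) from trajectory data, then
    return the improved gain. No model matrices (A, B) are used.

    X, U, Xnext stack samples row-wise: state x_t, the exploratory
    input u_t actually applied, and the successor state x_{t+1}."""
    n, m = X.shape[1], U.shape[1]
    Phi, c = [], []
    for x, u, xp in zip(X, U, Xnext):
        z = np.concatenate([x, u])            # current state-input pair
        zp = np.concatenate([xp, -K @ xp])    # successor pair under K
        # Bellman equation Q(z) = stage cost + Q(z') in a quadratic
        # basis: (vec(z z') - vec(z' z'')) . theta = x'Qx + u'Ru.
        Phi.append(np.kron(z, z) - np.kron(zp, zp))
        c.append(x @ Q @ x + u @ R @ u)
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = theta.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                       # symmetrize the estimate
    Huu, Hux = H[n:, n:], H[n:, :n]
    # Policy improvement: minimizing Q over u gives u = -Huu^{-1} Hux x.
    return np.linalg.solve(Huu, Hux)
```

Iterating this step, with an added exploration signal for persistent excitation, recovers the model-based policy iteration fixed point from data alone, which is the general pattern these ADP-based vehicle controllers follow.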
-
In this paper, we develop an adaptive control algorithm addressing security for a class of networked vehicles comprising a formation of human-driven vehicles that share kinematic data and an autonomous vehicle at the aft of the formation that receives data from the preceding vehicles through wireless vehicle-to-vehicle communication devices. Specifically, we develop an adaptive controller for mitigating time-invariant, state-dependent adversarial sensor and actuator attacks while guaranteeing uniform ultimate boundedness of the closed-loop networked system. Furthermore, an adaptive learning framework is presented for identifying the state-space model parameters from input-output data. This learning technique uses previously stored data as well as current data to identify the system parameters under a relaxed persistence of excitation condition. The effectiveness of the proposed approach is demonstrated by an illustrative numerical example involving a platoon of connected vehicles.
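The stored-plus-current-data idea above is commonly realized as a concurrent-learning update: convergence then requires only that the recorded data stack be full rank, rather than an ongoing persistently exciting signal. A minimal sketch follows; the normalized gradient form, step size, and the linear regression model y = phi . theta are illustrative assumptions, not the paper's exact update law.

```python
import numpy as np

def concurrent_learning_id(phis, ys, stored_idx, theta0,
                           gain=0.05, epochs=50):
    """Concurrent-learning style parameter identification: each update
    combines the instantaneous regression error with errors on a
    stored data stack, so the estimate converges whenever the stored
    stack is full rank (a relaxed persistence-of-excitation condition).

    phis[t] is the regressor at time t, ys[t] the measured output,
    and stored_idx indexes the recorded (history) samples."""
    theta = np.array(theta0, dtype=float)
    for _ in range(epochs):
        for phi, y in zip(phis, ys):
            # Instantaneous (normalized) regression-error term.
            grad = phi * (phi @ theta - y) / (1.0 + phi @ phi)
            # Stored-data error terms: the "concurrent" part.
            for j in stored_idx:
                pj = phis[j]
                grad += pj * (pj @ theta - ys[j]) / (1.0 + pj @ pj)
            theta -= gain * grad
    return theta

# Illustrative use: recover theta_true from a short noise-free record
# whose stored stack (first 10 samples) is full rank.
rng = np.random.default_rng(1)
theta_true = np.array([1.2, -0.4])
phis = rng.standard_normal((30, 2))
ys = phis @ theta_true
print(concurrent_learning_id(phis, ys, stored_idx=range(10),
                             theta0=np.zeros(2)))  # approx [1.2, -0.4]
```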