Title: Learning-Based Adaptive Optimal Output Regulation of Discrete-Time Linear Systems
In this paper, we address the problem of model-free optimal output regulation of discrete-time systems, which aims at achieving asymptotic tracking and disturbance rejection without exact knowledge of the system parameters. Insights from reinforcement learning and adaptive dynamic programming are used to solve this problem. An interesting discovery is that model-free discrete-time output regulation differs from its continuous-time counterpart in the persistent excitation condition required to ensure the uniqueness and convergence of the policy iteration; this work shows that the condition must be established carefully to guarantee both properties.
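To make the flavor of such a data-driven scheme concrete, the sketch below shows a standard Q-function-based policy iteration for the discrete-time linear-quadratic case, estimated from input-state data alone, together with the rank test that plays the role of a persistent excitation check. It is a minimal illustration under common assumptions (known stage-cost weights Q and R, an initial stabilizing gain K0, exploratory inputs) rather than the exact algorithm of the paper; all function and variable names are illustrative.

```python
import numpy as np

def quad_features(z):
    """phi(z) such that z.T @ H @ z == phi(z) @ theta, with theta the
    upper-triangular entries of the symmetric matrix H."""
    n = len(z)
    feats = []
    for i in range(n):
        for j in range(i, n):
            feats.append(z[i] * z[j] if i == j else 2.0 * z[i] * z[j])
    return np.array(feats)

def theta_to_symmetric(theta, n):
    """Inverse of the parametrization used in quad_features."""
    H = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            H[i, j] = H[j, i] = theta[idx]
            idx += 1
    return H

def data_driven_policy_iteration(x_traj, u_traj, Q, R, K0, n_iters=10):
    """Q-function-based policy iteration from input-state data only.

    x_traj: (N+1, n) visited states; u_traj: (N, m) applied inputs
    (e.g. u_k = -K0 @ x_k + exploration noise); K0 a stabilizing gain.
    Returns an improved feedback gain K for the control law u = -K x.
    """
    n, m, N = x_traj.shape[1], u_traj.shape[1], u_traj.shape[0]
    K = np.array(K0, dtype=float)
    for _ in range(n_iters):
        Phi, b = [], []
        for k in range(N):
            x, u, x_next = x_traj[k], u_traj[k], x_traj[k + 1]
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])  # on-policy successor action
            Phi.append(quad_features(z) - quad_features(z_next))
            b.append(x @ Q @ x + u @ R @ u)                 # stage cost
        Phi, b = np.asarray(Phi), np.asarray(b)
        # Persistent excitation check: the policy-evaluation step has a unique
        # solution only if the regressor matrix has full column rank.
        if np.linalg.matrix_rank(Phi) < Phi.shape[1]:
            raise RuntimeError("data are not persistently exciting enough")
        theta, *_ = np.linalg.lstsq(Phi, b, rcond=None)     # policy evaluation
        H = theta_to_symmetric(theta, n + m)
        Huu, Hux = H[n:, n:], H[n:, :n]
        K = np.linalg.solve(Huu, Hux)                       # policy improvement
    return K
```

The rank check on the regressor matrix Phi is where a discrete-time persistent excitation condition enters: without full column rank, the least-squares policy-evaluation step does not have a unique solution, which is the uniqueness issue highlighted in the abstract.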
Award ID(s):
1903781
PAR ID:
10513240
Author(s) / Creator(s):
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
IFAC-PapersOnLine
Volume:
56
Issue:
2
ISSN:
2405-8963
Page Range / eLocation ID:
10283 to 10288
Subject(s) / Keyword(s):
Adaptive control, approximate/adaptive dynamic programming, optimal control, discrete-time systems, discrete-time output regulation
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we solve the optimal output regulation problem for discrete-time systems without precise knowledge of the system model. Drawing inspiration from reinforcement learning and adaptive dynamic programming, a data-driven solution is developed that enables asymptotic tracking and disturbance rejection. Notably, it is discovered that the proposed approach for discrete-time output regulation differs from the continuous-time approach in terms of the persistent excitation condition required for policy iteration to be unique and convergent. To address this issue, a new persistent excitation condition is introduced to ensure both uniqueness and convergence of the data-driven policy iteration. The efficacy of the proposed methodology is validated by an inverted pendulum on a cart example. (A model-based sketch of the regulator equations underlying output-regulation designs of this kind appears after this list.)
  2. In this paper, we study the robustness of policy optimization (specifically the Gauss–Newton gradient descent algorithm, which is equivalent to policy iteration in reinforcement learning) subject to noise at each iteration. By invoking the concept of input-to-state stability and using Lyapunov's direct method, it is shown that, if the noise is sufficiently small, the policy iteration algorithm converges to a small neighborhood of the optimal solution even in the presence of noise at each iteration. Explicit expressions are provided for the upper bound on the noise and for the size of the neighborhood to which the policies ultimately converge. Based on Willems' fundamental lemma, a learning-based policy iteration algorithm is proposed; the persistent excitation condition can be readily guaranteed by checking the rank of a Hankel matrix built from an exploration signal (a minimal sketch of this rank check appears after this list). The robustness of the learning-based policy iteration to measurement noise and unknown system disturbances is established through the input-to-state stability of the policy iteration. Several numerical simulations demonstrate the efficacy of the proposed method.
  3. This paper addresses the problem of model-free reinforcement learning for Robust Markov Decision Processes (RMDPs) with large state spaces. The goal of the RMDP framework is to find a policy that is robust against the parameter uncertainties due to the mismatch between the simulator model and real-world settings. We first propose the Robust Least Squares Policy Evaluation algorithm, a multi-step online model-free learning algorithm for policy evaluation, and prove its convergence using stochastic approximation techniques. We then propose the Robust Least Squares Policy Iteration (RLSPI) algorithm for learning the optimal robust policy, and give a general weighted Euclidean norm bound on the error (closeness to optimality) of the resulting policy. Finally, we demonstrate the performance of the RLSPI algorithm on some standard benchmark problems. (A sketch of the standard, non-robust least-squares policy evaluation that this line of work builds on appears after this list.)
  4. Prior work on automatic control synthesis for cyber-physical systems under logical constraints has primarily focused on environmental disturbances or modeling uncertainties; the impact of deliberate and malicious attacks has been less studied. In this paper, we consider a discrete-time dynamical system with a linear temporal logic (LTL) constraint in the presence of an adversary, modeled as a stochastic game. We assume that the adversary observes the control policy before choosing an attack strategy. We investigate two problems. In the first, we synthesize a robust control policy for the stochastic game that maximizes the probability of satisfying the LTL constraint, and propose a value iteration based algorithm to compute the optimal control policy (a sketch of the underlying max-min reachability value iteration appears after this list). In the second, we focus on a subclass of LTL constraints consisting of an arbitrary LTL formula and an invariant constraint, and investigate the problem of computing a control policy that minimizes the expected number of invariant-constraint violations while maximizing the probability of satisfying the arbitrary LTL formula. We characterize the optimality condition for the desired control policy and propose a policy iteration based algorithm to compute it. We illustrate the proposed approaches using two numerical case studies.
  5. In this paper, we propose a resilient reinforcement learning method for discrete-time linear systems with unknown parameters, subject to denial-of-service (DoS) attacks. The proposed method is based on policy iteration and learns the optimal controller from input-state data collected amidst DoS attacks. We derive an upper bound on the DoS attack duration that ensures closed-loop stability. The resilience of the closed-loop system under DoS attacks, with the learned controller and an internal model in place, is thoroughly examined. The effectiveness of the proposed methodology is demonstrated on an inverted pendulum on a cart.
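Complementing item 1: output-regulation designs of this kind ultimately hinge on the regulator equations, which determine the feedforward term that achieves asymptotic tracking and disturbance rejection. The sketch below solves the standard discrete-time regulator equations X E = A X + B U + Dw and 0 = C X + F by vectorization. It is a model-based reference point only (the contribution of the cited work is precisely to avoid needing the explicit model); the symbols are the usual plant/exosystem matrices, not notation taken from the paper.

```python
import numpy as np

def solve_regulator_equations(A, B, C, Dw, E, F):
    """Solve  X E = A X + B U + Dw,   0 = C X + F   for (X, U) by vectorization.

    Plant:     x[k+1] = A x[k] + B u[k] + Dw v[k],   e[k] = C x[k] + F v[k]
    Exosystem: v[k+1] = E v[k].
    With any stabilizing gain K, the control law u = -K (x - X v) + U v then
    drives the tracking error e[k] to zero (standard model-based result).
    """
    n, m = B.shape
    q = E.shape[0]
    p = C.shape[0]
    Iq, In = np.eye(q), np.eye(n)
    # Row block 1: (I_q ⊗ A - E^T ⊗ I_n) vec(X) + (I_q ⊗ B) vec(U) = -vec(Dw)
    # Row block 2: (I_q ⊗ C) vec(X)                                = -vec(F)
    top = np.hstack([np.kron(Iq, A) - np.kron(E.T, In), np.kron(Iq, B)])
    bot = np.hstack([np.kron(Iq, C), np.zeros((p * q, m * q))])
    M = np.vstack([top, bot])
    rhs = -np.concatenate([Dw.flatten(order="F"), F.flatten(order="F")])
    sol, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    X = sol[: n * q].reshape((n, q), order="F")
    U = sol[n * q:].reshape((m, q), order="F")
    return X, U
```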
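Item 2 notes that persistent excitation can be certified by a rank check on a Hankel matrix of the exploration signal, following Willems' fundamental lemma. Below is a minimal sketch of that check; the signal length, input dimension, and order L are illustrative choices, not values from the paper.

```python
import numpy as np

def block_hankel(u, L):
    """Block Hankel matrix with L block rows built from an input sequence u of shape (T, m)."""
    T, m = u.shape
    cols = T - L + 1
    H = np.zeros((L * m, cols))
    for i in range(L):
        H[i * m:(i + 1) * m, :] = u[i:i + cols, :].T
    return H

def is_persistently_exciting(u, L):
    """Per Willems' fundamental lemma, u is persistently exciting of order L
    iff the block Hankel matrix H_L(u) has full row rank m * L."""
    H = block_hankel(u, L)
    return np.linalg.matrix_rank(H) == H.shape[0]

# Usage: a sufficiently long random exploration signal is generically persistently exciting.
rng = np.random.default_rng(0)
u = rng.standard_normal((200, 2))        # T = 200 samples of an m = 2 input
print(is_persistently_exciting(u, L=8))  # expected: True
```

Because a generic exploration signal of sufficient length satisfies the rank condition, the check is easy to enforce before running the learning-based policy iteration.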
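Item 3 builds on least-squares policy evaluation. For context, the sketch below shows the standard (non-robust) LSTD(0) estimator of a fixed policy's value function with linear function approximation; the robust variant studied in the paper modifies this to account for model mismatch and is not reproduced here. The feature map and sample format are illustrative.

```python
import numpy as np

def lstd_policy_evaluation(samples, phi, gamma=0.99, reg=1e-6):
    """Standard LSTD(0): estimate w such that V(s) ≈ phi(s) @ w for the policy
    that generated `samples`, a list of (s, r, s_next) transitions.

    phi maps a state to a feature vector; gamma is the discount factor;
    `reg` is a small ridge term for numerical stability.
    """
    d = len(phi(samples[0][0]))
    A = reg * np.eye(d)
    b = np.zeros(d)
    for s, r, s_next in samples:
        f, f_next = phi(s), phi(s_next)
        A += np.outer(f, f - gamma * f_next)
        b += r * f
    return np.linalg.solve(A, b)

# Usage sketch on a toy two-state chain with one-hot features (illustrative only).
phi = lambda s: np.eye(2)[s]
samples = [(0, 1.0, 1), (1, 0.0, 0), (0, 1.0, 1), (1, 0.0, 0)]
w = lstd_policy_evaluation(samples, phi, gamma=0.9)
print(w)  # approximate values of states 0 and 1 under the sampled policy
```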
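Item 4 uses value iteration on a stochastic game to maximize the probability of satisfying an LTL constraint. After the usual product construction with an automaton for the formula, this reduces to a max-min reachability computation; a minimal sketch of that core step is below. The transition data structure and the omission of the product construction are simplifying assumptions on our part.

```python
import numpy as np

def maxmin_reach_value_iteration(P, target, n_iters=1000, tol=1e-9):
    """Max-min value iteration for the probability of reaching `target`.

    P[s][a][b] is a length-S probability vector over successor states when the
    controller plays a and the adversary plays b in state s; the controller
    maximizes, the adversary (who sees the controller's choice) minimizes.
    Returns the value vector and a deterministic controller policy.
    """
    S = len(P)
    V = np.zeros(S)
    for s in target:
        V[s] = 1.0

    def best_action_value(s, V):
        vals = [min(np.dot(P[s][a][b], V) for b in range(len(P[s][a])))
                for a in range(len(P[s]))]
        a_star = int(np.argmax(vals))
        return a_star, vals[a_star]

    for _ in range(n_iters):
        V_new = V.copy()
        for s in range(S):
            if s in target:
                continue
            _, V_new[s] = best_action_value(s, V)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    policy = np.array([best_action_value(s, V)[0] if s not in target else 0
                       for s in range(S)], dtype=int)
    return V, policy
```

Starting from the indicator of the target set, the iteration converges from below to the game value, and the greedy controller actions recovered at the end form the robust policy this reduction produces.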