This content will become publicly available on January 1, 2025

Title: Robust Reinforcement Learning for Risk-Sensitive Linear Quadratic Gaussian Control
This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of classical risk-sensitive linear quadratic Gaussian (LQG) control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property, called small-disturbance input-to-state stability, guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm.
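The record does not include pseudocode. As a rough, hedged illustration of the dual-loop structure the abstract describes, the sketch below runs a model-based policy optimization for the discrete-time zero-sum LQ game commonly associated with risk-sensitive LQG control: the inner loop evaluates a fixed controller gain against a worst-case disturbance gain, and the outer loop improves the controller gain. The matrices A, B, D, Q, R and the level gamma are illustrative placeholders, and the exact update rules in the paper may differ.

```python
# Minimal model-based sketch of a dual-loop policy optimization scheme for the
# discrete-time zero-sum LQ game associated with risk-sensitive LQG control:
#   x_{k+1} = A x_k + B u_k + D w_k,  cost = sum x'Qx + u'Ru - gamma^2 w'w.
# Inner loop: for a fixed controller gain K, evaluate the worst-case value by
# alternating policy evaluation (a Lyapunov equation) with improvement of the
# disturbance gain L.  Outer loop: improve K against that worst-case value.
# All problem data below are illustrative placeholders.
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
D = np.array([[0.05], [0.05]])
Q = np.eye(2)
R = np.eye(1)
gamma = 5.0  # risk-sensitivity / disturbance-attenuation level

def inner_loop(K, n_iter=200):
    """Worst-case evaluation of the gain K (u = -K x, w = L x)."""
    L = np.zeros((D.shape[1], A.shape[0]))
    for _ in range(n_iter):
        M = A - B @ K + D @ L                       # closed-loop matrix
        S = Q + K.T @ R @ K - gamma**2 * (L.T @ L)  # stage cost under (K, L)
        P = solve_discrete_lyapunov(M.T, S)         # P = M' P M + S
        L = np.linalg.solve(gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D,
                            D.T @ P @ (A - B @ K))  # improved disturbance gain
    return P

def outer_loop(K0, n_iter=30):
    K = K0
    for _ in range(n_iter):
        P = inner_loop(K)
        # "Soft-constrained" value that accounts for the worst-case disturbance.
        Pt = P + P @ D @ np.linalg.solve(
            gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D, D.T @ P)
        K = np.linalg.solve(R + B.T @ Pt @ B, B.T @ Pt @ A)  # improved gain
    return K, P

K_star, P_star = outer_loop(np.zeros((1, 2)))
print("controller gain:", K_star)
```

The model-free, off-policy variant mentioned in the abstract would replace these Lyapunov-equation evaluations with quantities estimated from input/state data rather than from (A, B, D).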
Award ID(s):
2210320 2148309
PAR ID:
10511911
Author(s) / Creator(s):
; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Transactions on Automatic Control
ISSN:
0018-9286
Page Range / eLocation ID:
1 to 16
Subject(s) / Keyword(s):
Robust reinforcement learning; policy optimization; risk-sensitive LQG
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Matni, N.; Morari, M.; Pappas, G. J. (Eds.)
    In this paper, we propose a robust reinforcement learning method for a class of linear discrete-time systems to handle model mismatches that may be induced by the sim-to-real gap. Under the formulation of risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to iteratively approximate the robust and optimal controller. The convergence and robustness of the dual-loop policy optimization algorithm are rigorously analyzed. It is shown that the dual-loop policy optimization algorithm uniformly converges to the optimal solution. In addition, by invoking the concept of small-disturbance input-to-state stability, it is guaranteed that the dual-loop policy optimization algorithm still converges to a neighborhood of the optimal solution when the algorithm is subject to a sufficiently small disturbance at each step. When the system matrices are unknown, a learning-based off-policy policy optimization algorithm is proposed for the same class of linear systems with additive Gaussian noise. A numerical simulation is provided to demonstrate the efficacy of the proposed algorithm.
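    As a rough illustration of the data-driven side mentioned in this record, the sketch below estimates a quadratic Q-function for a plain (risk-neutral) LQR problem by least squares from input/state data collected under an exploratory behavior policy, and improves the gain from the estimated blocks. The paper's off-policy algorithm targets the risk-sensitive setting, so this is only an analogue of the evaluation/improvement idea; all problem data are placeholders.

```python
# Minimal sketch of off-policy, least-squares Q-function policy iteration for a
# standard (risk-neutral) LQR problem, illustrating the data-driven idea only.
# Data are generated once with an exploratory behavior policy and reused to
# evaluate/improve the target gain K without knowledge of (A, B).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2], [0.0, 0.8]])   # unknown to the learner
B = np.array([[0.0], [0.5]])             # unknown to the learner
Q, R = np.eye(2), np.eye(1)
n, m = 2, 1

# Collect one batch of exploratory data (behavior policy != target policy).
T = 400
X, U = np.zeros((T + 1, n)), np.zeros((T, m))
X[0] = rng.normal(size=n)
for k in range(T):
    U[k] = -0.3 * X[k, 1] + 0.5 * rng.normal(size=m)       # exploratory input
    X[k + 1] = A @ X[k] + B @ U[k]

def lspi_step(K):
    """One least-squares policy evaluation + improvement using the stored data."""
    Phi, c = [], []
    for k in range(T):
        z = np.concatenate([X[k], U[k]])
        z_next = np.concatenate([X[k + 1], -(K @ X[k + 1])])  # target policy
        Phi.append(np.kron(z, z) - np.kron(z_next, z_next))
        c.append(X[k] @ Q @ X[k] + U[k] @ R @ U[k])
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(c), rcond=None)
    H = h.reshape(n + m, n + m)
    H = (H + H.T) / 2                                         # symmetrize critic
    Huu, Hux = H[n:, n:], H[n:, :n]
    return np.linalg.solve(Huu, Hux)                          # improved gain

K = np.zeros((m, n))
for _ in range(10):
    K = lspi_step(K)
print("learned gain:", K)
```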
  2. This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynamical system. This problem is subject to the 'curse of dimensionality' associated with the dynamic programming method. This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled, 'open-loop - closed-loop', approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, closed-loop control is developed around this open-loop trajectory by linearization of the dynamics about this nominal trajectory. By virtue of linearization, a linear quadratic regulator based algorithm can be used for this closed-loop control. We show that the performance of the D2C algorithm is approximately optimal. Moreover, simulation performance suggests a significant reduction in training time compared to other state-of-the-art algorithms.
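    As a rough sketch of the closed-loop step described in this record (not the full D2C pipeline, which also includes the black-box open-loop trajectory optimization), the snippet below runs a finite-horizon, time-varying LQR backward pass on linearizations along a given nominal trajectory. The linearizations and weights are placeholders; in the black-box setting they could be obtained, for instance, by finite differences on the simulator.

```python
# Minimal sketch of the closed-loop step of a decoupled "open-loop then
# closed-loop" design: a finite-horizon, time-varying LQR backward pass around
# a nominal trajectory.  A_list/B_list stand in for linearizations of the
# dynamics about the nominal trajectory; here they are illustrative placeholders.
import numpy as np

def tvlqr(A_list, B_list, Q, R, Qf):
    """Backward Riccati recursion; returns the time-varying feedback gains."""
    P = Qf
    gains = []
    for A_k, B_k in zip(reversed(A_list), reversed(B_list)):
        K_k = np.linalg.solve(R + B_k.T @ P @ B_k, B_k.T @ P @ A_k)
        P = Q + A_k.T @ P @ (A_k - B_k @ K_k)       # Riccati recursion
        gains.append(K_k)
    return gains[::-1]

# Placeholder linearizations along a 50-step nominal trajectory.
A_list = [np.array([[1.0, 0.1], [0.0, 1.0]]) for _ in range(50)]
B_list = [np.array([[0.0], [0.1]]) for _ in range(50)]
gains = tvlqr(A_list, B_list, Q=np.eye(2), R=0.1 * np.eye(1), Qf=10 * np.eye(2))

# Feedback around the nominal trajectory: u_k = u_nom_k - K_k (x_k - x_nom_k).
```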
  3. Alessandro Astolfi (Ed.)
    This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed, which is able to iteratively find near-optimal policies for the adaptive optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices and starting from an initial admissible control policy. The solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one, under mild conditions. The application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and effectiveness.
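    The algorithm described in this record builds on policy iteration. Purely as a point of reference, the sketch below shows the classical model-based Kleinman policy iteration for a deterministic continuous-time LQR problem, whose structure (Lyapunov-equation evaluation followed by gain improvement) the article's data-driven method replaces with least-squares estimates from input/state data. Problem data are placeholders, and the additive/multiplicative noise terms handled in the article are omitted here.

```python
# Model-based Kleinman policy iteration for a deterministic continuous-time LQR
# problem: policy evaluation via a Lyapunov equation, then gain improvement.
# This is only the classical backbone; the article's optimistic least-squares
# policy iteration works from data and accounts for additive and multiplicative
# noise, which this sketch does not model.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-1.0, 2.0]])   # placeholder (open-loop unstable)
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)

K = np.array([[0.0, 5.0]])                # initial admissible (stabilizing) gain
for _ in range(15):
    Ak = A - B @ K
    # Policy evaluation: Ak' P + P Ak + Q + K' R K = 0.
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    # Policy improvement.
    K = np.linalg.solve(R, B.T @ P)
print("policy-iteration gain:", K)
```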
  4. Summary

    This paper proposes an intermittent model-free learning algorithm for linear time-invariant systems, where the control policy and transmission decisions are co-designed simultaneously while also being subject to worst-case disturbances. The control policy is designed by introducing an internal dynamical system to further reduce the transmission rate and provide bandwidth flexibility in cyber-physical systems. Moreover, a Q-learning algorithm with two actors and a single critic structure is developed to learn the optimal parameters of a Q-function. It is shown by using an impulsive system approach that the closed-loop system has an asymptotically stable equilibrium and that no Zeno behavior occurs. Furthermore, a qualitative performance analysis of the model-free dynamic intermittent framework is given, showing the degree of suboptimality with respect to the optimal continuously updated controller. Finally, a numerical simulation of an unknown system is carried out to highlight the efficacy of the proposed framework.
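    The "two actors, single critic" structure can be read as learning one quadratic Q-function over state, control, and disturbance, and extracting both a control gain and a worst-case disturbance gain from it. The record gives no formulas, so the snippet below only shows, under that quadratic assumption, how the two actor gains would be read off from the blocks of an already-learned critic matrix H; H is a placeholder, and the critic learning, internal dynamics, and intermittent-transmission logic of the paper are not reproduced.

```python
# Sketch of a "two actors, single critic" structure for a quadratic game-type
# Q-function Q(x, u, w) = [x; u; w]' H [x; u; w]: one critic matrix H and two
# actor gains (control and disturbance) from the stationarity conditions
# dQ/du = 0 and dQ/dw = 0.  H below is a placeholder standing in for a critic
# learned from data; it is not taken from the paper.
import numpy as np

n, m, q = 2, 1, 1                      # state, input, disturbance dimensions
rng = np.random.default_rng(1)
M = rng.normal(size=(n + m + q, n + m + q))
H = M @ M.T + np.eye(n + m + q)        # placeholder critic (symmetric, full rank)

Hux = H[n:n + m, :n]
Hwx = H[n + m:, :n]
Huu = H[n:n + m, n:n + m]
Huw = H[n:n + m, n + m:]
Hwu = H[n + m:, n:n + m]
Hww = H[n + m:, n + m:]

# Stationarity of Q in (u, w):  [Huu Huw; Hwu Hww] [u; w] = -[Hux; Hwx] x,
# so both actor gains come from one linear solve against the critic blocks.
blocks = np.block([[Huu, Huw], [Hwu, Hww]])
gains = -np.linalg.solve(blocks, np.vstack([Hux, Hwx]))
K_u = gains[:m]        # control actor:      u = K_u x
K_w = gains[m:]        # disturbance actor:  w = K_w x
print("control gain:", K_u, "\ndisturbance gain:", K_w)
```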

     