This content will become publicly available on January 1, 2025
- PAR ID: 10511911
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Transactions on Automatic Control
- ISSN: 0018-9286
- Page Range / eLocation ID: 1 to 16
- Subject(s) / Keyword(s): Robust reinforcement learning; policy optimization; risk-sensitive LQG
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Matni, N.; Morari, M.; Pappas, G. J. (Eds.) In this paper, we propose a robust reinforcement learning method for a class of linear discrete-time systems to handle model mismatches that may be induced by the sim-to-real gap. Under the formulation of risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to iteratively approximate the robust and optimal controller. The convergence and robustness of the dual-loop policy optimization algorithm are rigorously analyzed. It is shown that the dual-loop policy optimization algorithm uniformly converges to the optimal solution. In addition, by invoking the concept of small-disturbance input-to-state stability, it is guaranteed that the dual-loop policy optimization algorithm still converges to a neighborhood of the optimal solution when the algorithm is subject to a sufficiently small disturbance at each step. When the system matrices are unknown, a learning-based off-policy policy optimization algorithm is proposed for the same class of linear systems with additive Gaussian noise. Numerical simulations are presented to demonstrate the efficacy of the proposed algorithm. (A hedged sketch of this dual-loop structure is given after this list.)
-
This paper addresses the problem of learning the optimal control policy for a nonlinear stochastic dynamical system. This problem is subject to the ‘curse of dimensionality’ associated with the dynamic programming method. This paper proposes a novel decoupled data-based control (D2C) algorithm that addresses this problem using a decoupled, ‘open-loop, then closed-loop’, approach. First, an open-loop deterministic trajectory optimization problem is solved using a black-box simulation model of the dynamical system. Then, closed-loop control is developed around this open-loop trajectory by linearization of the dynamics about this nominal trajectory. By virtue of linearization, a linear quadratic regulator based algorithm can be used for this closed-loop control. We show that the performance of the D2C algorithm is approximately optimal. Moreover, simulation performance suggests a significant reduction in training time compared to other state-of-the-art algorithms. (A hedged sketch of this open-loop/closed-loop decoupling is given after this list.)
-
Alessandro Astolfi (Ed.) This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed, which is able to find iteratively near-optimal policies for the adaptive optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices, starting from an initial admissible control policy. The solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one, under mild conditions. The application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and effectiveness. (A hedged sketch of the least-squares policy-iteration idea is given after this list.)
-
This paper proposes an intermittent model-free learning algorithm for linear time-invariant systems, where the control policy and transmission decisions are co-designed simultaneously while also being subjected to worst-case disturbances. The control policy is designed by introducing an internal dynamical system to further reduce the transmission rate and provide bandwidth flexibility in cyber-physical systems. Moreover, a Q-learning algorithm with two actors and a single critic structure is developed to learn the optimal parameters of a Q-function. It is shown by using an impulsive system approach that the closed-loop system has an asymptotically stable equilibrium and that no Zeno behavior occurs. Furthermore, a qualitative performance analysis of the model-free dynamic intermittent framework is given and shows the degree of suboptimality relative to the optimal continuously updated controller. Finally, a numerical simulation of an unknown system is carried out to highlight the efficacy of the proposed framework. (A hedged sketch of an intermittent feedback loop is given after this list.)
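The first entry above outlines a dual-loop policy optimization scheme for risk-sensitive LQG control. Below is a minimal, model-based sketch of one generic dual-loop structure for a discrete-time zero-sum-style LQ problem: the inner loop runs standard (Hewer-type) policy iteration for the controller gain with the disturbance gain frozen, and the outer loop updates a worst-case disturbance gain from the resulting value matrix. The system matrices, the attenuation level gamma, and the update formulas (textbook game best responses) are illustrative assumptions, not the paper's exact recursion or its model-free variant.

```python
# Minimal sketch of a dual-loop structure for a discrete-time zero-sum LQ
# problem (illustrative; NOT the cited paper's exact algorithm).
# Dynamics: x_{k+1} = A x_k + B u_k + D w_k
# Cost:     sum_k  x'Qx + u'Ru - gamma^2 w'w
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

A = np.array([[0.9, 0.1], [0.0, 0.8]])   # assumed system data
B = np.array([[0.0], [0.1]])
D = np.array([[0.05], [0.0]])
Q, R, gamma = np.eye(2), np.eye(1), 5.0

def inner_loop(L, K, iters=50):
    """Inner loop: Hewer-type policy iteration for the controller gain K,
    with the disturbance gain L held fixed (dynamics A + D L)."""
    Qx = Q - gamma**2 * L.T @ L
    A_L = A + D @ L
    for _ in range(iters):
        Ak = A_L - B @ K                                   # closed loop under K
        # policy evaluation: P = Ak' P Ak + Qx + K' R K
        P = solve_discrete_lyapunov(Ak.T, Qx + K.T @ R @ K)
        # policy improvement
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A_L)
    return K, P

K = np.zeros((1, 2))   # assumed stabilizing initial controller gain
L = np.zeros((1, 2))   # initial disturbance gain
for _ in range(20):    # outer loop: update the worst-case disturbance gain
    K, P = inner_loop(L, K)
    M = gamma**2 * np.eye(D.shape[1]) - D.T @ P @ D        # must remain > 0
    L = np.linalg.solve(M, D.T @ P @ (A - B @ K))

print("robust feedback gain K:", K)
```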
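The D2C entry describes solving an open-loop trajectory optimization problem with a black-box simulator and then wrapping a time-varying LQR around the nominal trajectory. The sketch below illustrates that decoupling on a toy pendulum; the simulator, horizon, cost weights, and finite-difference linearization are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of the "open-loop, then closed-loop" decoupling on a toy
# pendulum (illustrative assumptions throughout; not the authors' code).
import numpy as np
from scipy.optimize import minimize

dt, T, g = 0.05, 40, 9.81
x_goal = np.array([np.pi, 0.0])                        # target state: upright equilibrium

def step(x, u):                                        # black-box simulation model
    th, om = x
    return np.array([th + dt * om, om + dt * (-g * np.sin(th) + u)])

def rollout_cost(U):                                   # open-loop cost of a control sequence
    x, c = np.zeros(2), 0.0
    for u in U:
        c += 0.01 * u ** 2
        x = step(x, u)
    return c + 100.0 * np.sum((x - x_goal) ** 2)

# 1) open-loop deterministic trajectory optimization using only rollouts
U0 = minimize(rollout_cost, np.zeros(T), method="Powell").x
X0 = [np.zeros(2)]
for u in U0:
    X0.append(step(X0[-1], u))                         # nominal trajectory

# 2) closed loop: finite-difference linearization + time-varying LQR backward pass
Q, R, Qf, eps = np.eye(2), 0.01 * np.eye(1), 100 * np.eye(2), 1e-4
P, gains = Qf, []
for k in reversed(range(T)):
    xk, uk = X0[k], U0[k]
    A = np.column_stack([(step(xk + eps * e, uk) - step(xk, uk)) / eps for e in np.eye(2)])
    B = ((step(xk, uk + eps) - step(xk, uk)) / eps).reshape(2, 1)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K
    gains.append(K)
gains.reverse()
# feedback along the nominal trajectory: u_k = U0[k] - gains[k] @ (x_k - X0[k])
```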
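The optimistic least-squares-based policy iteration entry concerns continuous-time systems with additive and multiplicative noise. As a rough illustration of the underlying idea of fitting value parameters from input/state data without identifying system matrices, here is a discrete-time off-policy least-squares policy iteration sketch for an LQR problem with small additive noise; all matrices, noise levels, and the exploration scheme are assumptions for illustration only.

```python
# Minimal sketch of off-policy least-squares policy iteration for a
# discrete-time LQR problem with small additive noise (illustrative only;
# the cited paper treats continuous-time systems with additive and
# multiplicative noise).
import numpy as np

rng = np.random.default_rng(0)
A = np.array([[0.95, 0.1], [0.0, 0.9]])      # unknown to the learner
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
n, m = 2, 1

def collect(K, N=400):
    """Behavior policy u = -Kx + exploration noise; record (x, u, x') transitions."""
    data, x = [], rng.normal(size=n)
    for _ in range(N):
        u = -K @ x + 0.5 * rng.normal(size=m)
        # additive noise is kept small so the constant offset it induces in the
        # Bellman residual is negligible in this simplified sketch
        x_next = A @ x + B @ u + 0.01 * rng.normal(size=n)
        data.append((x.copy(), u.copy(), x_next.copy()))
        x = x_next
    return data

def lspi_step(K, data):
    """Least-squares fit of the quadratic Q-function of policy K, then greedy improvement."""
    Phi, y = [], []
    for x, u, xn in data:
        z = np.concatenate([x, u])
        zn = np.concatenate([xn, -K @ xn])    # target policy evaluated at x'
        Phi.append(np.outer(z, z).ravel() - np.outer(zn, zn).ravel())
        y.append(x @ Q @ x + u @ R @ u)
    h = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)[0]
    H = h.reshape(n + m, n + m)
    H = 0.5 * (H + H.T)                       # symmetrize
    Huu, Hux = H[n:, n:], H[n:, :n]
    return np.linalg.solve(Huu, Hux)          # improved gain: u = -K_new x

K = np.zeros((m, n))                          # assumed stabilizing initial policy
for _ in range(10):
    K = lspi_step(K, collect(K))
print("learned gain K:", K)
```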
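The last entry co-designs the control policy and the transmission decisions. The sketch below conveys only the general flavor of an intermittent feedback loop: a controller-side internal model propagates a state copy between transmissions, and the true state is transmitted when a relative-threshold event fires. The gain, the trigger rule, and the predictor are illustrative assumptions; the paper's co-designed two-actor/single-critic Q-learning is not reproduced here.

```python
# Minimal sketch of an intermittent (event-triggered) feedback loop with a
# controller-side internal model (illustrative assumptions; not the cited
# paper's co-designed trigger or learning scheme).
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.1], [0.0, 0.98]])
B = np.array([[0.0], [0.1]])
K = np.array([[3.0, 2.5]])     # assumed pre-computed stabilizing gain
sigma, T = 0.2, 200            # relative triggering threshold, horizon

x = np.array([1.0, -0.5])
x_hat = x.copy()               # controller-side copy of the state
transmissions = 0
for k in range(T):
    # transmit the true state only when the prediction error grows too large
    if np.linalg.norm(x - x_hat) > sigma * np.linalg.norm(x):
        x_hat = x.copy()
        transmissions += 1
    u = -K @ x_hat                          # control uses the (possibly stale) copy
    w = 0.01 * rng.normal(size=2)           # small disturbance
    x = A @ x + (B @ u).ravel() + w         # plant update
    x_hat = A @ x_hat + (B @ u).ravel()     # internal model propagates between events

print(f"transmitted {transmissions}/{T} samples, final state norm = {np.linalg.norm(x):.3f}")
```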