Title: Learning Optimal Controllers by Policy Gradient: Global Optimality via Convex Parameterization
Abstract:
Common reinforcement learning methods seek optimal controllers for unknown dynamical systems by searching directly in the "policy" space. A recent line of research, starting with [1], aims to provide theoretical guarantees for such direct policy-update methods by analyzing their performance in classical control settings, such as the infinite-horizon linear quadratic regulator (LQR) problem. A key property these analyses rely on is that the LQR cost satisfies the "gradient dominance" property with respect to the policy parameters. Gradient dominance helps guarantee that the optimal controller can be found by running gradient-based algorithms on the LQR cost. The gradient dominance property has so far been verified on a case-by-case basis for several control problems, including continuous- and discrete-time LQR, LQR with decentralized controllers, and H2/H∞ robust control. In this paper, we make a connection between this line of work and classical convex parameterizations based on linear matrix inequalities (LMIs). Using this connection, we propose a unified framework for showing that gradient dominance indeed holds for a broad class of control problems, such as continuous- and discrete-time LQR, minimizing the L2 gain, and problems using system-level parameterization. Our unified framework provides insights into the landscape of the cost function as a function of the policy, and enables extending convergence results for policy gradient descent to a much larger class of problems.
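As a concrete illustration of the policy-gradient setting analyzed here (an editorial sketch, not code from the paper), the snippet below runs exact model-based gradient descent on the discrete-time LQR cost C(K) = trace(P_K Σ0), using the closed-form gradient ∇C(K) = 2((R + B^T P_K B)K - B^T P_K A) Σ_K known from the policy-gradient LQR literature. Gradient dominance, C(K) - C(K*) ≤ λ ||∇C(K)||_F^2 on the set of stabilizing gains, is what makes such plain gradient descent converge to the global optimum. All matrices and step sizes below are arbitrary placeholders.

import numpy as np
from scipy.linalg import solve_discrete_lyapunov, solve_discrete_are

# Placeholder system, weights, and initial-state covariance (assumptions).
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [0.5]])
Q = np.eye(2)
R = np.eye(1)
Sigma0 = np.eye(2)

def lqr_cost_and_grad(K):
    """Exact cost C(K) = tr(P_K Sigma0) and its gradient for a stabilizing K."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf, None                              # K is not stabilizing
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)  # P = Acl'P Acl + Q + K'RK
    S = solve_discrete_lyapunov(Acl, Sigma0)             # S = Acl S Acl' + Sigma0
    cost = np.trace(P @ Sigma0)
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ S
    return cost, grad

K = np.zeros((1, 2))     # A is stable here, so K = 0 is a stabilizing start
eta = 0.05               # initial step size (placeholder)
cost, grad = lqr_cost_and_grad(K)
for _ in range(2000):
    K_new = K - eta * grad
    new_cost, new_grad = lqr_cost_and_grad(K_new)
    if new_cost > cost:  # step too large or left the stabilizing set: shrink
        eta *= 0.5
        continue
    K, cost, grad = K_new, new_cost, new_grad

# Compare with the Riccati solution to check (near-)global optimality.
Pstar = solve_discrete_are(A, B, Q, R)
Kstar = np.linalg.solve(R + B.T @ Pstar @ B, B.T @ Pstar @ A)
print("policy-gradient K:", K, "  Riccati K*:", Kstar)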
Award ID(s):
2023166
PAR ID:
10349167
Author(s) / Creator(s):
Date Published:
Journal Name:
60th IEEE Conference on Decision and Control (CDC)
Page Range / eLocation ID:
4576 to 4581
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abate, A; Cannon, M; Margellos, K; Papachristodoulou, A (Ed.)
    We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller close to each task-specific optimal controller up to a task-heterogeneity bias in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks. 
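For context, the generic MAML outer update of Finn et al. (2017), written here for per-task LQR costs C_i (the standard form, not necessarily the exact update analyzed in the entry above), takes one inner gradient step per task and then differentiates through that step:

\[
K \;\leftarrow\; K \;-\; \beta \sum_{i=1}^{N} \nabla_K\, C_i\!\bigl(K - \alpha \nabla_K C_i(K)\bigr),
\]

where α is the inner (adaptation) step size and β the outer (meta) step size; in the model-free setting each ∇C_i is replaced by a zeroth-order estimate obtained from cost evaluations.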
  2. This paper proposes a novel robust reinforcement learning framework for discrete-time linear systems with model mismatch that may arise from the sim-to-real gap. A key strategy is to invoke advanced techniques from control theory. Using the formulation of the classical risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to generate a robust optimal controller. The dual-loop policy optimization algorithm is shown to be globally and uniformly convergent, and robust against disturbances during the learning process. This robustness property is called small-disturbance input-to-state stability and guarantees that the proposed policy optimization algorithm converges to a small neighborhood of the optimal controller as long as the disturbance at each learning step is relatively small. In addition, when the system dynamics are unknown, a novel model-free off-policy policy optimization algorithm is proposed. Finally, numerical examples are provided to illustrate the proposed algorithm. 
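For reference, one common form of the risk-sensitive linear quadratic Gaussian criterion underlying such dual-loop designs (the notation here is assumed, and the entry's exact formulation may differ) is the exponential-of-quadratic cost

\[
J_\theta \;=\; \frac{2}{\theta}\,\log \mathbb{E}\,\exp\!\Bigl(\tfrac{\theta}{2}\textstyle\sum_{t}\bigl(x_t^\top Q x_t + u_t^\top R u_t\bigr)\Bigr),
\]

where the risk parameter θ > 0 penalizes variance of the accumulated cost and recovers standard LQG as θ → 0; the close relationship of this criterion to H∞-type disturbance attenuation is what provides robustness against model mismatch.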
  3. It is well-known that linear quadratic regulators (LQR) enjoy guaranteed stability margins, whereas linear quadratic Gaussian regulators (LQG) do not. In this letter, we consider systems and compensators defined over directed acyclic graphs. In particular, there are multiple decision-makers, each with access to a different part of the global state. In this setting, the optimal LQR compensator is dynamic, similar to classical LQG. We show that when sub-controller input costs are decoupled (but there is possible coupling between sub-controller state costs), the decentralized LQR compensator enjoys similar guaranteed stability margins to classical LQR. However, these guarantees disappear when cost coupling is introduced. 
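The guaranteed margins referred to above are the classical single-input, continuous-time LQR margins, which follow from Kalman's return-difference inequality for the loop transfer function L(s) = K(sI - A)^{-1}B:

\[
\bigl|\,1 + L(j\omega)\,\bigr| \;\ge\; 1 \quad \text{for all } \omega,
\]

which implies an infinite upward gain margin, a downward gain margin of 1/2, and a phase margin of at least 60°; LQG output-feedback compensators enjoy no such guarantee (Doyle, 1978).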
  4. Quadratic performance indices associated with linear plants offer simplicity and lead to linear feedback control laws, but they may not adequately capture the complexity and flexibility required to address various practical control problems. One notable example is improving, by means of possibly nonlinear laws, the trade-off between rise time and overshoot commonly observed in classical regulator problems with linear feedback control laws. To address these issues, non-quadratic terms can be introduced into the performance index, resulting in nonlinear control laws. In this study, we tackle the challenge of solving optimal control problems with non-quadratic performance indices using the closed-loop neighboring extremal optimal control (NEOC) approach and a homotopy method. Building upon the foundation of the Linear Quadratic Regulator (LQR) framework, we introduce a parameter associated with the non-quadratic terms in the cost function, which is continuously adjusted from 0 to 1. We propose an iterative algorithm based on a closed-loop NEOC framework to handle each gradual adjustment. Additionally, we discuss and analyze the classical work of Bass and Webber, whose approach involves including additional non-quadratic terms in the performance index to render the resulting Hamilton-Jacobi equation analytically solvable. Our findings are supported by numerical examples. 
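Concretely, the homotopy described above can be written (with illustrative notation; g collects the non-quadratic terms) as a family of performance indices

\[
J_\varepsilon(u) \;=\; \int_0^\infty \bigl( x^\top Q x + u^\top R u + \varepsilon\, g(x,u) \bigr)\,dt,
\qquad \varepsilon \text{ swept from } 0 \text{ to } 1,
\]

so that ε = 0 is the LQR problem with its known linear optimal feedback, and each small increase in ε is absorbed by a neighboring-extremal (NEOC) correction to the previously computed optimal feedback law.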
  5. Duality of control and estimation allows mapping recent advances in data-guided control to the estimation setup. This paper formalizes and utilizes such a mapping to consider learning the optimal (steady-state) Kalman gain when process and measurement noise statistics are unknown. Specifically, building on the duality between synthesizing optimal control and estimation gains, the filter design problem is formalized as direct policy learning. In this direction, the duality is used to extend existing theoretical guarantees of direct policy updates for the Linear Quadratic Regulator (LQR) to establish global convergence of the Gradient Descent (GD) algorithm for the estimation problem, while addressing subtle differences between the two synthesis problems. Subsequently, a Stochastic Gradient Descent (SGD) approach is adopted to learn the optimal Kalman gain without knowledge of the noise covariances. The results are illustrated via several numerical examples. 
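The control-estimation duality invoked above can be stated explicitly in a standard discrete-time form (the notation here is assumed rather than taken from the paper): the steady-state, predictor-form Kalman gain for data (A, C, W, V) is the transpose of the LQR gain computed for the dual data (A^T, C^T) with weights Q = W, R = V,

\[
L \;=\; A P C^\top \bigl(C P C^\top + V\bigr)^{-1} \;=\; K_{\mathrm{dual}}^\top,
\qquad
K_{\mathrm{dual}} \;=\; \bigl(C P C^\top + V\bigr)^{-1} C P A^\top,
\]

where P solves the dual Riccati equation and equals the steady-state prediction error covariance. Gradient or stochastic gradient descent over K_dual, exactly as in the policy-gradient LQR setting, therefore returns the optimal filter gain as its transpose.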