Title: ROBUST MODEL BASED REINFORCEMENT LEARNING USING L1 ADAPTIVE CONTROL
We introduce L1-MBRL, a control-theoretic augmentation scheme for Model-Based Reinforcement Learning (MBRL) algorithms. Unlike model-free approaches, MBRL algorithms learn a model of the transition function from data and use it to design a control input. Our approach generates a series of approximate control-affine models of the learned transition function according to the proposed switching law. Using the approximate model, the control input produced by the underlying MBRL algorithm is perturbed by an L1 adaptive controller, which is designed to enhance the robustness of the system against uncertainties. Importantly, this approach is agnostic to the choice of MBRL algorithm, so the scheme can be used with various MBRL algorithms. MBRL algorithms with L1 augmentation exhibit enhanced performance and sample efficiency across multiple MuJoCo environments, outperforming the original MBRL algorithms both with and without system noise.
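As an illustration of the control-affine approximation mentioned in the abstract, here is a minimal sketch, not the authors' implementation. It assumes the approximation is a first-order Taylor expansion of a learned model f(x, u) in the input u around a reference input u_bar, and that the switching law re-linearizes once u drifts more than a threshold eps from u_bar. The names `toy_model`, `control_affine`, `maybe_switch`, and `eps` are hypothetical, and the toy dynamics stand in for a learned network.

```python
import numpy as np

def input_jacobian(f, x, u_bar, h=1e-5):
    """Central finite-difference Jacobian of f(x, u) with respect to u at u_bar."""
    u_bar = np.asarray(u_bar, dtype=float)
    cols = []
    for i in range(u_bar.size):
        du = np.zeros_like(u_bar)
        du[i] = h
        cols.append((f(x, u_bar + du) - f(x, u_bar - du)) / (2.0 * h))
    return np.stack(cols, axis=1)  # shape: (state_dim, input_dim)

def control_affine(f, x, u_bar):
    """Return (f_a, g_a) such that f(x, u) is approximately f_a + g_a @ u near u_bar."""
    g_a = input_jacobian(f, x, u_bar)
    f_a = f(x, u_bar) - g_a @ u_bar
    return f_a, g_a

def maybe_switch(u, u_bar, eps):
    """Assumed switching rule: adopt u as the new expansion point if it is far from u_bar."""
    return np.array(u, dtype=float) if np.linalg.norm(u - u_bar) > eps else u_bar

def toy_model(x, u):
    """Hypothetical stand-in for a learned transition model (pendulum-like, saturated input)."""
    return np.array([x[1], -np.sin(x[0]) + np.tanh(u[0])])

x = np.array([0.3, -0.1])
u_bar = np.array([0.2])
f_a, g_a = control_affine(toy_model, x, u_bar)

# Compare the affine approximation with the full model at a nearby input.
u = np.array([0.25])
approx_error = np.linalg.norm(toy_model(x, u) - (f_a + g_a @ u))
print(approx_error)
```

The affine form matters because an adaptive controller of this kind needs an explicit input gain (here `g_a`) to decide how an additive correction to the MBRL input affects the state; the approximation is only trusted near `u_bar`, which is why some switching rule is needed.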
Award ID(s):
2133656
PAR ID:
10631543
Author(s) / Creator(s):
Publisher / Repository:
International Conference on Learning Representations, openreview.net
Date Published:
ISSN:
arXiv:2403.14860
Page Range / eLocation ID:
1-22
Location:
https://openreview.net/forum?id=GaLCLvJaoF
Sponsoring Org:
National Science Foundation
More Like this
  1. A periodic model predictive control (MPC) scheme is proposed for tracking halo orbits. The problem is formulated and solved in the elliptic restricted three-body problem (ER3BP) setting. The reference trajectory to be tracked is designed by using eccentricity continuation techniques. The MPC design exploits the periodicity of the tracking model and guarantees exponential stability of the linearized closed-loop system, through a suitable choice of the terminal set and weight matrices. A sum-of-norms cost function is adopted to promote fuel saving. The proposed control scheme is validated on two simulated missions in the Earth–Moon system, which, respectively, involve station keeping on a halo orbit near the L1 Lagrange point and rendezvous to a halo orbit near the L2 Lagrange point. Results illustrate the advantage of designing the reference trajectory and the periodic control directly in the ER3BP setting versus approximate solutions based on the circular restricted three-body problem (CR3BP). 
  2. This paper focuses on the output tracking control problem of a wave equation with both matched and unmatched boundary uncertainties. An adaptive boundary feedback control scheme is proposed that uses radial basis function neural networks (RBF NNs) to handle the effect of system uncertainties. Specifically, two RBF NN models are first developed to approximate the matched and unmatched uncertain system dynamics, respectively. Based on this, an adaptive NN control scheme is derived, which consists of: (i) an adaptive boundary feedback controller embedding the NN model that approximates the matched uncertainty, for stable and accurate tracking control; and (ii) a reference model embedding the NN model that approximates the unmatched uncertainty, for generating a prescribed reference trajectory. Rigorous analysis using Lyapunov theory and C0-semigroup theory proves that the proposed control scheme guarantees closed-loop stability and well-posedness. A simulation study demonstrates the effectiveness of the proposed approach.
  3. This paper proposes a data-driven optimal tracking control scheme for unknown general nonlinear systems using neural networks. First, a new neural network structure is established to reconstruct the unknown system dynamics of the form $\dot{x}(t) = f(x(t)) + g(x(t))u(t)$. Two networks in parallel are designed to approximate the functions $f(x)$ and $g(x)$. The obtained data-driven models are then used to build the optimal tracking control. The developed control consists of two parts: the feed-forward control and the optimal feedback control. The optimal feedback control is developed by approximating the solution of the Hamilton-Jacobi-Bellman equation with neural networks. Unlike other studies, the Hamilton-Jacobi-Bellman solution is found by estimating the value function derivative using neural networks. Finally, the proposed control scheme is tested on a delta robot. Two trajectory tracking examples are provided to verify the effectiveness of the proposed optimal control approach.
  4. In this paper, we propose a novel control architecture, inspired by neuroscience, for adaptive control of continuous-time systems. The objective is to design control architectures and algorithms that can learn and adapt quickly, even to abrupt changes. The proposed architecture, in the setting of standard neural network (NN) based adaptive control, augments the NN with an external working memory. The learning system stores, in its external working memory, recently observed feature vectors from the hidden layer of the NN that are relevant and forgets the older irrelevant values. It retrieves relevant vectors from the working memory to modify the final control signal generated by the controller. The use of external working memory improves the context, inducing the learning system to search in a particular direction. This directed learning allows the learning system to quickly find a good approximation of the unknown function, even after abrupt changes. We consider two classes of controllers to illustrate our ideas: (i) a model reference NN adaptive controller for linear systems with matched uncertainty, and (ii) a backstepping NN controller for strict feedback systems. Through extensive simulations and specific metrics, we show that memory augmentation improves learning significantly even when the system undergoes sudden changes. Importantly, we also provide evidence for the proposed mechanism by which this specific memory augmentation improves learning.
  5. The predictive monitoring problem asks whether a deployed system is likely to fail over the next T seconds under some environmental conditions. This problem is of the utmost importance for cyber-physical systems, and has inspired real-time architectures capable of adapting to such failures upon forewarning. In this paper, we present a linear model-predictive scheme for the real-time monitoring of linear systems governed by time-triggered controllers and time-varying disturbances. The scheme uses a combination of offline (advance) and online computations to decide if a given plant model has entered a state from which no matter what control is applied, the disturbance has a strategy to drive the system to an unsafe region. Our approach is independent of the control strategy used: this allows us to deal with plants that are controlled using model-predictive control techniques or even opaque machine-learning based control algorithms that are hard to reason with using existing reachable set estimation algorithms. Our online computation reuses the symbolic reachable sets computed offline. The real-time monitor instantiates the reachable set with a concrete state estimate, and repeatedly performs emptiness checks with respect to a safety property. We classify the various alarms raised by our approach in terms of what they imply about the system as a whole. We implement our real-time monitoring approach over numerous linear system benchmarks and show that the computation can be performed rapidly in practice. Furthermore, we also examine the alarms reported by our approach and show how some of the alarms can be used to improve the controller. 