skip to main content


The NSF Public Access Repository (NSF-PAR) system and access will be unavailable from 10:00 PM ET on Friday, December 8 until 2:00 AM ET on Saturday, December 9 due to maintenance. We apologize for the inconvenience.

Title: Learning to Control an Unstable System with One Minute of Data: Leveraging Gaussian Process Differentiation in Predictive Control
We present a straightforward and efficient way to control unstable robotic systems using an estimated dynamics model. Specifically, we show how to exploit the differentiability of Gaussian Processes to create a state-dependent linearized approximation of the true continuous dynamics that can be integrated with model predictive control. Our approach is compatible with most Gaussian process approaches for system identification, and can learn an accurate model using modest amounts of training data. We validate our approach by learning the dynamics of an unstable system such as a segway with a 7-D state space and 2-D input space (using only one minute of data), and we show that the resulting controller is robust to unmodelled dynamics and disturbances, while state-of-the-art control methods based on nominal models can fail under small perturbations. Code is open sourced at  more » « less
Award ID(s):
1645832 1932091 1924526 1923239
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the IEEERSJ International Conference on Intelligent Robots and Systems
Page Range / eLocation ID:
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Summary

    Scenario‐based model predictive control (MPC) methods can mitigate the conservativeness inherent to open‐loop robust MPC. Yet, the scenarios are often generated offline based on worst‐case uncertainty descriptions obtaineda priori, which can in turn limit the improvements in the robust control performance. To this end, this paper presents a learning‐based, adaptive‐scenario‐tree model predictive control approach for uncertain nonlinear systems with time‐varying and/or hard‐to‐model dynamics. Bayesian neural networks (BNNs) are used to learn a state‐ and input‐dependent description of model uncertainty, namely the mismatch between a nominal (physics‐based or data‐driven) model of a system and its actual dynamics. We first present a new approach for training robust BNNs (RBNNs) using probabilistic Lipschitz bounds to provide a less conservative uncertainty quantification. Then, we present an approach to evaluate the credible intervals of RBNN predictions and determine the number of samples required for estimating the credible intervals given a credible level. The performance of RBNNs is evaluated with respect to that of standard BNNs and Gaussian process (GP) as a basis of comparison. The RBNN description of plant‐model mismatch with verified accurate credible intervals is employed to generate adaptive scenarios online for scenario‐based MPC (sMPC). The proposed sMPC approach with adaptive scenario tree can improve the robust control performance with respect to sMPC with a fixed, worst‐case scenario tree and with respect to an adaptive‐scenario‐based MPC (asMPC) using GP regression on a cold atmospheric plasma system. Furthermore, closed‐loop simulation results illustrate that robust model uncertainty learning via RBNNs can enhance the probability of constraint satisfaction of asMPC.

    more » « less
  2. We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a direct latent model learning approach, where a dynamic model in some latent state space is learned by predicting quantities directly related to planning (e.g., costs) without reconstructing the observations. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model. To the best of our knowledge, despite various empirical successes, prior to this work it was unclear if such a cost-driven latent model learner enjoys finite-sample guarantees. Our work underscores the value of predicting multi-step costs, an idea that is key to our theory, and notably also an idea that is known to be empirically valuable for learning state representations. 
    more » « less
  3. null (Ed.)
    Safety is a critical component in today's autonomous and robotic systems. Many modern controllers endowed with notions of guaranteed safety properties rely on accurate mathematical models of these nonlinear dynamical systems. However, model uncertainty is always a persistent challenge weakening theoretical guarantees and compromising safety. For safety-critical systems, this is an even bigger challenge. Typically, safety is ensured by constraining the system states within a safe constraint set defined a priori by relying on the model of the system. A popular approach is to use Control Barrier Functions (CBFs) that encode safety using a smooth function. However, CBFs fail in the presence of model uncertainties. Moreover, an inaccurate model can either lead to incorrect notions of safety or worse, incur system critical failures. Addressing these drawbacks, we present a novel safety formulation that leverages properties of CBFs and positive definite kernels to design Gaussian CBFs. The underlying kernels are updated online by learning the unmodeled dynamics using Gaussian Processes (GPs). While CBFs guarantee forward invariance, the hyperparameters estimated using GPs update the kernel online and thereby adjust the relative notion of safety. We demonstrate our proposed technique on a safety-critical quadrotor on SO(3) in the presence of model uncertainty in simulation. With the kernel update performed online, safety is preserved for the system. 
    more » « less
  4. Legged robots with point or small feet are nearly impossible to control instantaneously but are controllable over the time scale of one or more steps, also known as step-to-step control. Previous approaches achieve step-to-step control using optimization by (1) using the exact model obtained by integrating the equations of motion, or (2) using a linear approximation of the step-to-step dynamics. The former provides a large region of stability at the expense of a high computational cost while the latter is computationally cheap but offers limited region of stability. Our method combines the advantages of both. First, we generate input/output data by simulating a single step. Second, the input/output data is curve fitted using a regression model to get a closed-form approximation of the step-to-step dynamics. We do this model identification offline. Next, we use the regression model for online optimal control. Here, using the spring-load inverted pendulum model of hopping, we show that both parametric (polynomial and neural network) and non-parametric (gaussian process regression) approximations can adequately model the step-to-step dynamics. We then show this approach can stabilize a wide range of initial conditions fast enough to enable real-time control. Our results suggest that closed-form approximation of the step-to-step dynamics provides a simple accurate model for fast optimal control of legged robots. 
    more » « less
  5. We propose a Bayesian decision making framework for control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments. Most of the existing adaptive controllers for MDPs with unknown dynamics are based on the reinforcement learning framework and rely on large data sets acquired by sustained direct interaction with the system or via a simulator. This is not feasible in many applications, due to ethical, economic, and physical constraints. The proposed framework addresses the data poverty issue by decomposing the problem into an offline planning stage that does not rely on sustained direct interaction with the system or simulator and an online execution stage. In the offline process, parallel Gaussian process temporal difference (GPTD) learning techniques are employed for near-optimal Bayesian approximation of the expected discounted reward over a sample drawn from the prior distribution of unknown parameters. In the online stage, the action with the maximum expected return with respect to the posterior distribution of the parameters is selected. This is achieved by an approximation of the posterior distribution using a Markov Chain Monte Carlo (MCMC) algorithm, followed by constructing multiple Gaussian processes over the parameter space for efficient prediction of the means of the expected return at the MCMC sample. The effectiveness of the proposed framework is demonstrated using a simple dynamical system model with continuous state and action spaces, as well as a more complex model for a metastatic melanoma gene regulatory network observed through noisy synthetic gene expression data. 
    more » « less