Title: Information Theoretic Regret Bounds for Online Nonlinear Control
This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space. This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics. Our main result, the Lower Confidence-based Continuous Control (LC3) algorithm, enjoys a near-optimal "root T" regret bound against the optimal controller in episodic settings, where T is the number of episodes. The bound has no explicit dependence on the dimension of the system dynamics, which could be infinite, but instead depends only on information theoretic quantities. We empirically show its application to a number of nonlinear control tasks and demonstrate the benefit of exploration for learning model dynamics.
Award ID(s):
1703574
PAR ID:
10276110
Journal Name:
Advances in neural information processing systems
Issue:
33
ISSN:
1049-5258
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper proposes a data-driven optimal tracking control scheme for unknown general nonlinear systems using neural networks. First, a new neural network structure is established to reconstruct the unknown system dynamics of the form ẋ(t) = f(x(t)) + g(x(t))u(t). Two networks in parallel are designed to approximate the functions f(x) and g(x). Then the obtained data-driven models are used to build the optimal tracking control. The developed control consists of two parts, the feed-forward control and the optimal feedback control. The optimal feedback control is developed by approximating the solution of the Hamilton-Jacobi-Bellman equation with neural networks. Unlike other studies, the Hamilton-Jacobi-Bellman solution is found by estimating the value function derivative using neural networks. Finally, the proposed control scheme is tested on a delta robot. Two trajectory tracking examples are provided to verify the effectiveness of the proposed optimal control approach.
  2. Objective. Precise control of neural systems is essential to experimental investigations of how the brain controls behavior and holds the potential for therapeutic manipulations to correct aberrant network states. Model predictive control, which employs a dynamical model of the system to find optimal control inputs, has promise for dealing with the nonlinear dynamics, high levels of exogenous noise, and limited information about unmeasured states and parameters that are common in a wide range of neural systems. However, the challenge still remains of selecting the right model, constraining its parameters, and synchronizing to the neural system. Approach. As a proof of principle, we used recent advances in data-driven forecasting to construct a nonlinear machine-learning model of a Hodgkin–Huxley type neuron when only the membrane voltage is observable and there are an unknown number of intrinsic currents. Main Results. We show that this approach is able to learn the dynamics of different neuron types and can be used with model predictive control (MPC) to force the neuron to engage in arbitrary, researcher-defined spiking behaviors. Significance. To the best of our knowledge, this is the first application of nonlinear MPC of a conductance-based model where there is only realistically limited information about unobservable states and parameters.
  3. This paper presents a generalizable methodology for data-driven identification of nonlinear dynamics that bounds the model error in terms of the prediction horizon and the magnitude of the derivatives of the system states. Using higher order derivatives of general nonlinear dynamics that need not be known, we construct a Koopman operator-based linear representation and utilize Taylor series accuracy analysis to derive an error bound. The resulting error formula is used to choose the order of derivatives in the basis functions and obtain a data-driven Koopman model using a closed-form expression that can be computed in real time. Using the inverted pendulum system, we illustrate the robustness of the error bounds given noisy measurements of unknown dynamics, where the derivatives are estimated numerically. When combined with control, the Koopman representation of the nonlinear system has marginally better performance than competing nonlinear modeling methods, such as SINDy and NARX. In addition, as a linear model, the Koopman approach lends itself readily to efficient control design tools, such as LQR, whereas the other modeling approaches require nonlinear control methods. The efficacy of the approach is further demonstrated with simulation and experimental results on the control of a tail-actuated robotic fish. Experimental results show that the proposed data-driven control approach outperforms a tuned PID (Proportional Integral Derivative) controller and that updating the data-driven model online significantly improves performance in the presence of unmodeled fluid disturbance. This paper is complemented with a video: https://youtu.be/9 wx0tdDta0.
  4. We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon T with fixed and known cost matrices Q, R, but unknown and non-stationary dynamics A_t, B_t. The sequence of dynamics matrices can be arbitrary, but with a total variation, V_T, assumed to be o(T) and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all t, we present an algorithm that achieves the optimal dynamic regret of O(V_T^{2/5} T^{3/5}). With piecewise constant dynamics, our algorithm achieves the optimal regret of O(\sqrt{ST}), where S is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of V_T. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.
  5. A hybrid filtered basis function (FBF) approach is proposed in this paper for feedforward tracking control of linear systems with unmodeled nonlinear dynamics. Unlike most available tracking control techniques, the FBF approach is very versatile; it is applicable to any type of linear system, regardless of its underlying dynamics. The FBF approach expresses the control input to a system as a linear combination of basis functions with unknown coefficients. The basis functions are forward filtered through a linear model of the system's dynamics and the unknown coefficients are selected such that tracking error is minimized. The linear models used in existing implementations of the FBF approach are typically physics-based representations of the linear dynamics of a system. The proposed hybrid FBF approach expands the application of the FBF approach to systems with unmodeled nonlinearities by learning from data. A hybrid model is formulated by combining a physics-based model of the system's linear dynamics with a data-driven linear model that approximates the unmodeled nonlinear dynamics. The hybrid model is used online in receding horizon to compute optimal control commands that minimize tracking errors. The proposed hybrid FBF approach is shown in simulations on a model of a vibration-prone 3D printer to improve tracking accuracy by up to 65.4%, compared to an existing FBF approach that does not incorporate data. 
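The filtered basis function idea summarized in item 5 can be illustrated with a minimal sketch: the control input is written as a linear combination of basis functions, each basis function is filtered through a linear model of the plant, and the coefficients are chosen by least squares so that the filtered combination tracks the desired output. Everything specific here is an assumption for illustration only: the first-order lag plant (`simulate_lag`), the cosine basis, the number of basis functions, and the names `fbf_control`, `y_desired`, and `coeffs`. The data-driven hybrid model described in that abstract is not reproduced.

```python
import numpy as np

def simulate_lag(u, a=0.9):
    """Hypothetical linear plant model: y[k] = a*y[k-1] + (1-a)*u[k], zero initial state."""
    y = np.zeros_like(u, dtype=float)
    prev = 0.0
    for k, uk in enumerate(u):
        prev = a * prev + (1.0 - a) * uk
        y[k] = prev
    return y

def fbf_control(y_desired, basis, plant):
    """Return the control input and coefficients for a filtered-basis least-squares fit."""
    # Filter every basis function (one per column) through the linear plant model.
    filtered = np.column_stack([plant(basis[:, j]) for j in range(basis.shape[1])])
    # Choose coefficients so the filtered combination matches the desired output.
    coeffs, *_ = np.linalg.lstsq(filtered, y_desired, rcond=None)
    # The control input uses the same coefficients on the unfiltered basis.
    return basis @ coeffs, coeffs

n, m = 200, 20                       # samples and basis size (assumed values)
t = np.linspace(0.0, 1.0, n)
y_desired = np.sin(2.0 * np.pi * t)  # assumed reference trajectory

# Assumed basis: cosines cos(pi*j*t); published FBF work may use other families.
basis = np.column_stack([np.cos(np.pi * j * t) for j in range(m)])

u, coeffs = fbf_control(y_desired, basis, simulate_lag)

# Compare against feeding the reference directly into the plant.
err_fbf = np.sqrt(np.mean((y_desired - simulate_lag(u)) ** 2))
err_naive = np.sqrt(np.mean((y_desired - simulate_lag(y_desired)) ** 2))
print(f"RMS tracking error: FBF {err_fbf:.4f}, unfiltered reference {err_naive:.4f}")
```

Because the plant model is linear, filtering the basis columns once and solving a single least-squares problem is enough; a receding-horizon variant would repeat this fit over a sliding window.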