Title: Robust Model-Free Learning and Control without Prior Knowledge
We present a simple model-free control algorithm that robustly learns and stabilizes an unknown discrete-time linear system with full control and state feedback, subject to arbitrary bounded disturbance and noise sequences. The controller requires no prior knowledge of the system dynamics, disturbances, or noise, yet guarantees robust stability, uniform asymptotic bounds, and uniform worst-case bounds on the state deviation. Rather than the algorithm itself, we highlight the new approach to robust stability analysis, which was a key enabler of the presented stability and performance guarantees. We conclude with simulation results showing that, despite its generality and simplicity, the controller demonstrates good closed-loop performance.
Award ID(s):
1735003
NSF-PAR ID:
10155682
Journal Name:
2019 IEEE 58th Conference on Decision and Control (CDC)
Page Range / eLocation ID:
4577 to 4582
Sponsoring Org:
National Science Foundation
More Like this
  1. In this paper, we develop a novel and safe control design approach that uses demonstrations provided by a human teacher to enable a robot to accomplish complex manipulation scenarios in dynamic environments. First, an overall task is divided into multiple simpler subtasks that are more appropriate for learning and control objectives. Then, by collecting human demonstrations, the subtasks that require robot movement are modeled by probabilistic movement primitives (ProMPs). We also study two strategies for modifying the ProMPs to avoid collisions with environmental obstacles. Finally, we introduce a rule-based control technique that uses a finite-state machine along with a unique means of control design for ProMPs. For the ProMP controller, we propose control barrier and Lyapunov functions to guide the system along a trajectory within the distribution defined by a ProMP while guaranteeing that the system state never strays more than a desired distance from the distribution mean. This allows for better performance on nonlinear systems and offers solid stability and known bounds on the system state. A series of simulations and experimental studies demonstrate the efficacy of our approach and show that it can run in real time. Note to Practitioners: This paper is motivated by the need to create a teach-by-demonstration framework that captures the strengths of movement primitives and verifiable, safe control. We provide a framework that learns safe control laws from a probability distribution of robot trajectories through the use of advanced nonlinear control that incorporates safety constraints. Typically, such distributions are stochastic, making it difficult to offer any guarantees on safe operation. Our approach ensures that the distribution of allowed robot trajectories is within an envelope of safety and allows for robust operation of a robot. Furthermore, using our framework, various probability distributions can be combined to represent complex scenarios in the environment. It will benefit practitioners by making it substantially easier to test and deploy accurate, efficient, and safe robots in complex real-world scenarios. The approach is currently limited to scenarios involving static obstacles, with dynamic obstacle avoidance an avenue of future effort.
  2. N. Matni, M. Morari (Ed.)
    In this paper, we propose a robust reinforcement learning method for a class of linear discrete-time systems to handle model mismatches that may be induced by the sim-to-real gap. Under the formulation of risk-sensitive linear quadratic Gaussian control, a dual-loop policy optimization algorithm is proposed to iteratively approximate the robust and optimal controller. The convergence and robustness of the dual-loop policy optimization algorithm are rigorously analyzed. It is shown that the algorithm converges uniformly to the optimal solution. In addition, by invoking the concept of small-disturbance input-to-state stability, it is guaranteed that the algorithm still converges to a neighborhood of the optimal solution when it is subject to a sufficiently small disturbance at each step. When the system matrices are unknown, a learning-based off-policy policy optimization algorithm is proposed for the same class of linear systems with additive Gaussian noise. A numerical simulation demonstrates the efficacy of the proposed algorithm.
  3. We present a new general framework for robust and adaptive control that allows for distributed and scalable learning and control of large systems of interconnected linear subsystems. The control method is demonstrated for a linear time-invariant system with bounded parameter uncertainties, disturbances, and noise. The presented scheme continuously collects measurements to reduce the uncertainty about the system parameters and adapts dynamic robust controllers online in a stable and performance-improving way. A key enabler for our approach is choosing a time-varying dynamic controller implementation, inspired by recent work on System Level Synthesis [1]. We leverage a new robustness result for this implementation to propose a general robust adaptive control algorithm. In particular, the algorithm allows us to impose communication and delay constraints on the controller implementation and is formulated as a sequence of robust optimization problems that can be solved in a distributed manner. The proposed control methodology performs particularly well when the interconnection between systems is sparse and the dynamics of local regions of subsystems depend only on a small number of parameters. As we show on an exemplary five-dimensional chain system, the algorithm can utilize system structure to efficiently learn and control the entire system while respecting communication and implementation constraints. Moreover, although current theoretical results require the assumption of small initial uncertainties to guarantee robustness, we present simulations that show good closed-loop performance even in the case of large uncertainties. This suggests that the assumption is not critical for the presented technique; future work will focus on providing less conservative guarantees.
  4. Frequency restoration in power systems is conventionally performed by broadcasting a centralized signal to local controllers. As a result of the energy transition, technological advances, and the scientific interest in distributed control and optimization methods, a plethora of distributed frequency control strategies that rely on communication amongst local controllers have been proposed recently. In this paper, we propose a fully decentralized leaky integral controller for frequency restoration that is derived from a classic lag element. We study the steady-state, asymptotic optimality, nominal stability, input-to-state stability, noise rejection, transient performance, and robustness properties of this controller in closed loop with a nonlinear and multivariable power system model. We demonstrate that, by tuning its DC gain and time constant, the leaky integral controller can strike an acceptable trade-off between performance and robustness as well as between asymptotic disturbance rejection and transient convergence rate. We compare our findings to conventional decentralized integral control and distributed-averaging-based integral control in theory and simulations.
  5. Control systems are increasingly targeted by malicious adversaries, who may inject spurious sensor measurements in order to bias the controller behavior and cause suboptimal performance or safety violations. This paper investigates the problem of tracking a reference trajectory while satisfying safety and reachability constraints in the presence of such false data injection attacks. We consider a linear, time-invariant system with additive Gaussian noise in which a subset of sensors can be compromised by an attacker, while the remaining sensors are regarded as secure. We propose a control policy in which two estimates of the system state are maintained, one based on all sensors and one based on only the secure sensors. The optimal control action based on the secure sensors alone is then computed at each time step, and the chosen control action is constrained to lie within a given distance of this value. We show that this policy can be implemented by solving a quadratically constrained quadratic program at each time step. We develop a barrier function approach to choosing the parameters of our scheme in order to provide provable guarantees on safety and reachability, and derive bounds on the probability that our control policies deviate from the optimal policy when no attacker is present. Our framework is validated through a numerical study.