
Title: Reinforcement Learning for Safety-Critical Control under Model Uncertainty, using Control Lyapunov Functions and Control Barrier Functions
In this paper, the issue of model uncertainty in safety-critical control is addressed with a data-driven approach. For this purpose, we utilize the structure of an input-output linearization controller based on a nominal model, along with a Control Barrier Function and Control Lyapunov Function based Quadratic Program (CBF-CLF-QP). Specifically, we propose a novel reinforcement learning framework that learns the model uncertainty present in the CBF and CLF constraints, as well as in the other control-affine dynamic constraints of the quadratic program. The trained policy is combined with the nominal-model-based CBF-CLF-QP, resulting in the Reinforcement Learning based CBF-CLF-QP (RL-CBF-CLF-QP), which addresses the problem of model uncertainty in the safety constraints. The proposed method is validated on an underactuated nonlinear bipedal robot walking on randomly spaced stepping stones with one-step preview, obtaining stable and safe walking under model uncertainty.
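The CBF-CLF-QP structure described in the abstract can be illustrated on a toy system. The sketch below is not the paper's controller (which runs on a bipedal robot with input-output linearization); it is a minimal, assumed setup: a 1-D single integrator xdot = u, a quadratic CLF driving x toward a goal, a linear CBF keeping x above a floor, and a slack-relaxed CLF constraint so the hard safety constraint always takes priority. The gains `gamma`, `alpha` and slack weight `p` are arbitrary illustrative choices, and the QP is solved with SciPy's SLSQP rather than a dedicated QP solver.

```python
from scipy.optimize import minimize

def clf_cbf_qp(x, x_goal=2.0, x_min=0.5, gamma=1.0, alpha=1.0, p=100.0):
    """Solve min u^2 + p*d^2 subject to a slack-relaxed CLF constraint and a
    hard CBF constraint, for the single integrator xdot = u (illustrative
    stand-in for a full CBF-CLF-QP)."""
    V = (x - x_goal) ** 2   # CLF: squared distance to the goal
    h = x - x_min           # CBF: stay above the floor x_min

    def cost(z):
        u, d = z
        return u ** 2 + p * d ** 2

    cons = [
        # CLF decrease, relaxed by slack d:  Vdot + gamma*V <= d,
        # where Vdot = 2*(x - x_goal)*u for this system.
        {"type": "ineq",
         "fun": lambda z: z[1] - 2 * (x - x_goal) * z[0] - gamma * V},
        # CBF condition (hard):  hdot + alpha*h >= 0, with hdot = u.
        {"type": "ineq",
         "fun": lambda z: z[0] + alpha * h},
    ]
    res = minimize(cost, x0=[0.0, 0.0], constraints=cons, method="SLSQP")
    return res.x[0]  # safe, stabilizing control input

# Simulate: starting at x = 1, the state converges toward x_goal = 2.0
# while the barrier keeps it from ever crossing x_min = 0.5.
x, dt = 1.0, 0.01
for _ in range(1000):
    x += clf_cbf_qp(x) * dt
```

In the paper's setting, the constraint coefficients (the Lie derivatives of V and h) are computed from a nominal model and are therefore wrong under model uncertainty; the proposed RL policy learns those residual terms, which is why the QP structure above is the natural place to inject the learned corrections.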
Journal Name:
Robotics: Science and Systems (RSS)
Sponsoring Org:
National Science Foundation
More Like this
  1. With the increasing need for safe control in the domain of autonomous driving, model-based safety-critical control approaches are widely used, especially Control Barrier Function (CBF) based approaches. Among them, the Exponential CBF (eCBF) is particularly popular due to its applicability to high-relative-degree systems. However, for most optimization-based controllers utilizing CBF-based constraints, solution feasibility is a common issue arising from potential conflicts among different constraints. Moreover, how to incorporate uncertainty into eCBF-based constraints in high-relative-degree systems to account for safety remains an open challenge. In this paper, we present a novel approach to extend an eCBF-based safety-critical controller to a probabilistic setting to handle potential motion uncertainty from the system dynamics. More importantly, we leverage an optimization-based technique to provide a solution feasibility guarantee at run time, while ensuring probabilistic safety. Lane changing and intersection handling are demonstrated as two use cases, and experimental results are provided to show the effectiveness of the proposed approach.
  2. Modern nonlinear control theory seeks to develop feedback controllers that endow systems with properties such as safety and stability. The guarantees ensured by these controllers often rely on accurate estimates of the system state for determining control actions. In practice, measurement model uncertainty can lead to error in state estimates that degrades these guarantees. In this paper, we seek to unify techniques from control theory and machine learning to synthesize controllers that achieve safety in the presence of measurement model uncertainty. We define the notion of a Measurement-Robust Control Barrier Function (MR-CBF) as a tool for determining safe control inputs when facing measurement model uncertainty. Furthermore, MR-CBFs are used to inform sampling methodologies for learning-based perception systems and to quantify tolerable error in the resulting learned models. We demonstrate the efficacy of MR-CBFs in achieving safety with measurement model uncertainty on a simulated Segway system.
  3. Real-time controllers must satisfy strict safety requirements. Recently, Control Barrier Functions (CBFs) have been proposed that guarantee safety by ensuring that a suitably defined barrier function remains bounded for all time. The CBF method, however, has only been developed for deterministic systems and systems with worst-case disturbances and uncertainties. In this paper, we develop a CBF framework for the safety of stochastic systems. We consider complete information systems, in which the controller has access to the exact system state, as well as incomplete information systems, where the state must be reconstructed from noisy measurements. In the complete information case, we formulate a notion of barrier functions that leads to sufficient conditions for safety with probability 1. In the incomplete information case, we formulate barrier functions that take an estimate from an extended Kalman filter as input, and derive bounds on the probability of safety as a function of the asymptotic error in the filter. We show that, in both cases, the sufficient conditions for safety can be mapped to linear constraints on the control input at each time, enabling the development of tractable optimization-based controllers that guarantee safety, performance, and stability. Our approach is evaluated via simulation on an adaptive cruise control case study.
  4. In this paper, we study the effect of non-vanishing disturbances on the stability of fixed-time stable (FxTS) systems. We present a new result on FxTS, which allows a positive term in the time derivative of the Lyapunov function with the aim of modeling bounded, non-vanishing disturbances in the system dynamics. We characterize the neighborhood to which the system trajectories converge, as well as the convergence time. Then, we use the new FxTS result and formulate a quadratic program (QP) that yields control inputs which drive the trajectories of a class of nonlinear, control-affine systems to a goal set in the presence of control input constraints and non-vanishing, bounded disturbances in the system dynamics. We consider an overtaking problem on a highway as a case study, and discuss how to both set up the QP and decide when to start the overtake maneuver in the presence of sensing errors.
  5. Reinforcement learning (RL) has recently shown promise in solving difficult numerical problems and has discovered non-intuitive solutions to existing problems. This study investigates the ability of a general RL agent to find an optimal control strategy for spacecraft attitude control problems. Two main types of Attitude Control Systems (ACS) are presented. First, the general ACS problem with full actuation is considered, but with saturation constraints on the applied torques, representing thruster-based ACSs. Second, an attitude control problem with a reaction wheel based ACS is considered, which has more constraints on control authority. The agent is trained using the Proximal Policy Optimization (PPO) RL method to obtain an attitude control policy. To ensure robustness, the inertia of the satellite is unknown to the control agent and is randomized for each simulation. To achieve efficient learning, the agent is trained using curriculum learning. We compare the RL-based controller to a QRF (quaternion rate feedback) attitude controller, a well-established state feedback control strategy. We investigate the nominal performance and robustness with respect to uncertainty in the system dynamics. Our RL-based attitude control agent adapts to any spacecraft mass without needing to be re-trained. In the range of 0.1 to 100,000 kg, our agent achieves 2% better performance than a QRF controller tuned for the same mass range, and similar performance to a QRF controller tuned specifically for a given mass. For the reaction wheel based ACS, the trained RL agent achieved a 10-times higher reward than that of a tuned QRF controller.