skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.
Attention:The NSF Public Access Repository (PAR) system and access will be unavailable from 11:00 PM ET on Thursday, June 11 until 2:00 AM ET on Friday, June 12 due to maintenance. We apologize for the inconvenience.


Title: Safe and Efficient Reinforcement Learning Using Disturbance-Observer-Based Control Barrier Functions
Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising way for safe RL via modifying the unsafe actions of an RL agent on the fly. Existing safety filter-based approaches typically involve learning of uncertain dynamics and quantifying the learned model error, which leads to conservative filters before a large amount of data is collected to learn a good model, thereby preventing efficient exploration. This paper presents a method for safe and efficient RL using disturbance observers (DOBs) and control barrier functions (CBFs). Unlike most existing safe RL methods that deal with hard state constraints, our method does not involve model learning, and leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions. The DOB-based CBF can be used as a safety filter with model-free RL algorithms by minimally modifying the actions of an RL agent whenever necessary to ensure safety throughout the learning process. Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian processes-based model learning, in terms of safety violation rate, and sample and computational efficiency.  more » « less
Award ID(s):
2135925 2133656
PAR ID:
10427348
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of Machine Learning Research
Volume:
211
ISSN:
2640-3498
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Matni, Nikolai; Morari, Manfred; Pappas, George J. (Ed.)
    Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising way for safe RL via modifying the unsafe actions of an RL agent on the fly. Existing safety filter-based approaches typically involve learning of uncertain dynamics and quantifying the learned model error, which leads to conservative filters before a large amount of data is collected to learn a good model, thereby preventing efficient exploration. This paper presents a method for safe and efficient RL using disturbance observers (DOBs) and control barrier functions (CBFs). Unlike most existing safe RL methods that deal with hard state constraints, our method does not involve model learning, and leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions. The DOB-based CBF can be used as a safety filter with model-free RL algorithms by minimally modifying the actions of an RL agent whenever necessary to ensure safety throughout the learning process. Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian processes-based model learning, in terms of safety violation rate, and sample and computational efficiency. 
    more » « less
  2. Latent safety filters extend Hamilton-Jacobi (HJ) reachability to operate on latent state representations and dynamics learned directly from high-dimensional observations, enabling safe visuomotor control under hard-to-model constraints. However, existing methods implement “leastrestrictive” filtering that discretely switch between nominal and safety policies, potentially undermining the task performance that makes modern visuomotor policies valuable. While reachability value functions can, in principle, be adapted to be control barrier functions (CBFs) for smooth optimization-based filtering, we theoretically and empirically show that current latentspace learning methods produce fundamentally incompatible value functions. We identify two sources of incompatibility: First, in HJ reachability, failures are encoded via a “margin function” in latent space, whose sign indicates whether or not a latent is in the constraint set. However, representing the margin function as a classifier yields saturated value functions that exhibit discontinuous jumps. We prove that the value function’s Lipschitz constant scales linearly with the margin function’s Lipschitz constant, revealing that smooth CBFs require smooth margins. Second, reinforcement learning (RL) approximations trained solely on safety policy data yield inaccurate value estimates for nominal policy actions, precisely where CBF filtering needs them. We propose the LatentCBF, which addresses both challenges through gradient penalties that lead to smooth margin functions without additional labeling, and a value-training procedure that mixes data from both nominal and safety policy distributions. Experiments on simulated benchmarks and hardware with a vision-based manipulation policy demonstrate that LatentCBF enables smooth safety filtering while doubling the task-completion rate over prior switching methods. 
    more » « less
  3. Control Barrier Functions (CBFs) are an effective methodology to ensure safety and performative efficacy in real-time control applications such as power systems, resource allocation, autonomous vehicles, robotics, etc. This approach ensures safety independently of the high-level tasks that may have been pre-planned off-line. For example, CBFs can be used to guarantee that a vehicle will remain in its lane. However, when the number of agents is large, computation of CBFs can suffer from the curse of dimensionality in the multi-agent setting. In this work, we present Mean-field Control Barrier Functions (MF-CBFs), which extends the CBF framework to the mean-field (or swarm control) setting. The core idea is to model a population of agents as probability measures in the state space and build corresponding control barrier functions. Similar to traditional CBFs, we derive safety constraints on the (distributed) controls but now relying on the differential calculus in the space of probability measures. 
    more » « less
  4. In this study, we address the problem of safe control in systems subject to state and input constraints by integrating the Control Barrier Function (CBF) into the Model Predictive Control (MPC) formulation. While CBF offers a conservative policy and traditional MPC lacks the safety guarantee beyond the finite horizon, the proposed scheme takes advantage of both MPC and CBF approaches to provide a guaranteed safe control policy with reduced conservatism and a shortened horizon. The proposed methodology leverages the sum-of-square (SOS) technique to construct CBFs that make forward invariant safe sets in the state space that are then used as a terminal constraint on the last predicted state. CBF invariant sets cover the state space around system fixed points. These islands of forward invariant CBF sets will be connected to each other using MPC. To do this, we proposed a technique to handle the MPC optimization problem subject to the combination of intersections and union of constraints. Our approach, termed Model Predictive Control Barrier Functions (MPCBF), is validated using numerical examples to demonstrate its efficacy, showing improved performance compared to classical MPC and CBF. 
    more » « less
  5. Real-time controllers must satisfy strict safety requirements. Recently, Control Barrier Functions (CBFs) have been proposed that guarantee safety by ensuring that a suitablydefined barrier function remains bounded for all time. The CBF method, however, has only been developed for deterministic systems and systems with worst-case disturbances and uncertainties. In this paper, we develop a CBF framework for safety of stochastic systems. We consider complete information systems, in which the controller has access to the exact system state, as well as incomplete information systems where the state must be reconstructed from noisy measurements. In the complete information case, we formulate a notion of barrier functions that leads to sufficient conditions for safety with probability 1. In the incomplete information case, we formulate barrier functions that take an estimate from an extended Kalman filter as input, and derive bounds on the probability of safety as a function of the asymptotic error in the filter. We show that, in both cases, the sufficient conditions for safety can be mapped to linear constraints on the control input at each time, enabling the development of tractable optimization-based controllers that guarantee safety, performance, and stability. Our approach is evaluated via simulation study on an adaptive cruise control case study. 
    more » « less