-
Safe control designs for robotic systems remain challenging because of the difficulty of explicitly solving optimal control problems with nonlinear dynamics perturbed by stochastic noise. However, recent advances in computing devices enable online optimization and sampling-based methods to solve control problems. For example, Control Barrier Functions (CBFs) have been proposed to numerically solve convex optimization problems that keep the control input in the safe set. Model Predictive Path Integral (MPPI) control uses forward sampling of stochastic differential equations to solve optimal control problems online. Both control algorithms are widely used for nonlinear systems because they avoid calculating the derivatives of the nonlinear dynamics. In this paper, we use Stochastic Control Barrier Function (SCBF) constraints to limit the sample regions in the sampling-based algorithm, ensuring safety in a probabilistic sense and improving sample efficiency with a stochastic differential equation. We also provide a sampling-complexity analysis showing that our algorithm needs fewer samples than the original MPPI algorithm.
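As a rough illustration of the idea of constraining MPPI samples with a barrier condition, the sketch below runs a plain MPPI update on a toy 2D single integrator and projects each sampled control so that a simple deterministic CBF condition holds before rollout. The dynamics, cost, barrier, and all numerical values are assumptions for illustration, not the paper's SCBF formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

dt, horizon, n_samples, lam, sigma = 0.05, 30, 256, 1.0, 0.5
x0 = np.array([0.0, 0.0])                   # 2D single-integrator state
goal = np.array([2.0, 0.0])
obstacle, radius, alpha = np.array([1.0, 0.0]), 0.3, 5.0  # alpha: assumed CBF gain

def h(x):
    """Barrier function: positive outside the circular obstacle."""
    return np.dot(x - obstacle, x - obstacle) - radius ** 2

def safe_control(x, u):
    """Minimally correct u so that dh/dt + alpha*h >= 0 for xdot = u."""
    grad = 2.0 * (x - obstacle)
    viol = grad @ u + alpha * h(x)
    return u if viol >= 0.0 else u - viol * grad / (grad @ grad)

u_nom = np.zeros((horizon, 2))              # nominal (mean) control sequence
perturbed = np.empty((n_samples, horizon, 2))
costs = np.zeros(n_samples)
for k in range(n_samples):
    x = x0.copy()
    for t in range(horizon):
        u = safe_control(x, u_nom[t] + rng.normal(0.0, sigma, 2))
        perturbed[k, t] = u
        x = x + dt * u
        costs[k] += np.sum((x - goal) ** 2) + 0.01 * np.sum(u ** 2)

w = np.exp(-(costs - costs.min()) / lam)    # MPPI exponential weighting
w /= w.sum()
u_nom = np.tensordot(w, perturbed, axes=1)  # importance-weighted control update
print("first control of the updated plan:", u_nom[0])
```

In the paper's setting the barrier condition is stochastic and restricts the sampling region itself; the pointwise projection above is only a stand-in for that restriction.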
-
Safe and Efficient Reinforcement Learning Using Disturbance-Observer-Based Control Barrier Functions
Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising way to achieve safe RL by modifying the unsafe actions of an RL agent on the fly. Existing safety-filter-based approaches typically involve learning the uncertain dynamics and quantifying the learned model error, which leads to conservative filters before a large amount of data is collected to learn a good model, thereby preventing efficient exploration. This paper presents a method for safe and efficient RL using disturbance observers (DOBs) and control barrier functions (CBFs). Unlike most existing safe RL methods that deal with hard state constraints, our method does not involve model learning; it leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions. The DOB-based CBF can be used as a safety filter with model-free RL algorithms, minimally modifying the actions of an RL agent whenever necessary to ensure safety throughout the learning process. Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian process-based model learning in terms of safety violation rate, and sample and computational efficiency.
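The sketch below illustrates the general pattern of a disturbance-observer-plus-CBF safety filter on a toy single integrator: a simple extended-state observer estimates a constant drift, and that estimate is folded into the barrier condition before the RL action is minimally corrected. Everything here (dynamics, gains, barrier, the stand-in policy) is an assumption for illustration, not the paper's DOB or robust-CBF design.

```python
import numpy as np

dt, alpha, L_obs = 0.02, 4.0, 10.0              # step, CBF gain, observer gain (assumed)
obstacle, radius = np.array([1.0, 0.0]), 0.4

def h(x):
    """Barrier function: positive outside the circular obstacle."""
    return np.dot(x - obstacle, x - obstacle) - radius ** 2

def cbf_filter(x, u_rl, d_hat):
    """Minimally modify u_rl so that grad_h @ (u + d_hat) + alpha * h(x) >= 0."""
    g = 2.0 * (x - obstacle)
    viol = g @ (u_rl + d_hat) + alpha * h(x)
    return u_rl if viol >= 0.0 else u_rl - viol * g / (g @ g)

# Toy closed loop: single integrator xdot = u + d with an unknown constant drift d.
x = np.array([0.0, 0.0])
x_hat, d_hat = x.copy(), np.zeros(2)            # observer state and disturbance estimate
d_true = np.array([0.3, -0.1])
for step in range(300):
    u_rl = np.array([1.0, 0.0])                 # stand-in for an RL policy action
    u = cbf_filter(x, u_rl, d_hat)              # safety filter using the DOB estimate
    x = x + dt * (u + d_true)                   # plant driven by the true disturbance
    err = x - x_hat                             # extended-state-observer correction
    x_hat = x_hat + dt * (u + d_hat + L_obs * err)
    d_hat = d_hat + dt * (L_obs ** 2 / 4.0) * err
print("disturbance estimate:", d_hat, " barrier value:", h(x))
```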
-
Safe and Efficient Reinforcement Learning using Disturbance-Observer-Based Control Barrier Functions
Matni, Nikolai; Morari, Manfred; Pappas, George J. (Eds.)
Safe reinforcement learning (RL) with assured satisfaction of hard state constraints during training has recently received a lot of attention. Safety filters, e.g., based on control barrier functions (CBFs), provide a promising way to achieve safe RL by modifying the unsafe actions of an RL agent on the fly. Existing safety-filter-based approaches typically involve learning the uncertain dynamics and quantifying the learned model error, which leads to conservative filters before a large amount of data is collected to learn a good model, thereby preventing efficient exploration. This paper presents a method for safe and efficient RL using disturbance observers (DOBs) and control barrier functions (CBFs). Unlike most existing safe RL methods that deal with hard state constraints, our method does not involve model learning; it leverages DOBs to accurately estimate the pointwise value of the uncertainty, which is then incorporated into a robust CBF condition to generate safe actions. The DOB-based CBF can be used as a safety filter with model-free RL algorithms, minimally modifying the actions of an RL agent whenever necessary to ensure safety throughout the learning process. Simulation results on a unicycle and a 2D quadrotor demonstrate that the proposed method outperforms a state-of-the-art safe RL algorithm using CBFs and Gaussian process-based model learning in terms of safety violation rate, and sample and computational efficiency.
-
This work presents an optimal sampling-based method to solve the real-time motion planning problem in static and dynamic environments, exploiting the Rapidly-exploring Random Tree (RRT) algorithm and the Model Predictive Path Integral (MPPI) algorithm. The RRT algorithm provides a nominal mean value for the random control distribution in the MPPI algorithm, resulting in satisfactory control performance in static and dynamic environments without the need for fine parameter tuning. We also discuss the importance of choosing the right mean for the MPPI algorithm, which balances exploration against the optimality gap given a fixed sample size. In particular, a sufficiently large mean is required to explore the state space enough, and a sufficiently small mean is required to guarantee that the samples reconstruct the optimal control. The proposed methodology automates the choice of this mean by incorporating the RRT algorithm. Simulations demonstrate that the proposed algorithm can solve the motion planning problem in both static and dynamic environments.
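A minimal sketch of the interface described here: waypoints standing in for an RRT solution are converted into a nominal control sequence, which is then used as the mean of the MPPI sampling distribution instead of zero. The single-integrator dynamics, the hand-coded waypoints, and all constants are placeholders; a real planner would supply the path.

```python
import numpy as np

rng = np.random.default_rng(1)
dt, n_samples, lam, sigma = 0.1, 128, 1.0, 0.3
goal = np.array([2.0, 1.0])

# Stand-in for RRT output: waypoints from start to goal.
waypoints = np.array([[0.0, 0.0], [0.5, 0.4], [1.0, 0.6], [1.5, 0.9], [2.0, 1.0]])
u_mean = np.diff(waypoints, axis=0) / dt          # nominal controls that track the path
horizon = len(u_mean)

def cost(traj, controls):
    terminal = 50.0 * np.sum((traj[-1] - goal) ** 2)
    effort = 0.01 * np.sum(controls ** 2)
    return terminal + effort

x0 = waypoints[0]
samples = u_mean + rng.normal(0.0, sigma, size=(n_samples, horizon, 2))
costs = np.empty(n_samples)
for k in range(n_samples):
    x, traj = x0.copy(), [x0]
    for u in samples[k]:                          # roll out the perturbed controls
        x = x + dt * u
        traj.append(x)
    costs[k] = cost(np.array(traj), samples[k])

w = np.exp(-(costs - costs.min()) / lam)          # MPPI exponential weighting
w /= w.sum()
u_opt = np.tensordot(w, samples, axes=1)          # update around the RRT-provided mean
print("updated control sequence shape:", u_opt.shape)
```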
-
We propose a reinforcement learning framework in which an agent uses an internal nominal model for stochastic model predictive control (MPC) while compensating for a disturbance. Our work builds on existing risk-aware optimal control with stochastic differential equations (SDEs) that aims to deal with such disturbances. However, the risk sensitivity and the noise strength of the nominal SDE in risk-aware optimal control are often chosen heuristically. In the proposed framework, the risk-taking policy determines whether the behavior of the MPC is risk-seeking (exploration) or risk-averse (exploitation). Specifically, we employ risk-aware path integral control, which can be implemented as Monte-Carlo (MC) sampling with fast parallel simulations on a GPU. MC sampling implementations of MPC have been successful in robotic applications due to their real-time computation capability. The proposed framework, which adapts the noise model and the risk sensitivity, outperforms the standard model predictive path integral control.
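The sketch below only illustrates the role of the risk parameter in a sampling-based evaluation: rollout costs of a toy 1D system are drawn by Monte-Carlo simulation and scored with an exponential-utility (risk-sensitive) objective, whose sign makes the evaluation risk-averse or risk-seeking. It is not the paper's adaptive scheme; the dynamics, cost, and parameter values are assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
dt, horizon, n_samples, noise_std, goal = 0.05, 40, 4000, 0.5, 1.0

def sample_costs(u_const):
    """Monte-Carlo rollout costs of a 1D integrator x+ = x + dt*(u + w)."""
    x = np.zeros(n_samples)
    cost = np.zeros(n_samples)
    for _ in range(horizon):
        x = x + dt * (u_const + noise_std * rng.standard_normal(n_samples))
        cost += dt * (x - goal) ** 2
    return cost

def risk_sensitive_value(costs, theta):
    """(1/theta) * log E[exp(theta * S)], computed with a log-sum-exp shift."""
    if abs(theta) < 1e-9:
        return costs.mean()                       # risk-neutral limit
    s = theta * costs
    m = s.max()
    return (m + np.log(np.mean(np.exp(s - m)))) / theta

costs = sample_costs(u_const=0.5)
for theta in (-2.0, 0.0, 2.0):                    # risk-seeking, neutral, risk-averse
    print(f"theta={theta:+.1f}  value={risk_sensitive_value(costs, theta):.4f}")
```

A positive risk parameter weights high-cost outcomes more heavily (risk-averse), while a negative one discounts them (risk-seeking); adapting this parameter is what trades off exploitation against exploration in the framework described above.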
-
Matni, Nikolai; Morari, Manfred; Pappas, George J. (Eds.)
Controller tuning is a vital step to ensure that a controller delivers its designed performance. DiffTune has been proposed as an automatic tuning method that unrolls the dynamical system and controller into a computational graph and uses auto-differentiation to obtain the gradient for the controller's parameter update. However, DiffTune uses vanilla gradient descent to iteratively update the parameters, so its performance depends largely on the choice of the learning rate (a hyperparameter). In this paper, we propose hyperparameter-free methods to update the controller parameters. We find the optimal parameter update by maximizing the loss reduction, where a predicted loss based on the approximated state and control is used for the maximization. Two methods are proposed for optimally updating the parameters and are compared with related variants in simulations on a Dubins car and a quadrotor. Simulation experiments show that the proposed first-order method outperforms the hyperparameter-based methods and is more robust than the second-order hyperparameter-free methods.
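As a rough sketch of the general idea only (not the paper's specific update rules), the snippet below tunes a single proportional gain by differentiating an unrolled rollout loss, with finite differences standing in for auto-differentiation, and then choosing the step along the gradient that most reduces the rollout loss rather than fixing a learning rate. The system, controller, and loss are toy placeholders.

```python
import numpy as np

dt, horizon, x_ref = 0.02, 200, 1.0

def rollout_loss(kp):
    """Tracking loss of a proportional controller on xdot = u, with u = kp*(x_ref - x)."""
    x, loss = 0.0, 0.0
    for _ in range(horizon):
        u = kp * (x_ref - x)
        x = x + dt * u
        loss += dt * (x_ref - x) ** 2
    return loss

kp = 0.1
for it in range(20):
    # Gradient of the unrolled loss w.r.t. the gain (finite difference stands in
    # for auto-differentiation over the unrolled computational graph).
    eps = 1e-4
    grad = (rollout_loss(kp + eps) - rollout_loss(kp - eps)) / (2 * eps)
    # Hyperparameter-free flavor: pick the step that best reduces the rollout loss
    # along the gradient direction, instead of using a fixed learning rate.
    steps = np.logspace(-4, 1, 30)
    losses = [rollout_loss(kp - s * grad) for s in steps]
    kp = kp - steps[int(np.argmin(losses))] * grad
print("tuned gain:", kp, "final loss:", rollout_loss(kp))
```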