skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on August 5, 2026

Title: Your Learned Constraint is Secretly a Backward Reachable Tube
Inverse Constraint Learning (ICL) is the problem of inferring constraints from safe (i.e., constraint-satisfying) demonstrations. The hope is that these inferred constraints can then be used downstream to search for safe policies for new tasks and, potentially, under different dynamics. Our paper explores the question of what mathematical entity ICL recovers. Somewhat surprisingly, we show that both in theory and in practice, ICL recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened. In the language of safe control, this means we recover a backwards reachable tube (BRT) rather than a failure set. In contrast to the failure set, the BRT depends on the dynamics of the data collection system. We discuss the implications of the dynamics-conditionedness of the recovered constraint on both the sample-efficiency of policy search and the transferability of learned constraints. Our code is available in the following repository.  more » « less
Award ID(s):
2246447
PAR ID:
10631039
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
Reinforcement Learning Conference
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We extend the learning from demonstration paradigm by providing a method for learning unknown constraints shared across tasks, using demonstrations of the tasks, their cost functions, and knowledge of the system dynamics and control constraints. Given safe demonstrations, our method uses hit-and-run sampling to obtain lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a consistent representation of the unsafe set via solving an integer program. Our method generalizes across system dynamics and learns a guaranteed subset of the constraint. In addition, by leveraging a known parameterization of the constraint, we modify our method to learn parametric constraints in high dimensions. We also provide theoretical analysis on what subset of the constraint and safe set can be learnable from safe demonstrations. We demonstrate our method on linear and nonlinear system dynamics, show that it can be modified to work with suboptimal demonstrations, and that it can also be used to learn constraints in a feature space. 
    more » « less
  2. Self-optimizing efficiency of vapor compression cycles (VCCs) involves assigning multiple decision variables simultaneously in order to minimize power consumption while maintaining safe operating conditions. Due to the modeling complexity associated with cycle dynamics (and other smart building energy systems), online self-optimization requires algorithms that can safely and efficiently explore the search space in a derivative-free and model-agnostic manner. This makes Bayesian optimization (BO) a strong candidate for self-optimization. Unfortunately, classical BO algorithms ignore the relationship between consecutive optimizer candidates, resulting in jumps in the search space that can lead to fail-safe mechanisms being triggered, or undesired transient dynamics that violate operational constraints. To this end, we propose safe local search region (LSR)-BO, a global optimization methodology that builds on the BO framework while enforcing two types of safety constraints including black-box constraints on the output and LSR constraints on the input. We provide theoretical guarantees that under standard assumptions on the performance and constraint functions, LSR-BO guarantees constraints will be satisfied at all iterations with high probability. Furthermore, in the presence of only input LSR constraints, we show the method will converge to the true (unknown) globally optimal solution. We demonstrate the potential of our proposed LSR-BO method on a high-fidelity simulation model of a commercial vapor compression system with both LSR constraints on expansion valve positions and fan speeds, in addition to other safety constraints on discharge and evaporator temperatures. 
    more » « less
  3. This paper presents a provably safe method for constrained reorientation of a spacecraft in the presence of input constraints, bounded disturbances, and fixed frequency zero-order-hold (ZOH) control inputs. The set of states satisfying all pointing and rate constraints, herein called the safe set, is expressed as the intersection of the sublevel sets of several constraint functions, which are subsequently converted into control barrier functions (CBFs). The method then extends prior results on utilizing CBFs with ZOH controllers to the case of relative-degree-2 constraint functions, as occurs in the constrained attitude reorientation problem. The developed sampled-data controller is also shown to remain provably safe in the presence of input constraints and bounded disturbances. Finally, the method is validated and compared to three prior approaches via both low-fidelity and mid-fidelity simulations. 
    more » « less
  4. We study the problem of safe online convex optimization, where the action at each time step must satisfy a set of linear safety constraints. The goal is to select a sequence of ac- tions to minimize the regret without violating the safety constraints at any time step (with high probability). The parameters that specify the linear safety constraints are unknown to the algorithm. The algorithm has access to only the noisy observations of constraints for the chosen actions. We pro- pose an algorithm, called the Safe Online Projected Gradient Descent(SO-PGD) algorithm to address this problem. We show that, under the assumption of the availability of a safe baseline action, the SO-PGD algorithm achieves a regret O(T^2/3). While there are many algorithms for online convex optimization (OCO) problems with safety constraints avail- able in the literature, they allow constraint violations during learning/optimization, and the focus has been on characterizing the cumulative constraint violations. To the best of our knowledge, ours is the first work that provides an algorithm with provable guarantees on the regret, without violating the linear safety constraints (with high probability) at any time step. 
    more » « less
  5. We present a scalable algorithm for learning parametric constraints in high dimensions from safe expert demonstrations. To reduce the ill-posedness of the constraint recovery problem, our method uses hit-and-run sampling to generate lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a representation of the unsafe set that is compatible with the data by solving an integer program in that representation’s parameter space. Our method can either leverage a known parameterization or incrementally grow a parameterization while remaining consistent with the data, and we provide theoretical guarantees on the conservativeness of the recovered unsafe set. We evaluate our method on high-dimensional constraints for high-dimensional systems by learning constraints for 7-DOF arm, quadrotor, and planar pushing examples, and show that our method outperforms baseline approaches. 
    more » « less