skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Learning constraints from demonstrations with grid and parametric representations
We extend the learning from demonstration paradigm by providing a method for learning unknown constraints shared across tasks, using demonstrations of the tasks, their cost functions, and knowledge of the system dynamics and control constraints. Given safe demonstrations, our method uses hit-and-run sampling to obtain lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a consistent representation of the unsafe set via solving an integer program. Our method generalizes across system dynamics and learns a guaranteed subset of the constraint. In addition, by leveraging a known parameterization of the constraint, we modify our method to learn parametric constraints in high dimensions. We also provide theoretical analysis on what subset of the constraint and safe set can be learnable from safe demonstrations. We demonstrate our method on linear and nonlinear system dynamics, show that it can be modified to work with suboptimal demonstrations, and that it can also be used to learn constraints in a feature space.  more » « less
Award ID(s):
1750489
PAR ID:
10316776
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
The International Journal of Robotics Research
Volume:
40
Issue:
10-11
ISSN:
0278-3649
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present a scalable algorithm for learning parametric constraints in high dimensions from safe expert demonstrations. To reduce the ill-posedness of the constraint recovery problem, our method uses hit-and-run sampling to generate lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a representation of the unsafe set that is compatible with the data by solving an integer program in that representation’s parameter space. Our method can either leverage a known parameterization or incrementally grow a parameterization while remaining consistent with the data, and we provide theoretical guarantees on the conservativeness of the recovered unsafe set. We evaluate our method on high-dimensional constraints for high-dimensional systems by learning constraints for 7-DOF arm, quadrotor, and planar pushing examples, and show that our method outperforms baseline approaches. 
    more » « less
  2. Inverse Constraint Learning (ICL) is the problem of inferring constraints from safe (i.e., constraint-satisfying) demonstrations. The hope is that these inferred constraints can then be used downstream to search for safe policies for new tasks and, potentially, under different dynamics. Our paper explores the question of what mathematical entity ICL recovers. Somewhat surprisingly, we show that both in theory and in practice, ICL recovers the set of states where failure is inevitable, rather than the set of states where failure has already happened. In the language of safe control, this means we recover a backwards reachable tube (BRT) rather than a failure set. In contrast to the failure set, the BRT depends on the dynamics of the data collection system. We discuss the implications of the dynamics-conditionedness of the recovered constraint on both the sample-efficiency of policy search and the transferability of learned constraints. Our code is available in the following repository. 
    more » « less
  3. null (Ed.)
    We present a method for learning to satisfy uncertain constraints fromdemonstrations. Our method uses robust optimization to obtain a belief over thepotentially infinite set of possible constraints consistent with the demonstrations,and then uses this belief to plan trajectories that trade off performance with sat-isfying the possible constraints. We use these trajectories in a closed-loop policythat executes and replans using belief updates, which incorporate data gatheredduring execution. We derive guarantees on the accuracy of our constraint beliefand probabilistic guarantees on plan safety. We present results on a 7-DOF armand 12D quadrotor, showing our method can learn to satisfy high-dimensional (upto 30D) uncertain constraints, and outperforms baselines in safety and efficiency. 
    more » « less
  4. Abstract Optimizing charging protocols is critical for reducing battery charging time and decelerating battery degradation in applications such as electric vehicles. Recently, reinforcement learning (RL) methods have been adopted for such purposes. However, RL‐based methods may not ensure system (safety) constraints, which can cause irreversible damages to batteries and reduce their lifetime. To this end, this article proposes an adaptive and safe RL framework to optimize fast charging strategies while respecting safety constraints with a high probability. In our method, any unsafe action that the RL agent decides will be projected into a safety region by solving a constrained optimization problem. The safety region is constructed using adaptive Gaussian process (GP) models, consisting of static and dynamic GPs, that learn from online experience to adaptively account for any changes in battery dynamics. Simulation results show that our method can charge the batteries rapidly with constraint satisfaction under varying operating conditions. 
    more » « less
  5. This paper introduces LeTO, a method for learning constrained visuomotor policy with differentiable trajectory optimization. Our approach integrates a differentiable optimization layer into the neural network. By formulating the optimization layer as a trajectory optimization problem, we enable the model to end-to-end generate actions in a safe and constraint-controlled fashion without extra modules. Our method allows for the introduction of constraint information during the training process, thereby balancing the training objectives of satisfying constraints, smoothing the trajectories, and minimizing errors with demonstrations. This “gray box” method marries optimization-based safety and interpretability with powerful representational abilities of neural networks. We quantitatively evaluate LeTO in simulation and in the real robot. The results demonstrate that LeTO performs well in both simulated and real-world tasks. In addition, it is capable of generating trajectories that are less uncertain, higher quality, and smoother compared to existing imitation learning methods. Therefore, it is shown that LeTO provides a practical example of how to achieve the integration of neural networks with trajectory optimization. We release our code at https://github.com/ZhengtongXu/LeTO. 
    more » « less