skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning
Offline safe reinforcement learning (RL) aims to train a constraint satisfaction policy from a fixed dataset. Current state-of-the-art approaches are based on supervised learning with a conditioned policy. However, these approaches fall short in real-world applications that involve complex tasks with rich temporal and logical structures. In this paper, we propose temporal logic Specification-conditioned Decision Transformer (SDT), a novel framework that harnesses the expressive power of signal temporal logic (STL) to specify complex temporal rules that an agent should follow and the sequential modeling capability of Decision Transformer (DT). Empirical evaluations on the DSRL benchmarks demonstrate the better capacity of SDT in learning safe and high-reward policies compared with existing approaches. In addition, SDT shows good alignment with respect to different desired degrees of satisfaction of the STL specification that it is conditioned on.  more » « less
Award ID(s):
2340776
PAR ID:
10582385
Author(s) / Creator(s):
; ;
Publisher / Repository:
2024 International Conference on Machine Learning
Date Published:
Subject(s) / Keyword(s):
Signal temporal logic Offline safe reinforcement learning Decision transformer
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper studies the satisfaction of a class of temporal properties for cyber-physical systems (CPSs) over a finite-time horizon in the presence of an adversary, in an environment described by discretetime dynamics. The temporal logic specification is given in safe−LTLF , a fragment of linear temporal logic over traces of finite length. The interaction of the CPS with the adversary is modeled as a two-player zerosum discrete-time dynamic stochastic game with the CPS as defender. We formulate a dynamic programming based approach to determine a stationary defender policy that maximizes the probability of satisfaction of a safe − LTLF formula over a finite time-horizon under any stationary adversary policy. We introduce secure control barrier certificates (S-CBCs), a generalization of barrier certificates and control barrier certificates that accounts for the presence of an adversary, and use S-CBCs to provide a lower bound on the above satisfaction probability. When the dynamics of the evolution of the system state has a specific underlying structure, we present a way to determine an S-CBC as a polynomial in the state variables using sum-of-squares optimization. An illustrative example demonstrates our approach. 
    more » « less
  2. This study proposes a novel planning framework based on a model predictive control formulation that incorporates signal temporal logic (STL) specifications for task completion guarantees and robustness quantification. This marks the first-ever study to apply STL-guided trajectory optimization for bipedal locomotion push recovery, where the robot experiences unexpected disturbances. Existing recovery strategies often struggle with complex task logic reasoning and locomotion robustness evaluation, making them susceptible to failures due to inappropriate recovery strategies or insufficient robustness. To address this issue, the STL-guided framework generates optimal and safe recovery trajectories that simultaneously satisfy the task specification and maximize the locomotion robustness. Our framework outperforms a state-of-the-art locomotion controller in a high-fidelity dynamic simulation, especially in scenarios involving crossed-leg maneuvers. Furthermore, it demonstrates versatility in tasks such as locomotion on stepping stones, where the robot must select from a set of disjointed footholds to maneuver successfully. 
    more » « less
  3. This paper studies the synthesis of control policies for an agent that has to satisfy a temporal logic specification in a partially observable environment, in the presence of an adversary. The interaction of the agent (defender) with the adversary is modeled as a partially observable stochastic game. The search for policies is limited to over the space of finite state controllers, which leads to a tractable approach to determine policies. The goal is to generate a defender policy to maximize satisfaction of a given temporal logic specification under any adversary policy. We relate the satisfaction of the specification in terms of reaching (a subset of) recurrent states of a Markov chain. We then present a procedure to determine a set of defender and adversary finite state controllers of given sizes that will satisfy the temporal logic specification. We illustrate our approach with an example. 
    more » « less
  4. Uncertainty in safety-critical cyber-physical systems can be modeled using a finite number of parameters or parameterized input signals. Given a system specification in Signal Temporal Logic (STL), we would like to verify that for all (infinite) values of the model parameters/input signals, the system satisfies its specification. Unfortunately, this problem is undecidable in general.Statistical model checking(SMC) offers a solution by providing guarantees on the correctness of CPS models by statistically reasoning on model simulations. We propose a new approach for statistical verification of CPS models for user-provided distribution on the model parameters. Our technique uses model simulations to learnsurrogate models, and usesconformal inferenceto provide probabilistic guarantees on the satisfaction of a given STL property. Additionally, we can provide prediction intervals containing the quantitative satisfaction values of the given STL property for any user-specified confidence level. We compare this prediction interval with the interval we get using risk estimation procedures. We also propose a refinement procedure based on Gaussian Process (GP)-based surrogate models for obtaining fine-grained probabilistic guarantees over sub-regions in the parameter space. This in turn enables the CPS designer to choose assured validity domains in the parameter space for safety-critical applications. Finally, we demonstrate the efficacy of our technique on several CPS models. 
    more » « less
  5. We propose an automatic synthesis technique to generate provably correct controllers of stochastic linear dynamical systems for Signal Temporal Logic (STL) specifications. While formal synthesis problems can be directly formulated as exists-forall constraints, the quantifier alternation restricts the scalability of such an approach. We use the duality between a system and its proof of correctness to partially alleviate this challenge. We decompose the controller synthesis into two subproblems, each addressing orthogonal concerns - stabilization with respect to the noise, and meeting the STL specification. The overall controller is a nested controller comprising of the feedback controller for noise cancellation and an open loop controller for STL satisfaction. The correct-by-construction compositional synthesis of this nested controller relies on using the guarantees of the feedback controller instead of the controller itself. We use a linear feedback controller as the stabilizing controller for linear systems with bounded additive noise and over-approximate its ellipsoid stability guarantee with a polytope. We then use this over-approximation to formulate a mixed-integer linear programming (MILP) problem to synthesize an open-loop controller that satisfies STL specifications. 
    more » « less