This content will become publicly available on March 25, 2025
Notions of transition invariants and closure certificates have seen recent use in the formal verification of controlled dynamical systems against \omega-regular properties.Unfortunately, existing approaches face limitations in two directions.First, they require a closed-form mathematical expression representing the model of the system.Such an expression may be difficult to find, too complex to be of any use, or unavailable due to security or privacy constraints.Second, finding such invariants typically rely on optimization techniques such as sum-of-squares (SOS) or satisfiability modulo theory (SMT) solvers.This restricts the classes of systems that need to be formally verified.To address these drawbacks, we introduce a notion of neural closure certificates.We present a data-driven algorithm that trains a neural network to represent a closure certificate.Our approach is formally correct under some mild assumptions, i.e., one is able to formally show that the unknown system satisfies the \omega-regular property of interest if a neural closure certificate can be computed.Finally, we demonstrate the efficacy of our approach with relevant case studies.
more » « less- PAR ID:
- 10508413
- Publisher / Repository:
- PKP Publishing Services Network
- Date Published:
- Journal Name:
- Proceedings of the AAAI Conference on Artificial Intelligence
- Volume:
- 38
- Issue:
- 19
- ISSN:
- 2159-5399
- Page Range / eLocation ID:
- 21446 to 21453
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Notions of transition invariants and closure certificates have seen recent use in the formal verification of controlled dy- namical systems against ω-regular properties. The existing approaches face limitations in two directions. First, they re- quire a closed-form mathematical expression representing the model of the system. Such an expression may be difficult to find, too complex to be of any use, or unavailable due to security or privacy constraints. Second, finding such invari- ants typically rely on optimization techniques such as sum-of- squares (SOS) or satisfiability modulo theory (SMT) solvers. This restricts the classes of systems that need to be formally verified. To address these drawbacks, we introduce a notion of neural closure certificates. We present a data-driven algo- rithm that trains a neural network to represent a closure cer- tificate. Our approach is formally correct under some mild as- sumptions, i.e., one is able to formally show that the unknown system satisfies the ω-regular property of interest if a neural closure certificate can be computed. Finally, we demonstrate the efficacy of our approach with relevant case studies.more » « less
-
A barrier certificate, defined over the states of a dynamical system, is a real-valued function whose zero level set characterizes an in- ductively verifiable state invariant separating reachable states from unsafe ones. When combined with powerful decision procedures— such as sum-of-squares programming (SOS) or satisfiability-modulo- theory solvers (SMT)—barrier certificates enable an automated de- ductive verification approach to safety. The barrier certificate ap- proach has been extended to refute LTL and l -regular specifications by separating consecutive transitions of corresponding l -automata in the hope of denying all accepting runs. Unsurprisingly, such tactics are bound to be conservative as refutation of recurrence properties requires reasoning about the well-foundedness of the transitive closure of the transition relation. This paper introduces the notion of closure certificates as a natural extension of barrier certificates from state invariants to transition invariants. We aug- ment these definitions with SOS and SMT based characterization for automating the search of closure certificates and demonstrate their effectiveness over some case studies.more » « less
-
Translating Omega-Regular Specifications to Average Objectives for Model-Free Reinforcement LearningPiotr Faliszewski ; Viviana Mascardi (Ed.)Recent success in reinforcement learning (RL) has brought renewed attention to the design of reward functions by which agent behavior is reinforced or deterred. Manually designing reward functions is tedious and error-prone. An alternative approach is to specify a formal, unambiguous logic requirement, which is automatically translated into a reward function to be learned from. Omega-regular languages, of which Linear Temporal Logic (LTL) is a subset, are a natural choice for specifying such requirements due to their use in verification and synthesis. However, current techniques based on omega-regular languages learn in an episodic manner whereby the environment is periodically reset to an initial state during learning. In some settings, this assumption is challenging or impossible to satisfy. Instead, in the continuing setting the agent explores the environment without resets over a single lifetime. This is a more natural setting for reasoning about omega-regular specifications defined over infinite traces of agent behavior. Optimizing the average reward instead of the usual discounted reward is more natural in this case due to the infinite-horizon objective that poses challenges to the convergence of discounted RL solutions. We restrict our attention to the omega-regular languages which correspond to absolute liveness specifications. These specifications cannot be invalidated by any finite prefix of agent behavior, in accordance with the spirit of a continuing problem. We propose a translation from absolute liveness omega-regular languages to an average reward objective for RL. Our reduction can be done on-the-fly, without full knowledge of the environment, thereby enabling the use of model-free RL algorithms. Additionally, we propose a reward structure that enables RL without episodic resetting in communicating MDPs, unlike previous approaches. We demonstrate empirically with various benchmarks that our proposed method of using average reward RL for continuing tasks defined by omega-regular specifications is more effective than competing approaches that leverage discounted RL.more » « less
-
Bouajjani, A. ; Holík, L. ; Wu, Z. (Ed.)The expanding role of reinforcement learning (RL) in safety-critical system design has promoted omega-automata as a way to express learning requirements—often non-Markovian—with greater ease of expression and interpretation than scalar reward signals. When 𝜔-automata were first proposed in model-free RL, deterministic Rabin acceptance conditions were used in an attempt to provide a direct translation from omega-automata to finite state “reward” machines defined over the same automaton structure (a memoryless reward translation). While these initial attempts to provide faithful, memoryless reward translations for Rabin acceptance conditions remained unsuccessful, translations were discovered for other acceptance conditions such as suitable, limit-deterministic Buechi acceptance or more generally, good-for-MDP Buechi acceptance conditions. Yet, the question “whether a memoryless translation of Rabin conditions to scalar rewards exists” remained unresolved. This paper presents an impossibility result implying that any attempt to use Rabin automata directly (without extra memory) for model-free RL is bound to fail. To establish this result, we show a link between a class of automata enabling memoryless reward translation to closure properties of its accepting and rejecting infinity sets, and to the insight that both the property and its complement need to allow for positional strategies for such an approach to work. We believe that such impossibility results will provide foundations for the application of RL to safety-critical systems.more » « less
-
null (Ed.)Omega-regular properties—specified using linear time temporal logic or various forms of omega-automata—find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an Open image in new window-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter that produces dense rewards and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.more » « less