 Award ID(s):
 2146563
 NSFPAR ID:
 10419672
 Editor(s):
 Bouajjani, A; Holík, L.; Wu, Z.
 Date Published:
 Journal Name:
 Automated Technology for Verification and Analysis. ATVA 2022.
 Page Range / eLocation ID:
 354–370
 Format(s):
 Medium: X
 Sponsoring Org:
 National Science Foundation
More Like this

Translating OmegaRegular Specifications to Average Objectives for ModelFree Reinforcement LearningPiotr Faliszewski ; Viviana Mascardi (Ed.)Recent success in reinforcement learning (RL) has brought renewed attention to the design of reward functions by which agent behavior is reinforced or deterred. Manually designing reward functions is tedious and errorprone. An alternative approach is to specify a formal, unambiguous logic requirement, which is automatically translated into a reward function to be learned from. Omegaregular languages, of which Linear Temporal Logic (LTL) is a subset, are a natural choice for specifying such requirements due to their use in verification and synthesis. However, current techniques based on omegaregular languages learn in an episodic manner whereby the environment is periodically reset to an initial state during learning. In some settings, this assumption is challenging or impossible to satisfy. Instead, in the continuing setting the agent explores the environment without resets over a single lifetime. This is a more natural setting for reasoning about omegaregular specifications defined over infinite traces of agent behavior. Optimizing the average reward instead of the usual discounted reward is more natural in this case due to the infinitehorizon objective that poses challenges to the convergence of discounted RL solutions. We restrict our attention to the omegaregular languages which correspond to absolute liveness specifications. These specifications cannot be invalidated by any finite prefix of agent behavior, in accordance with the spirit of a continuing problem. We propose a translation from absolute liveness omegaregular languages to an average reward objective for RL. Our reduction can be done onthefly, without full knowledge of the environment, thereby enabling the use of modelfree RL algorithms. Additionally, we propose a reward structure that enables RL without episodic resetting in communicating MDPs, unlike previous approaches. We demonstrate empirically with various benchmarks that our proposed method of using average reward RL for continuing tasks defined by omegaregular specifications is more effective than competing approaches that leverage discounted RL.more » « less

The notion of comparison between system runs is fundamental in formal verification. This concept is implicitly present in the verification of qualitative systems, and is more pronounced in the verification of quantitative systems. In this work, we identify a novel mode of comparison in quantitative systems: the online comparison of the aggregate values of two sequences of quantitative weights. This notion is embodied by comparator automata (comparators, in short), a new class of automata that read two infinite sequences of weights synchronously and relate their aggregate values. Weshowthat aggregate functions that can be represented with B¨uchi automaton result in comparators that are finitestate and accept by the B¨uchi condition as well. Such ωregular comparators further lead to generic algorithms for a number of wellstudied problems, including the quantitative inclusion and winning strategies in quantitative graph games with incomplete information, as well as related nondecision problems, such as obtaining a f inite representation of all counterexamples in the quantitative inclusion problem. We study comparators for two aggregate functions: discountedsum and limitaverage. We prove that the discountedsum comparator is ωregular iff the discountfactor is an integer. Not every aggregate function, however, has an ωregular comparator. Specifically, we show that the language of sequencepairs for which limitaverage aggregates exist is neither ωregular nor ωcontextfree. Given this result, we introduce the notion of prefixaverage as a relaxation of limitaverage aggregation, and show that it admits ωcontextfree comparators i.e. comparator automata expressed by B¨uchi pushdown automata.more » « less

We consider a multistage inventory system with stochastic demand and processing capacity constraints at each stage, for both finitehorizon and infinitehorizon, discountedcost settings. For a class of such systems characterized by having the smallest capacity at the most downstream stage and system utilization above a certain threshold, we identify the structure of the optimal policy, which represents a novel variation of the orderupto policy. We find the explicit functional form of the optimal orderupto levels, and show that they depend (only) on upstream echelon inventories. We establish that, above the threshold utilization, this optimal policy achieves the decomposition of the multidimensional objective cost function for the system into a sum of singledimensional convex functions. This decomposition eliminates the curse of dimensionality and allows us to numerically solve the problem. We provide a fast algorithm to determine a (tight) upper bound on this threshold utilization for capacityconstrained inventory problems with an arbitrary number of stages. We make use of this algorithm to quantify upper bounds on the threshold utilization for three, four, and fivestage capacitated systems over a range of model parameters, and discuss insights that emerge.more » « less

Formal verification of cyberphysical systems (CPS) is challenging because it has to consider realtime and concurrency aspects that are often absent in ordinary software. Moreover, the software in CPS is often complex and lowlevel, making it hard to assure that a formal model of the system used for verification is a faithful representation of the actual implementation, which can undermine the value of a verification result. To address this problem, we propose a methodology for building verifiable CPS based on the principle that a formal model of the software can be derived
automatically from its implementation. Our approach requires that the system implementation is specified inLingua Franca (LF), a polyglot coordination language tailored for realtime, concurrent CPS, which we made amenable to the specification of safety properties via annotations in the code. The program structure and the deterministic semantics of LF enable automatic construction of formal axiomatic models directly from LF programs. The generated models are automatically checked using Bounded Model Checking (BMC) by the verification engineUclid5 using theZ3 SMT solver. The proposed technique enables checking a welldefined fragment of Safety Metric Temporal Logic (Safety MTL) formulas. To ensure the completeness of BMC, we present a method to derive an upper bound on the completeness threshold of an axiomatic model based on the semantics of LF. We implement our approach in the LF Verifier and evaluate it using a benchmark suite with 22 programs sampled from reallife applications and benchmarks for Erlang, Lustre, actororiented languages, and RTOSes. The LF Verifier correctly checks 21 out of 22 programs automatically. 
Effective symbolic evaluation is key to building scalable ver ification and synthesis tools based on SMT solving. These tools use sym bolic evaluators to reduce the semantics of all paths through a finite program to logical constraints, discharged with an SMT solver. Using an evaluator effectively requires tool developers to be able to identify and re pair performance bottlenecks in code under allpath evaluation, a difficult task, even for experts. This paper presents a new method for repairing such bottlenecks automatically. The key idea is to formulate the symbolic performance repair problem as combinatorial search through a space of semanticspreserving transformations, or repairs, to find an equivalent program with minimal cost under symbolic evaluation. The key to real izing this idea is (1) defining a small set of generic repairs that can be combined to fix common bottlenecks, and (2) searching for combinations of these repairs to find good solutions quickly and best ones eventually. Our technique, SymFix, contributes repairs based on deforestation and symbolic reflection, and an efficient algorithm that uses symbolic profil ing to guide the search for fixes. To evaluate SymFix, we implement it for the Rosette solveraided language and symbolic evaluator. Applying SymFix to 18 published verification and synthesis tools built in Rosette, we find that it automatically improves the performance of 12 tools by a factor of 1.1×–91.7×, and 4 of these fixes match or outperform expert written repairs. SymFix also finds 5 fixes that were missed by experts.more » « less