This report presents the results of a friendly competition for formal verification of continuous and hybrid systems with artificial intelligence (AI) components. Specifically, machine learning (ML) components in cyber-physical systems (CPS), such as feedforward neural networks used as feedback controllers in closed-loop systems, are considered; this class of systems is classically known as intelligent control systems, or, in more modern and specific terms, neural network control systems (NNCS). We refer to this category more broadly as AI and NNCS (AINNCS). The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2020. In this second edition of the AINNCS category at ARCH-COMP, four tools were applied to solve seven different benchmark problems, in alphabetical order: NNV, OVERT, ReachNN*, and VenMAS. This report is a snapshot of the current landscape of tools and the types of benchmarks for which these tools are suited. Due to the diversity of problems, the lack of a shared hardware platform, and the early stage of the competition, we do not rank tools in terms of performance, yet the presented results arguably provide the most complete assessment of current tools for safety verification of NNCS.
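To make the system class concrete, the following is a minimal sketch, not taken from the report, of how a neural network control system can be simulated in closed loop: a small feedforward network computes the control input from the current state, and the plant dynamics are integrated forward for one control period. The plant model, network size, weights, and step size are all illustrative assumptions; real AINNCS benchmarks ship trained controllers and specific plant models.

```python
import numpy as np

# Hypothetical feedforward controller: 2 inputs -> 8 hidden units (tanh) -> 1 output.
# Weights are random placeholders; a real NNCS benchmark provides trained weights.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((8, 2)), np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)

def controller(x):
    """Feedforward NN used as a feedback controller: maps state to control input."""
    h = np.tanh(W1 @ x + b1)
    return W2 @ h + b2

def plant(x, u):
    """Illustrative plant ODE x' = f(x, u): a damped double integrator."""
    pos, vel = x
    return np.array([vel, -0.1 * vel + u[0]])

def step(x, dt=0.05, substeps=10):
    """One control period: hold u constant, integrate the plant with Euler steps."""
    u = controller(x)
    h = dt / substeps
    for _ in range(substeps):
        x = x + h * plant(x, u)
    return x

x = np.array([1.0, 0.0])      # single initial state
for _ in range(20):           # simulate 20 control periods (1 second)
    x = step(x)
print("state after 1 s:", x)
```

Verification tools in this category reason about sets of initial states and all resulting closed-loop behaviors rather than a single simulation, but the alternation between neural network evaluation and plant dynamics shown above is the structure they analyze.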
ARCH-COMP20 Repeatability Evaluation Report
This report presents the results of the repeatability evaluation for the 4th International Competition on Verifying Continuous and Hybrid Systems (ARCH-COMP’20). The competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2020, affiliated with the IFAC World Congress. In its fourth edition, twenty-eight tools submitted artifacts through a Git repository for the repeatability evaluation, covering benchmark problems across seven competition categories. The majority of participants adhered to the requirements for this year’s repeatability evaluation, namely to submit scripts that automatically install and execute the tools in containerized virtual environments (specifically, Dockerfiles executed with Docker), and several categories used performance evaluation information from a common execution platform. The repeatability results represent a snapshot of the current landscape of tools and the types of benchmarks for which they are particularly suited and for which others may repeat their analyses. Due to the diversity of problems in the verification of continuous and hybrid systems, and in keeping with standard practice in repeatability evaluations, we evaluate the tools on a pass/fail basis with respect to repeatability.
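As an illustration of the containerized setup described above, the sketch below shows one way a repeatability package might be driven: build the submitted Dockerfile into an image and run the resulting container so the tool executes its benchmark scripts in isolation. The image tag, directory layout, and results path are hypothetical; actual artifacts differ per tool and category.

```python
import subprocess
from pathlib import Path

# Hypothetical layout: the submitted artifact contains a Dockerfile at its root
# and the tool writes its output to /results inside the container.
ARTIFACT_DIR = Path("artifact")          # assumed directory containing the Dockerfile
IMAGE_TAG = "archcomp/example-tool"      # placeholder image name
RESULTS_DIR = Path("results").resolve()

def build_image():
    """Build the tool's container image from the submitted Dockerfile."""
    subprocess.run(["docker", "build", "-t", IMAGE_TAG, str(ARTIFACT_DIR)], check=True)

def run_benchmarks():
    """Run the container; mount a host directory to collect the tool's output."""
    RESULTS_DIR.mkdir(exist_ok=True)
    subprocess.run(
        ["docker", "run", "--rm",
         "-v", f"{RESULTS_DIR}:/results",
         IMAGE_TAG],
        check=True,
    )

if __name__ == "__main__":
    build_image()
    run_benchmarks()
    print("results written to", RESULTS_DIR)
```

The point of requiring such a script is that an evaluator can reproduce the analysis with two commands on any machine running Docker, without installing tool-specific dependencies on the host.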
- Award ID(s): 1918450
- PAR ID: 10195488
- Date Published:
- Journal Name: EPiC Series in Computing
- Volume: 74
- ISSN: 2398-7340
- Page Range / eLocation ID: 175 to 165
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This report presents the results of a friendly competition for formal verification of continuous and hybrid systems with artificial intelligence (AI) components. Specifically, machine learning (ML) components in cyber-physical systems (CPS), such as feedforward neural networks used as feedback controllers in closed-loop systems, are considered; this class of systems is classically known as intelligent control systems, or, in more modern and specific terms, neural network control systems (NNCS). We refer to this category more broadly as AI and NNCS (AINNCS). The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2021. In this third edition of the AINNCS category at ARCH-COMP, three tools were applied to solve seven different benchmark problems, in alphabetical order: JuliaReach, NNV, and Verisig. JuliaReach is a new participant in this category, Verisig participated previously in 2019, and NNV has participated in all previous editions. This report is a snapshot of the current landscape of tools and the types of benchmarks for which these tools are suited. Due to the diversity of problems, the lack of a shared hardware platform, and the early stage of the competition, we do not rank tools in terms of performance, yet the presented results, combined with the 2020 results, arguably provide the most complete assessment of current tools for safety verification of NNCS.
- This report presents the results of a friendly competition for formal verification of continuous and hybrid systems with artificial intelligence (AI) components. Specifically, machine learning (ML) components in cyber-physical systems (CPS), such as feedforward neural networks used as feedback controllers in closed-loop systems, are considered; this class of systems is classically known as intelligent control systems, or, in more modern and specific terms, neural network control systems (NNCS). We broadly refer to this category as AI and NNCS (AINNCS). The friendly competition took place as part of the workshop Applied Verification for Continuous and Hybrid Systems (ARCH) in 2024. In the 8th edition of the AINNCS category at ARCH-COMP, five tools were applied to solve 12 benchmarks: CORA, CROWN-Reach, GoTube, JuliaReach, and NNV. This is the year with the largest interest from the community, with two new and three returning participants. Following last year’s trend, and despite the additional challenges presented, the verification results have improved year over year. In terms of computation time, the returning participants have improved as well, showing speed-ups of up to an order of magnitude, such as JuliaReach on the TORA benchmark with the ReLU controller, and NNV on the TORA benchmark with both heterogeneous controllers.
- Quantitative verification tools compute probabilities, expected rewards, or steady-state values for formal models of stochastic and timed systems. Exact results often cannot be obtained efficiently, so most tools use floating-point arithmetic in iterative algorithms that approximate the quantity of interest. Correctness is thus defined by the desired precision and determines performance. In this paper, we report on the experimental evaluation of these trade-offs performed in QComp 2020: the second friendly competition of tools for the analysis of quantitative formal models. We survey the precision guarantees, ranging from exact rational results to statistical confidence statements, offered by the nine participating tools. They gave rise to a performance evaluation using five tracks with varying correctness criteria, of which we present the results. (An illustrative sketch of such a precision-driven iterative computation appears after this list.)
- Continuous invariants are an important component in deductive verification of hybrid and continuous systems. Just as discrete invariants are used to reason about correctness in discrete systems without having to unroll their loops, continuous invariants are used to reason about differential equations without having to solve them. Automatic generation of continuous invariants remains one of the biggest practical challenges to the automation of formal proofs of safety for hybrid systems. There are at present many disparate methods available for generating continuous invariants; however, this wealth of diverse techniques presents a number of challenges, with different methods having different strengths and weaknesses. To address some of these challenges, we develop Pegasus: an automatic continuous invariant generator which allows for combinations of various methods, and integrate it with the KeYmaera X theorem prover for hybrid systems. We describe some of the architectural aspects of this integration, comment on its methods and challenges, and present an experimental evaluation on a suite of benchmarks.
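To illustrate what the continuous-invariant item above refers to (this is a hand-picked sketch, not Pegasus itself): for the dynamics x' = -x, y' = -y, the set x^2 + y^2 <= 1 is invariant because the Lie derivative of V = x^2 + y^2 along the flow is nonpositive, a fact that can be checked symbolically without ever solving the differential equation. The dynamics and the candidate invariant below are illustrative assumptions.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

# Illustrative dynamics: x' = -x, y' = -y (a globally stable linear system).
fx, fy = -x, -y

# Candidate invariant set: V(x, y) <= 1 with V = x^2 + y^2.
V = x**2 + y**2

# Lie derivative of V along the vector field: dV/dt = dV/dx * x' + dV/dy * y'.
lie = sp.simplify(sp.diff(V, x) * fx + sp.diff(V, y) * fy)   # -> -2*x**2 - 2*y**2

# Every term is a negated square of a real quantity, so the derivative is <= 0
# everywhere; V can never increase along solutions, hence trajectories starting
# in {V <= 1} stay there -- without solving the ODE.
print("Lie derivative of V along the flow:", lie)
print("nonpositive for all real x, y:", lie.is_nonpositive)
```

Invariant generators such as the one described above search for suitable V (and more general invariant shapes) automatically; the hard part is finding the candidate, not checking it.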
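The quantitative-verification item in this list mentions iterative floating-point algorithms whose correctness is defined by a target precision. The following is a small illustrative sketch, not taken from any QComp tool, of value iteration computing reachability probabilities on a tiny Markov chain, stopped when successive iterates differ by less than a chosen epsilon. The transition matrix and epsilon are arbitrary placeholders, and this naive stopping criterion does not by itself guarantee the stated precision, which is exactly the kind of trade-off the competition evaluates.

```python
import numpy as np

# Tiny illustrative Markov chain with states 0..3.
# State 2 is the goal, state 3 is a sink; we compute P(reach goal) from each state.
P = np.array([
    [0.5, 0.3, 0.1, 0.1],   # from state 0
    [0.2, 0.5, 0.2, 0.1],   # from state 1
    [0.0, 0.0, 1.0, 0.0],   # goal is absorbing
    [0.0, 0.0, 0.0, 1.0],   # sink is absorbing
])
goal, sink = 2, 3

def value_iteration(P, eps=1e-8, max_iters=100_000):
    """Iterate v <- P v with v[goal] pinned to 1 and v[sink] to 0,
    until the largest change between iterates drops below eps."""
    v = np.zeros(P.shape[0])
    v[goal] = 1.0
    for _ in range(max_iters):
        new_v = P @ v
        new_v[goal], new_v[sink] = 1.0, 0.0
        if np.max(np.abs(new_v - v)) < eps:
            return new_v
        v = new_v
    return v

probs = value_iteration(P)
print("reachability probabilities:", np.round(probs, 6))
```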