skip to main content


Title: Verisig 2.0: Verification of Neural Network Controllers Using Taylor Model Preconditioning.
This paper presents Verisig 2.0, a verification tool for closed-loop systems with neural network (NN) controllers. We focus on NNs with tanh/sigmoid activations and develop a Taylor-model-based reachability algorithm through Taylor model preconditioning and shrink wrapping. Furthermore, we provide a parallelized implementation that allows Verisig 2.0 to efficiently handle larger NNs than existing tools can. We provide an extensive evaluation over 10 benchmarks and compare Verisig 2.0 against three state-of-the-art verification tools. We show that Verisig 2.0 is both more accurate and faster, achieving speed-ups of up to 21x and 268x against different tools, respectively.  more » « less
Award ID(s):
1837210
NSF-PAR ID:
10331771
Author(s) / Creator(s):
Editor(s):
Silva, A.
Date Published:
Journal Name:
Computer Aided Verification (CAV)
Volume:
12759
Page Range / eLocation ID:
249–262
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This article addresses the problem of verifying the safety of autonomous systems with neural network (NN) controllers. We focus on NNs with sigmoid/tanh activations and use the fact that the sigmoid/tanh is the solution to a quadratic differential equation. This allows us to convert the NN into an equivalent hybrid system and cast the problem as a hybrid system verification problem, which can be solved by existing tools. Furthermore, we improve the scalability of the proposed method by approximating the sigmoid with a Taylor series with worst-case error bounds. Finally, we provide an evaluation over four benchmarks, including comparisons with alternative approaches based on mixed integer linear programming as well as on star sets. 
    more » « less
  2. Abstract

    In the last decade, much work in atmospheric science has focused on spatial verification (SV) methods for gridded prediction, which overcome serious disadvantages of pixelwise verification. However, neural networks (NN) in atmospheric science are almost always trained to optimize pixelwise loss functions, even when ultimately assessed with SV methods. This establishes a disconnect between model verification during versus after training. To address this issue, we develop spatially enhanced loss functions (SELF) and demonstrate their use for a real-world problem: predicting the occurrence of thunderstorms (henceforth, “convection”) with NNs. In each SELF we use either a neighborhood filter, which highlights convection at scales larger than a threshold, or a spectral filter (employing Fourier or wavelet decomposition), which is more flexible and highlights convection at scales between two thresholds. We use these filters to spatially enhance common verification scores, such as the Brier score. We train each NN with a different SELF and compare their performance at many scales of convection, from discrete storm cells to tropical cyclones. Among our many findings are that (i) for a low or high risk threshold, the ideal SELF focuses on small or large scales, respectively; (ii) models trained with a pixelwise loss function perform surprisingly well; and (iii) nevertheless, models trained with a spectral filter produce much better-calibrated probabilities than a pixelwise model. We provide a general guide to using SELFs, including technical challenges and the final Python code, as well as demonstrating their use for the convection problem. To our knowledge this is the most in-depth guide to SELFs in the geosciences.

    Significance Statement

    Gridded predictions, in which a quantity is predicted at every pixel in space, should be verified with spatially aware methods rather than pixel by pixel. Neural networks (NN), which are often used for gridded prediction, are trained to minimize an error value called the loss function. NN loss functions in atmospheric science are almost always pixelwise, which causes the predictions to miss rare events and contain unrealistic spatial patterns. We use spatial filters to enhance NN loss functions, and we test our novel spatially enhanced loss functions (SELF) on thunderstorm prediction. We find that different SELFs work better for different scales (i.e., different-sized thunderstorm complexes) and that spectral filters, one of the two filter types, produce unexpectedly well calibrated thunderstorm probabilities.

     
    more » « less
  3. Next-generation models of wind farm flows are increasingly needed to assist the design, operation, and performance diagnostic of modern wind power plants. Accuracy in the descriptions of the wind farm aerodynamics, including the effects of atmospheric stability, coalescing wakes, and the pressure field induced by the turbine rotors are necessary attributes for such tools as well as low computational costs. The Pseudo-2D RANS model is formulated to provide an efficient solution of the Navier–Stokes equations governing wind-farm flows installed in flat terrain and offshore. The turbulence closure and actuator disk model are calibrated based on wind light detection and ranging measurements of wind turbine wakes collected under different operative and atmospheric conditions. A shallow-water formulation is implemented to achieve a converged solution for the velocity and pressure fields across a farm with computational costs comparable to those of mid-fidelity engineering wake models. The theoretical foundations and numerical scheme of the Pseudo-2D RANS model are provided, together with a detailed description of the verification and validation processes. The model is assessed against a large dataset of power production for an onshore wind farm located in North Texas showing a normalized mean absolute error of 5.6% on the 10-min-averaged active power and 3% on the clustered wind farm efficiency, which represent 8% and 24%, respectively, improvements with respect to the best-performing engineering wake model tested in this work. 
    more » « less
  4. Reusable symbolic evaluators are a key building block of solver-aided verification and synthesis tools. A reusable evaluator reduces the semantics of all paths in a program to logical constraints, and a client tool uses these constraints to formulate a satisfiability query that is discharged with SAT or SMT solvers. The correctness of the evaluator is critical to the soundness of the tool and the domain properties it aims to guarantee. Yet so far, the trust in these evaluators has been based on an ad-hoc foundation of testing and manual reasoning. This paper presents the first formal framework for reasoning about the behavior of reusable symbolic evaluators. We develop a new symbolic semantics for these evaluators that incorporates state merging. Symbolic evaluators use state merging to avoid path explosion and generate compact encodings. To accommodate a wide range of implementations, our semantics is parameterized by a symbolic factory, which abstracts away the details of merging and creation of symbolic values. The semantics targets a rich language that extends Core Scheme with assumptions and assertions, and thus supports branching, loops, and (first-class) procedures. The semantics is designed to support reusability, by guaranteeing two key properties: legality of the generated symbolic states, and the reducibility of symbolic evaluation to concrete evaluation. Legality makes it simpler for client tools to formulate queries, and reducibility enables testing of client tools on concrete inputs. We use the Lean theorem prover to mechanize our symbolic semantics, prove that it is sound and complete with respect to the concrete semantics, and prove that it guarantees legality and reducibility. To demonstrate the generality of our semantics, we develop Leanette, a reference evaluator written in Lean, and Rosette 4, an optimized evaluator written in Racket. We prove Leanette correct with respect to the semantics, and validate Rosette 4 against Leanette via solver-aided differential testing. To demonstrate the practicality of our approach, we port 16 published verification and synthesis tools from Rosette 3 to Rosette 4. Rosette 3 is an existing reusable evaluator that implements the classic merging semantics, adopted from bounded model checking. Rosette 4 replaces the semantic core of Rosette 3 but keeps its optimized symbolic factory. Our results show that Rosette 4 matches the performance of Rosette 3 across a wide range of benchmarks, while providing a cleaner interface that simplifies the implementation of client tools. 
    more » « less
  5. Silva, Alexandra ; Leino, K. Rustan (Ed.)
    Neural Networks (NNs) have increasingly apparent safety implications commensurate with their proliferation in real-world applications: both unanticipated as well as adversarial misclassifications can result in fatal outcomes. As a consequence, techniques of formal verification have been recognized as crucial to the design and deployment of safe NNs. In this paper, we introduce a new approach to formally verify the most commonly considered safety specifications for ReLU NNs -- i.e. polytopic specifications on the input and output of the network. Like some other approaches, ours uses a relaxed convex program to mitigate the combinatorial complexity of the problem. However, unique in our approach is the way we use a convex solver not only as a linear feasibility checker, but also as a means of penalizing the amount of relaxation allowed in solutions. In particular, we encode each ReLU by means of the usual linear constraints, and combine this with a convex objective function that penalizes the discrepancy between the output of each neuron and its relaxation. This convex function is further structured to force the largest relaxations to appear closest to the input layer; this provides the further benefit that the most ``problematic'' neurons are conditioned as early as possible, when conditioning layer by layer. This paradigm can be leveraged to create a verification algorithm that is not only faster in general than competing approaches, but is also able to verify considerably more safety properties; we evaluated PEREGRiNN on a standard MNIST robustness verification suite to substantiate these claims. 
    more » « less