
Title: Swarm Model Checking on the GPU
We present Grapple, a new and powerful framework for explicit-state model checking on GPUs. Grapple is based on swarm verification (SV), a model-checking technique wherein a collection or swarm of small, memory- and time-bounded verification tests (VTs) are run in parallel to perform state-space exploration. SV achieves high state-space coverage via diversification of the search strategies used by constituent VTs. Grapple represents a swarm implementation for the GPU. In particular, it runs a parallel swarm of internally-parallel VTs, which are implemented in a manner that specifically targets the GPU architecture and the SIMD parallelism its computing cores offer. Grapple also makes effective use of the GPU shared memory, eliminating costly inter-block communication overhead. We conducted a comprehensive performance analysis of Grapple focused on the various design parameters, including the size of the queue structure, implementation of guard statements, and nondeterministic exploration order. Tests are run with multiple hardware configurations, including on the Amazon cloud. Our results show that Grapple performs favorably compared to the SPIN swarm and a prior non-swarm GPU implementation. Although a recently debuted FPGA swarm is faster, the deployment process to the FPGA is much more complex than Grapple's.
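The swarm idea described in the abstract can be illustrated with a minimal sketch. This is not Grapple's implementation: it runs sequentially on the CPU, and the toy model (states are integers, each transition flips one bit), the function names, and the parameters `n_bits`, `max_states`, and `n_tests` are all hypothetical choices made for illustration. Each verification test is a memory-bounded depth-first search, and diversification comes from giving every test its own random successor order.

```python
import random


def successors(state, n_bits):
    """Hypothetical toy model: each transition flips one bit of the state."""
    return [state ^ (1 << i) for i in range(n_bits)]


def verification_test(seed, init, n_bits, max_states, is_error):
    """One bounded VT: randomized depth-first search that stores at most
    max_states distinct states. Returns (visited, error_state_or_None)."""
    rng = random.Random(seed)
    visited = {init}
    if is_error(init):
        return visited, init
    stack = [init]
    while stack and len(visited) < max_states:
        state = stack.pop()
        succ = successors(state, n_bits)
        rng.shuffle(succ)  # diversification: each VT uses its own order
        for nxt in succ:
            if nxt not in visited and len(visited) < max_states:
                visited.add(nxt)
                if is_error(nxt):
                    return visited, nxt
                stack.append(nxt)
    return visited, None


def swarm(n_tests, init, n_bits, max_states, is_error):
    """Run n_tests independent VTs and merge their coverage.
    The VTs share no state, so they could run in parallel; this sketch
    runs them one after another for simplicity."""
    covered = set()
    counterexample = None
    for seed in range(n_tests):
        visited, err = verification_test(seed, init, n_bits, max_states, is_error)
        covered |= visited
        if err is not None and counterexample is None:
            counterexample = err
    return covered, counterexample


if __name__ == "__main__":
    covered, cex = swarm(n_tests=16, init=0, n_bits=10, max_states=64,
                         is_error=lambda s: False)
    print(f"swarm covered {len(covered)} of {2 ** 10} states, error: {cex}")
```

No single test here can store more than `max_states` states, yet the union over the swarm can cover more of the state space because differently seeded tests tend to wander into different regions.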
Journal Name: International Journal on Software Tools for Technology Transfer
Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Finkbeiner, B. ; Wies, T. (Ed.)
    Stochastic model checking (SMC) is a formal verification technique for the analysis of systems with probabilistic behavior. Scalability has been a major limiting factor for SMC tools to analyze real-world systems with large or infinite state spaces. The infinite-state Continuous-time Markov Chain (CTMC) model checker, STAMINA, tackles this problem by selectively exploring only a portion of a model’s state space, where a majority of the probability mass resides, to efficiently give an accurate probability bound to properties under verification. In this paper, we present two major improvements to STAMINA, namely, a method of calculating and distributing estimated state reachability probabilities that improves state space truncation efficiency and combination of the previous two CTMC analyses into one for generating the probability bound. Demonstration of the improvements on several benchmark examples, including hazard analysis of infinite-state combinational genetic circuits, yields significant savings in both run-time and state space size (and hence memory), compared to both the previous version of STAMINA and the infinite-state CTMC model checker INFAMY. The improved STAMINA demonstrates significant scalability to allow for the verification of complex real-world infinite-state systems.
  2. Random Forests (RFs) are a commonly used machine learning method for classification and regression tasks spanning a variety of application domains, including bioinformatics, business analytics, and software optimization. While prior work has focused primarily on improving performance of the training of RFs, many applications, such as malware identification, cancer prediction, and banking fraud detection, require fast RF classification. In this work, we accelerate RF classification on GPU and FPGA. In order to provide efficient support for large datasets, we propose a hierarchical memory layout suitable to the GPU/FPGA memory hierarchy. We design three RF classification code variants based on that layout, and we investigate GPU- and FPGA-specific considerations for these kernels. Our experimental evaluation, performed on an Nvidia Xp GPU and on a Xilinx Alveo U250 FPGA accelerator card using publicly available datasets on the scale of millions of samples and tens of features, covers various aspects. First, we evaluate the performance benefits of our hierarchical data structure over the standard compressed sparse row (CSR) format. Second, we compare our GPU implementation with cuML, a machine learning library targeting Nvidia GPUs. Third, we explore the performance/accuracy tradeoff resulting from the use of different tree depths in the RF. Finally, we perform a comparative performance analysis of our GPU and FPGA implementations. Our evaluation shows that, while reporting the best performance on GPU, our code variants outperform the CSR baseline both on GPU and FPGA. For high accuracy targets, our GPU implementation yields a 5-9× speedup over CSR, and up to a 2× speedup over Nvidia’s cuML library.
  3. Jansen, N ; Tribastone, M (Ed.)
    Improving the scalability of probabilistic model checking (PMC) tools is crucial to the verification of real-world system designs. The STAMINA infinite-state PMC tool achieves scalability by iteratively constructing a partial state space for an unbounded continuous-time Markov chain model, where a majority of the probability mass resides. It then performs time-bounded transient PMC. It can efficiently produce an accurate probability bound to the property under verification. We present a new software architecture design and the C++ implementation of the STAMINA 2.0 algorithm, integrated with the STORM model checker. This open-source STAMINA implementation offers a high degree of modularity and provides significant optimizations to the STAMINA 2.0 algorithm. Performance improvements are demonstrated on multiple challenging benchmark examples, including hazard analysis of infinite-state combinational genetic circuits, over the previous STAMINA implementation. Additionally, its design allows for future customizations and optimizations to the STAMINA algorithm. 
  4. Sequential consistency (SC) is the most intuitive memory consistency model and the easiest for programmers and hardware designers to reason about. However, the strict memory ordering restrictions imposed by SC make it less attractive from a performance standpoint. Additionally, prior high-performance SC implementations required complex hardware structures to support speculation and recovery. In this article, we introduce the lockstep SC consistency model (LSC), a new memory model based on SC but carefully defined to accommodate the data parallel lockstep execution paradigm of GPUs. We also describe an efficient LSC implementation for an APU system-on-chip (SoC) and show that our implementation performs close to the baseline relaxed model. Evaluation of our implementation shows that the geometric mean performance cost for lockstep SC is just 0.76% for GPU execution and 6.11% for the entire APU SoC compared to a baseline with a weaker memory consistency model. Adoption of LSC in future APU and SoC designs will reduce the burden on programmers trying to write correct parallel programs, while also simplifying the implementation and verification of systems with heterogeneous processing elements and complex memory hierarchies.
  5. Developing and maintaining a file system is time-consuming, typically requiring years of effort. Developers often test compliance with APIs such as POSIX with hand-written regression suites that, alas, examine only a fraction of a file system's state space. Conversely, formal model checking can explore vast state spaces efficiently, increasing confidence in the file system's implementation. Yet model checking is not currently part of file system development. Our position is that file systems should be designed a priori to facilitate model checking. To this end, we introduce MCFS, an architecture for efficient and comprehensive file-system model checking. MCFS relies on two new APIs that save and restore a file system's in-memory and on-disk state. We describe our earlier attempts at model-checking file systems, including unsuccessful or inefficient ones. Those attempts led us to develop VeriFS, which implements the new APIs. We illustrate MCFS's model-checking principles with VeriFS, a FUSE-based file system we were able to quickly develop with MCFS's help. 