skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: DSGEN: concolic testing GPU implementations of concurrent dynamic data structures
Concolic testing combines concrete execution with symbolic execution along the executed path to automatically generate new test inputs that exercise program paths and deliver high code coverage during testing. The GKLEE tool uses this approach to expose data races in CUDA programs written for execution of GPGPUs. In programs employing concurrent dynamic data structures, automatic generation of data structures with appropriate shapes that cause threads to follow selected, possibly divergent, paths is a challenge. Moreover, a single non-conflicting data structure must be generated for multiple threads, that is, a single shape must be found that simultaneously causes all threads to follow their respective chosen paths. When an execution exposes a bug (e.g., a data race), the generated data structure shape helps the programmer understand the cause of the bug. Because GKLEE does not permit pointers that construct dynamic data structures to be made symbolic, it cannot automatically generate data structures of different shapes and must rely on the user to write code that constructs them to exercise desired paths. We have developed DSGEN for automatically generating non-conflicting dynamic data structures with different shapes and integrated it with GKLEE to uncover and facilitate understanding of data races in programs that employ complex concurrent dynamic data structures. In comparison to GKLEE, DSGEN increases the number of races detected from 10 to 25 by automatically generating a total of 1,897 shapes in implementations of four complex concurrent dynamic data structures -- B-Tree, Hash-Array Mapped Trie, RRB-Tree, and Skip List.  more » « less
Award ID(s):
1813173 2028714 2002554
PAR ID:
10267624
Author(s) / Creator(s):
;
Date Published:
Journal Name:
ICS '21: Proceedings of the ACM International Conference on Supercomputing
Page Range / eLocation ID:
75 to 87
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Symbolic execution is a popular software testing technique that can systematically examine program code to find bugs. Owing to the prevalence of software vulnerabilities, symbolic execution has been extensively used to detect software vulnerabilities. A major challenge of using symbolic execution to find complicated vulnerabilities such as use-after-free is to not only direct symbolic execution to explore relevant program paths but also explore the paths in a specific order. In this paper, we describe a targeted symbolic execution, called UAFDetect, for finding use-after-free vulnerabilities efficiently. UAFDetect guides symbolic execution to focus on paths that are likely to cause a use-after-free by pruning paths that are unlikely or infeasible to cause that. It uses dynamic typestate analysis to identify unlikely paths and static control flow analysis to identify infeasible paths. UAFDetect performs typestate analysis to detect the occurrence of use-after-free vulnerabilities. Upon the detection of a vulnerability, UAFDetect generates an exploit to trigger the vulnerability. We develop the prototype of UAFDetect and evaluate it on real-world use-after-free vulnerabilities. The evaluation demonstrates that UAFDetect can find use-after-free vulnerabilities effectively and efficiently. 
    more » « less
  2. null (Ed.)
    We present LLSC, a prototype compiler for nondeterministic par- allel symbolic execution of the LLVM intermediate representation (IR). Given an LLVM IR program, LLSC generates code preserving the symbolic execution semantics and orchestrating solver invo- cations. The generated code runs efficiently, since the code has eliminated the interpretation overhead and explores multiple paths in parallel. To the best of our knowledge, LLSC is the first compiler for fork-based symbolic execution semantics that can generate parallel execution code. In this demonstration paper, we present the current development and preliminary evaluation of LLSC. The principle behind LLSC is to automatically specialize a symbolic interpreter via the 1st Futamura projection, a fundamental connection between in- terpreters and compilers. The symbolic interpreter is written in an expressive high-level language equipped with a multi-stage programming facility. We demonstrate the run time performance through a set of benchmark programs, showing that LLSC outperforms interpretation-based symbolic execution engines in significant ways. 
    more » « less
  3. null (Ed.)
    Concurrent programs are notoriously hard to write correctly, as scheduling nondeterminism introduces subtle errors that are both hard to detect and to reproduce. The most common concurrency errors are (data) races, which occur when memory-conflicting actions are executed concurrently. Consequently, considerable effort has been made towards developing efficient techniques for race detection. The most common approach is dynamic race prediction: given an observed, race-free trace σ of a concurrent program, the task is to decide whether events of σ can be correctly reordered to a trace σ * that witnesses a race hidden in σ. In this work we introduce the notion of sync(hronization)-preserving races. A sync-preserving race occurs in σ when there is a witness σ * in which synchronization operations (e.g., acquisition and release of locks) appear in the same order as in σ. This is a broad definition that strictly subsumes the famous notion of happens-before races. Our main results are as follows. First, we develop a sound and complete algorithm for predicting sync-preserving races. For moderate values of parameters like the number of threads, the algorithm runs in Õ( N ) time and space, where N is the length of the trace σ. Second, we show that the problem has a Ω( N /log 2 N ) space lower bound, and thus our algorithm is essentially time and space optimal. Third, we show that predicting races with even just a single reversal of two sync operations is NP-complete and even W1-hard when parameterized by the number of threads. Thus, sync-preservation characterizes exactly the tractability boundary of race prediction, and our algorithm is nearly optimal for the tractable side. Our experiments show that our algorithm is fast in practice, while sync-preservation characterizes races often missed by state-of-the-art methods. 
    more » « less
  4. In spite of decades of research in bug detection tools, there is a surprising dearth of ground-truth corpora that can be used to evaluate the efficacy of such tools. Recently, systems such as LAVA and EvilCoder have been proposed to automatically inject bugs into software to quickly generate large bug corpora, but the bugs created so far differ from naturally occurring bugs in a number of ways. In this work, we propose a new automated bug injection system, Apocalypse, that uses formal techniques—symbolic execution, constraint-based program synthesis and model counting—to automatically inject fair (can potentially be discovered by current bug-detection tools), deep (requiring a long sequence of dependencies to be satisfied to fire), uncorrelated (each bug behaving independent of others), reproducible (a trigger input being available) and rare (can be triggered by only a few program inputs) bugs in large software code bases. In our evaluation, we inject bugs into thirty Coreutils programs as well as the TCAS test suite. We find that bugs synthesized by Apocalypse are highly realistic under a variety of metrics, that they do not favor a particular bug-finding strategy (unlike bugs produced by LAVA), and that they are more difficult to find than manually injected bugs, requiring up around 240× more tests to discover with a state-of-the-art symbolic execution tool. 
    more » « less
  5. Symbolic execution is a powerful program analysis and testing technique. Symbolic execution engines are usually implemented as interpreters, and the induced interpretation over-head can dramatically inhibit performance. Alternatively, implementation choices based on instrumentation provide a limited ability to transform programs. However, the use of compilation and code generation techniques beyond simple instrumentation remains underexplored for engine construction, leaving potential performance gains untapped. In this paper, we show how to tap some of these gains using sophisticated compilation techniques: We present Gensym, an optimizing symbolic-execution compiler that generates symbolic code which explores paths and generates tests in parallel. The key insight of GensYmis to compile symbolic execution tasks into cooperative concurrency via continuation-passing style, which further enables efficient parallelism. The design and implementation of Gensym is based on partial evaluation and generative programming techniques, which make it high-level and performant at the same time. We compare the performance of Gensym against the prior symbolic-execution compiler LLSC and the state-of-the-art symbolic interpreter KLEE. The results show an average 4.6× speedup for sequential execution and 9.4× speedup for parallel execution on 20 benchmark programs. 
    more » « less