Graph analytics codes are widely used and tend to exhibit input-dependent behavior, making them particularly interesting for software verification and validation. This paper presents Indigo3, a labeled benchmark suite based on 7 graph algorithms that are implemented in different styles, including versions with deliberately planted bugs. We systematically combine 13 sets of implementation styles and 15 common bug types to create the 41,790 CUDA, OpenMP, and parallel C programs in the suite. Each code is labeled with the styles and bugs it incorporates. We used 4 subsets of Indigo3 to test 5 program-verification tools. Our results show that the tools perform quite differently across the bug types and implementation styles, have distinct strengths and weaknesses, and generally struggle with graph codes. We discuss the styles and bugs that tend to be the most challenging as well as the programming patterns that yield false positives.
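To make the style-and-bug combinations concrete, the sketch below (written for this summary, not taken from the suite) shows a push-style OpenMP relaxation kernel over a CSR graph in which the update of `dist[dst]` is deliberately left unsynchronized, the kind of planted data race a verification tool would be expected to report. The graph layout, struct, and function names are assumptions made for illustration.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical CSR graph layout: nidx[v]..nidx[v+1] indexes into nlist. */
typedef struct {
    int nodes;
    const int *nidx;   /* size nodes + 1 */
    const int *nlist;  /* size edges     */
} Graph;

/* One sweep of a push-style relaxation kernel (BFS-like, unit weights).
   Deliberate, illustrative bug: the read-modify-write of dist[dst] is not
   protected, so threads relaxing different edges into the same dst can
   race and lose updates. A correct variant would guard the update with
   #pragma omp critical or an atomic compare-exchange. */
static int relax_once(const Graph *g, int *dist)
{
    int changed = 0;
    #pragma omp parallel for reduction(|:changed) schedule(dynamic, 64)
    for (int src = 0; src < g->nodes; src++) {
        if (dist[src] == INT_MAX) continue;
        for (int e = g->nidx[src]; e < g->nidx[src + 1]; e++) {
            int dst  = g->nlist[e];
            int cand = dist[src] + 1;
            if (cand < dist[dst]) {   /* unsynchronized check ...      */
                dist[dst] = cand;     /* ... and unsynchronized update */
                changed = 1;
            }
        }
    }
    return changed;
}

int main(void)
{
    /* tiny 4-node chain 0 -> 1 -> 2 -> 3, just to make the sketch runnable */
    const int nidx[]  = {0, 1, 2, 3, 3};
    const int nlist[] = {1, 2, 3};
    Graph g = {4, nidx, nlist};
    int dist[4] = {0, INT_MAX, INT_MAX, INT_MAX};
    while (relax_once(&g, dist)) { /* iterate to a fixed point */ }
    for (int v = 0; v < 4; v++) printf("dist[%d] = %d\n", v, dist[v]);
    return 0;
}
```

Compiled with -fopenmp, a kernel like this can produce different results from run to run on larger inputs, which is the kind of input-dependent, nondeterministic behavior that makes such planted bugs challenging for verification tools.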
The Indigo Program-Verification Microbenchmark Suite of Irregular Parallel Code Patterns
Irregular programs are found in many domains and tend to exhibit input-dependent control flow and memory accesses. This paper introduces the Indigo suite of important irregular parallel code patterns for testing verification and other tools. We studied many irregular CPU and GPU programs and extracted the key code patterns. Then, we methodically built variations of these patterns to alter the control-flow and memory-access behavior and/or introduce bugs, yielding the thousands of OpenMP and CUDA microbenchmarks in the suite. Indigo includes a set of generators to systematically create an unbounded number of inputs for each microbenchmark, which is essential to exercise the wide range of possible behaviors of input-dependent codes. To manage the millions of code and input combinations, Indigo provides the flexibility to generate user-defined subsets of the suite. Experiments with a subset of buggy and bug-free codes illustrate that irregular programs pose a significant challenge to both static and dynamic program verification tools. Moreover, such tools can perform quite differently across code patterns that contain the same bug.
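To illustrate the input-generator idea, here is a minimal, self-contained sketch of a parameterized generator. It is not Indigo's actual generator, and the edge-list output format and program name are invented; it only shows how sweeping a few parameters yields an unbounded family of inputs.

```c
#include <stdio.h>
#include <stdlib.h>

/* Emit a random directed graph in edge-list form. Node count, average
   degree, and seed are the knobs that can be swept to exercise different
   input-dependent control-flow and memory-access behaviors. */
static void emit_random_graph(int nodes, int avg_degree, unsigned seed, FILE *out)
{
    srand(seed);
    long edges = (long)nodes * avg_degree;
    fprintf(out, "%d %ld\n", nodes, edges);   /* header: node and edge counts */
    for (long e = 0; e < edges; e++) {
        int src = rand() % nodes;
        int dst = rand() % nodes;
        fprintf(out, "%d %d\n", src, dst);    /* one directed edge per line */
    }
}

int main(int argc, char **argv)
{
    int nodes      = (argc > 1) ? atoi(argv[1]) : 1024;
    int avg_degree = (argc > 2) ? atoi(argv[2]) : 8;
    unsigned seed  = (argc > 3) ? (unsigned)atoi(argv[3]) : 1u;
    emit_random_graph(nodes, avg_degree, seed, stdout);
    return 0;
}
```

Varying the three command-line parameters gives an unbounded family of inputs, in the spirit of (but much simpler than) the generators described above.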
- Award ID(s): 1955367
- PAR ID: 10343589
- Date Published:
- Journal Name: 2022 IEEE International Symposium on Performance Analysis of Systems and Software
- Page Range / eLocation ID: 24 to 34
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Intermittently-powered, energy-harvesting devices operate on energy collected from their environment and must operate intermittently as energy is available. Runtime systems for such devices often rely on checkpoints or redo-logs to save execution state between power cycles, causing arbitrary code regions to re-execute on reboot. Any non-idempotent program behavior—behavior that can change on each execution—can lead to incorrect results. This work investigates non-idempotent behavior caused by repeating I/O operations, not addressed by prior work. If such operations affect a control statement or the address of a memory update, they can cause programs to take different paths or write to different memory locations on re-executions, resulting in inconsistent memory states. We provide the first characterization of input-dependent idempotence bugs and develop IBIS-S, a program analysis tool for detecting such bugs at compile time, and IBIS-D, a dynamic information flow tracker to detect bugs at runtime. These tools use taint propagation to determine the reach of input. IBIS-S searches for code patterns leading to inconsistent memory updates, while IBIS-D detects concrete memory inconsistencies. We evaluate IBIS on embedded system drivers and applications. IBIS can detect I/O-dependent idempotence bugs, giving few (IBIS-S) or no (IBIS-D) false positives and providing actionable bug reports. These bugs are common in sensor-driven applications and are not fixed by existing intermittent systems.
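  A minimal sketch of this bug class, assuming a made-up sensor function and stand-in "non-volatile" buffers rather than any real driver or the IBIS tools themselves:

  ```c
  #include <stdint.h>
  #include <stdlib.h>
  #include <stdio.h>

  /* Stand-in for a volatile sensor; the name and behavior are invented. */
  static uint16_t read_sensor(void) { return (uint16_t)(rand() % 1024); }

  /* Stand-ins for non-volatile state that survives a power failure. */
  static uint16_t nv_log[16];
  static uint16_t nv_count;

  /* Illustrative input-dependent idempotence bug: if a checkpointing runtime
     re-executes this region after a power failure, read_sensor() can return a
     different value, so both the branch outcome and the written index can
     change between the original execution and the re-execution. A write made
     by the first, interrupted attempt is never undone, leaving nv_log in a
     state no single complete execution could have produced. */
  static void log_sample(void)
  {
      uint16_t s = read_sensor();   /* repeated I/O: non-idempotent          */
      if (s > 512) {                /* control flow depends on the input ... */
          nv_log[s % 16] = s;       /* ... and so does the updated address   */
          nv_count++;
      }
  }

  int main(void)
  {
      log_sample();  /* first attempt (imagine power fails after the store) */
      log_sample();  /* "re-execution" with a fresh sensor reading          */
      printf("nv_count = %u\n", (unsigned)nv_count);
      return 0;
  }
  ```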
-
Many popular vetting tools for Android applications use static code analysis techniques. In particular, Inter-procedural Data-Flow Graph (IDFG) construction is the computation at the core of Android static data-flow analysis and consumes most of the analysis time. Many analysis tools use a worklist algorithm, an iterative fixed-point approach, to construct the IDFG. In this paper, we observe that a straightforward GPU parallelization of the worklist algorithm leads to significant underutilization of the GPU resources. We identify four performance bottlenecks, namely, frequent dynamic memory allocations, high branch divergence, workload imbalance, and irregular memory access patterns. Accordingly, we propose GDroid, a GPU-based worklist algorithm implementation with multiple fine-grained optimizations tailored to common characteristics of Android applications. The optimizations considered are: matrix-based data structure, memory access-based node grouping, and worklist merging. Our experimental evaluation, performed on 1000 Android applications, shows that the proposed optimizations are beneficial to performance, and GDroid can achieve up to 128× speedups against a plain GPU implementation.
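  For readers unfamiliar with the underlying pattern, the toy sketch below shows the sequential worklist fixed-point loop that such tools parallelize. The graph, the bit-set fact domain, and the ring-buffer worklist are invented for illustration and say nothing about GDroid's actual IDFG representation or GPU mapping.

  ```c
  #include <stdio.h>

  #define N 5  /* number of nodes in this toy example */

  int main(void)
  {
      /* toy successor lists: succ[i][j] != -1 means an edge i -> succ[i][j] */
      const int succ[N][2] = { {1, 2}, {3, -1}, {3, -1}, {4, -1}, {-1, -1} };
      unsigned facts[N] = { 1u, 0, 0, 0, 0 };        /* seed a fact at node 0 */

      int work[64], head = 0, tail = 0;              /* small ring-buffer worklist */
      work[tail++] = 0;

      while (head != tail) {                         /* iterate to a fixed point */
          int n = work[head++ % 64];
          for (int j = 0; j < 2 && succ[n][j] != -1; j++) {
              int m = succ[n][j];
              unsigned merged = facts[m] | facts[n]; /* transfer function: set union */
              if (merged != facts[m]) {              /* re-enqueue only on change */
                  facts[m] = merged;
                  work[tail++ % 64] = m;
              }
          }
      }
      for (int i = 0; i < N; i++) printf("facts[%d] = %#x\n", i, facts[i]);
      return 0;
  }
  ```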
-
In spite of decades of research in bug detection tools, there is a surprising dearth of ground-truth corpora that can be used to evaluate the efficacy of such tools. Recently, systems such as LAVA and EvilCoder have been proposed to automatically inject bugs into software to quickly generate large bug corpora, but the bugs created so far differ from naturally occurring bugs in a number of ways. In this work, we propose a new automated bug injection system, Apocalypse, that uses formal techniques—symbolic execution, constraint-based program synthesis and model counting—to automatically inject fair (can potentially be discovered by current bug-detection tools), deep (requiring a long sequence of dependencies to be satisfied to fire), uncorrelated (each bug behaving independently of others), reproducible (a trigger input being available) and rare (can be triggered by only a few program inputs) bugs in large software code bases. In our evaluation, we inject bugs into thirty Coreutils programs as well as the TCAS test suite. We find that bugs synthesized by Apocalypse are highly realistic under a variety of metrics, that they do not favor a particular bug-finding strategy (unlike bugs produced by LAVA), and that they are more difficult to find than manually injected bugs, requiring around 240× more tests to discover with a state-of-the-art symbolic execution tool.
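  The hand-written sketch below illustrates the shape of bug described above (deep, rare, reproducible). It is not output of Apocalypse; the hash, trigger condition, and buffer size are invented for illustration.

  ```c
  #include <stdio.h>
  #include <string.h>

  /* FNV-1a style accumulation over the whole input string. */
  static unsigned mix(const char *s)
  {
      unsigned h = 2166136261u;
      while (*s) { h ^= (unsigned char)*s++; h *= 16777619u; }
      return h;
  }

  int main(int argc, char **argv)
  {
      if (argc < 2) { fprintf(stderr, "usage: %s <word>\n", argv[0]); return 1; }
      const char *in = argv[1];
      size_t len = strlen(in);

      unsigned h = mix(in);                      /* dependency 1: whole input */
      int tail   = (len > 0) ? in[len - 1] : 0;  /* dependency 2: last byte   */

      char buf[8] = {0};
      if ((h % 4096u) == 7u && tail == 'z') {    /* deep, rare trigger: a chain of
                                                    input-dependent conditions     */
          memcpy(buf, in, len);                  /* injected defect: overflows buf
                                                    for long triggering inputs     */
      } else {
          snprintf(buf, sizeof buf, "%.7s", in); /* safe path taken by almost all inputs */
      }
      printf("%s\n", buf);
      return 0;
  }
  ```

  Because the trigger depends on the whole-input hash and the final byte, only a small fraction of inputs reach the defect, yet any triggering input can be replayed to reproduce it.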
-
Compiler fuzzing tools such as Csmith have uncovered many bugs in compilers by randomly sampling programs from a generative model. The success of these tools is often attributed to their ability to generate unexpected corner-case inputs that developers tend to overlook during manual testing. At the same time, their chaotic nature makes fuzzer-generated test cases notoriously hard to interpret, which has led to the creation of input simplification tools such as C-Reduce (for C compiler bugs). In what was until now unrelated work, researchers have also shown that human-written software tends to be rather repetitive and predictable to language models. Studies show that developers deliberately write more predictable code, whereas code with bugs is relatively unpredictable. In this study, we ask the natural question of whether this high-predictability property of code also, and perhaps counter-intuitively, applies to fuzzer-generated code. That is, we investigate whether fuzzer-generated compiler inputs are deemed unpredictable by a language model built on human-written code and surprisingly conclude that they are not. To the contrary, Csmith fuzzer-generated programs are more predictable on a per-token basis than human-written C programs. Furthermore, bug-triggering inputs tended to be more predictable still than random inputs, and the C-Reduce minimization tool did not substantially increase this predictability. Rather, we find that bug-triggering inputs are unpredictable relative to Csmith's own generative model. This is encouraging; our results suggest promising research directions on incorporating predictability metrics in the fuzzing and reduction tools themselves.