While many real-world programs are shipped with configurations to enable/disable functionalities, fuzzers have mostly been applied to test single configurations of these programs. In this work, we first conduct an empirical study to understand how program configurations affect fuzzing performance. We find that limiting a campaign to a single configuration can result in failing to cover a significant amount of code. We also observe that different program configurations contribute differing amounts of code coverage, challenging the idea that each one can be efficiently fuzzed individually. Motivated by these two observations, we propose ConfigFuzz , which can fuzz configurations along with normal inputs. ConfigFuzz transforms the target program to encode its program options within part of the fuzzable input, so existing fuzzers’ mutation operators can be reused to fuzz program configurations. We instantiate ConfigFuzz on six configurable, common fuzzing targets, and integrate their executions in FuzzBench. In our evaluation, ConfigFuzz outperforms two baseline fuzzers in four targets, while the results are mixed in the other targets due to program size and configuration space. We also analyze the options fuzzed by ConfigFuzz and how they affect the performance.
One Fuzzing Strategy to Rule Them All
Coverage-guided fuzzing has become mainstream in fuzzing to automatically expose program vulnerabilities. Recently, a group of fuzzers are proposed to adopt a random search mechanism namely Havoc, explicitly or implicitly, to augment their edge exploration. However, they only tend to adopt the default setup of Havoc as an implementation option while none of them attempts to explore its power under diverse setups or inspect its rationale for potential improvement. In this paper, to address such issues, we conduct the first empirical study on Havoc to enhance the understanding of its characteristics. Specifically, we first find that applying the default setup of Havoc to fuzzers can significantly improve their edge coverage performance. Interestingly, we further observe that even simply executing Havoc itself without appending it to any fuzzer can lead to strong edge coverage performance and outperform most of our studied fuzzers. Moreover, we also extend the execution time of Havoc and find that most fuzzers can not only achieve significantly higher edge coverage, but also tend to perform similarly (i.e., their performance gaps get largely bridged). Inspired by the findings, we further propose Havoc𝑀𝐴𝐵, which models the Havoc mutation strategy as a multi-armed bandit problem to be solved by dynamically adjusting more »
- Publication Date:
- NSF-PAR ID:
- Journal Name:
- Proceedings of the International Conference on Software Engineering
- Sponsoring Org:
- National Science Foundation
More Like this
Same Coverage, Less Bloat: Accelerating Binary-only Fuzzing with Coverage-preserving Coverage-guided TracingCoverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in throughput by restricting the expense of coverage tracing to only when new coverage is guaranteed. Unfortunately, CGT suits only a basic block coverage granularity---yet most fuzzers require finer-grain coverage metrics: edge coverage and hit counts. It is this limitation which prohibits nearly all of today's state-of-the-art fuzzers from attaining the performance benefits of CGT.This paper tackles the challenges of adapting CGT to fuzzing's most ubiquitous coverage metrics. We introduce and implement a suite of enhancements that expand CGT's introspection to fuzzing's most common code coverage metrics, while maintaining its orders-of-magnitude speedup over conventional always-on coverage tracing. We evaluate their trade-offs with respect to fuzzing performance and effectiveness across 12 diverse real-world binaries (8 open- and 4 closed-source). On average, our coverage-preserving CGT attains near-identical speed to the present block-coverage-only CGT, UnTracer; and outperforms leading binary- and source-level coverage tracers QEMU, Dyninst, RetroWrite, and AFL-Clang by 2--24x, finding moremore »
Greybox fuzzing has made impressive progress in recent years, evolving from heuristics-based random mutation to solving individual branch constraints. However, they have difficulty solving path constraints that involve deeply nested conditional statements, which are common in image and video decoders, network packet analyzers, and checksum tools. We propose an approach for addressing this problem. First, we identify all the control flow-dependent conditional statements of the target conditional statement. Next, we select the taint flow-dependent conditional statements. Finally, we use three strategies to find an input that satisfies all conditional statements simultaneously. We implemented this approach in a tool called Matryoshka and compared its effectiveness on 13 open source programs with other state-of-the-art fuzzers. Matryoshka achieved significantly higher cumulative line and branch coverage than AFL, QSYM, and Angora. We manually classified the crashes found by Matryoshka into 41 unique new bugs and obtained 12 CVEs. Our evaluation demonstrates the key technique contributing to Matryoshka's impressive performance: among the nesting constraints of a target conditional statement, Matryoshka collects only those that may cause the target unreachable, which greatly simplifies the path constraint that it has to solve.
Fuzz testing, or fuzzing, has become one of the de facto standard techniques for bug finding in the software industry. In general, fuzzing provides various inputs to the target program with the goal of discovering unhandled exceptions and crashes. In business sectors where the time budget is limited, software vendors often launch many fuzzing instances in parallel as a common means of increasing code coverage. However, most of the popular fuzzing tools — in their parallel mode — naively run multiple instances concurrently, without elaborate distribution of workload. This can lead different instances to explore overlapped code regions, eventually reducing the benefits of concurrency. In this paper, we propose a general model to describe parallel fuzzing. This model distributes mutually-exclusive but similarly-weighted tasks to different instances, facilitating concurrency and also fairness across instances. Following this model, we develop a solution, called AFL-EDGE, to improve the parallel mode of AFL, considering a round of mutations to a unique seed as a task and adopting edge coverage to define the uniqueness of a seed. We have implemented AFL-EDGE on top of AFL and evaluated the implementation with AFL on 9 widely used benchmark programs. It shows that AFL-EDGE can benefit the edgemore »
Fuzz testing has been gaining ground recently with substantial efforts devoted to the area. Typically, fuzzers take a set of seed inputs and leverage random mutations to continually improve the inputs with respect to a cost, e.g. program code coverage, to discover vulnerabilities or bugs. Following this methodology, fuzzers are very good at generating unstructured inputs that achieve high coverage. However fuzzers are less effective when the inputs are structured, say they conform to an input grammar. Due to the nature of random mutations, the overwhelming abundance of inputs generated by this common fuzzing practice often adversely hinders the effectiveness and efficiency of fuzzers on grammar-aware applications. The problem of testing becomes even harder, when the goal is not only to achieve increased code coverage, but also to nd complex vulnerabilities related to other cost measures, say high resource consumption in an application. We propose Saffron an adaptive grammar-based fuzzing approach to effectively and efficiently generate inputs that expose expensive executions in programs. Saffron takes as input a user-provided grammar, which describes the input space of the program under analysis, and uses it to generate test inputs. Saffron assumes that the grammar description is approximate since precisely describing the inputmore »