skip to main content

Title: Breaking Through Binaries: Compiler-quality Instrumentation for Better Binary-only Fuzzing
Coverage-guided fuzzing is one of the most effective software security testing techniques. Fuzzing takes on one of two forms: compiler-based or binary-only, depending on the availability of source code. While the fuzzing community has improved compiler-based fuzzing with performance- and feedback-enhancing program transformations, binary-only fuzzing lags behind due to the semantic and performance limitations of instrumenting code at the binary level. Many fuzzing use cases are binary-only (i.e., closed source). Thus, applying fuzzing-enhancing program transformations to binary-only fuzzing—without sacrificing performance—remains a compelling challenge. This paper examines the properties required to achieve compiler-quality binary-only fuzzing instrumentation. Based on our findings, we design ZAFL: a platform for applying fuzzing-enhancing program transformations to binary-only targets—maintaining compiler-level performance. We showcase ZAFL's capabilities in an implementation for the popular fuzzer AFL, including five compiler-style fuzzing-enhancing transformations, and evaluate it against the leading binary-only fuzzing instrumenters AFL-QEMU and AFL-Dyninst. Across LAVA-M and real-world targets, ZAFL improves crash-finding by 26–96% and 37–131%; and throughput by 48– 78% and 159–203% compared to AFL-Dyninst and AFL-QEMU, respectively—while maintaining compiler-level of overhead of 27%. We also show that ZAFL supports real-world open- and closed-source software of varying size (10K– 100MB), complexity (100–1M basic blocks), platform (Linux and Windows), and format more » (e.g., stripped and PIC). « less
; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
30th USENIX Security Symposium
Sponsoring Org:
National Science Foundation
More Like this
  1. Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in throughput by restricting the expense of coverage tracing to only when new coverage is guaranteed. Unfortunately, CGT suits only a basic block coverage granularity---yet most fuzzers require finer-grain coverage metrics: edge coverage and hit counts. It is this limitation which prohibits nearly all of today's state-of-the-art fuzzers from attaining the performance benefits of CGT.This paper tackles the challenges of adapting CGT to fuzzing's most ubiquitous coverage metrics. We introduce and implement a suite of enhancements that expand CGT's introspection to fuzzing's most common code coverage metrics, while maintaining its orders-of-magnitude speedup over conventional always-on coverage tracing. We evaluate their trade-offs with respect to fuzzing performance and effectiveness across 12 diverse real-world binaries (8 open- and 4 closed-source). On average, our coverage-preserving CGT attains near-identical speed to the present block-coverage-only CGT, UnTracer; and outperforms leading binary- and source-level coverage tracers QEMU, Dyninst, RetroWrite, and AFL-Clang by 2--24x, finding moremore »bugs in less time.« less
  2. As big data analytics become increasingly popular, data-intensive scalable computing (DISC) systems help address the scalability issue of handling large data. However, automated testing for such data-centric applications is challenging, because data is often incomplete, continuously evolving, and hard to know a priori. Fuzz testing has been proven to be highly effective in other domains such as security; however, it is nontrivial to apply such traditional fuzzing to big data analytics directly for three reasons: (1) the long latency of DISC systems prohibits the applicability of fuzzing: naïve fuzzing would spend 98% of the time in setting up a test environment; (2) conventional branch coverage is unlikely to scale to DISC applications because most binary code comes from the framework implementation such as Apache Spark; and (3) random bit or byte level mutations can hardly generate meaningful data, which fails to reveal real-world application bugs. We propose a novel coverage-guided fuzz testing tool for big data analytics, called BigFuzz. The key essence of our approach is that: (a) we focus on exercising application logic as opposed to increasing framework code coverage by abstracting the DISC framework using specifications. BigFuzz performs automated source to source transformations to construct an equivalent DISCmore »application suitable for fast test generation, and (b) we design schema-aware data mutation operators based on our in-depth study of DISC application error types. BigFuzz speeds up the fuzzing time by 78 to 1477X compared to random fuzzing, improves application code coverage by 20% to 271%, and achieves 33% to 157% improvement in detecting application errors. When compared to the state of the art that uses symbolic execution to test big data analytics, BigFuzz is applicable to twice more programs and can find 81% more bugs.« less
  3. iOS is one of the most valuable targets for security researchers. Unfortunately, studying the internals of this operating system is notoriously hard, due to the closed nature of the iOS ecosystem and the absence of easily-accessible analysis tools. To address this issue, we developed TruEMU, which we present in this talk. TruEMU is the first open-source, extensible, whole-system iOS emulator. Compared to the few available alternatives, TruEMU enables complete iOS kernel emulation, including emulation of the SecureROM and the USB kernel stack. More importantly, TruEMU is completely free and open-source, and it is based on the well-known and highly extensible emulator QEMU. This talk will start by presenting the challenges and the solutions we devised to reverse engineer current iOS boot code and kernel code, and explain how to provide adequate support in QEMU. Then, to showcase TruEMU's usefulness and capabilities, we will demonstrate how it can completely boot modern iOS images, including iOS 14 and the latest iOS 15, and how it can properly run different user-space components, such as launchd, restored, etc. Later, we will showcase two promising ways to use TruEMU as an iOS vulnerability research platform. Specifically, we will demonstrate how to use TruEMU to enablemore »coverage-based fuzzing of the iOS kernel USB stack. Further, we will show how TruEMU provides a platform to implement coverage-based, syscall-level fuzzing. This platform enables security researchers to automatically explore multiple attack surfaces of iOS. In sum, building a complete emulator for iOS is a daunting task. Many features (i.e., many peripherals) still need to be implemented to allow a complete emulation of a modern iOS device. We hope this talk will also bootstrap a large community involvement in this project that will progressively shed more light on the obscure corners of iOS security.« less
  4. Binary analysis of closed-source, low-level, and embedded systems software has emerged at the heart of cyberphysical vulnerability assessment of third-party or legacy devices in safety-critical systems. In particular, recovering the semantics of the source algorithmic implementations enables analysts to understand the context of a particular binary program snippet. However, experimentation and evaluation of binary analysis techniques on real-world embedded cyber-physical systems are limited to domain-specific testbeds with a low number of use cases–insufficient to support emerging data-driven techniques. Moreover, the use cases rarely have the source mathematical expressions, algorithms, and compiled binaries. In this paper, we present AUTOCPS, a framework for generating a large corpus of control systems binaries along with their source algorithmic expressions and source code. AUTOCPS enables researchers to tune the control system binary data generation by varying different permutations of cyber-physical modules, e.g., the underlying control algorithm, while ensuring a semantically valid binary. We initially constrain AUTOCPS to the flight software domain and generate over 4000 semantically different control systems source representations, which are then used to generate hundreds of thousands of binaries. We describe current and future use cases of AUTOCPS towards cyber-physical vulnerability assessment of safety-critical systems.
  5. While many real-world programs are shipped with configurations to enable/disable functionalities, fuzzers have mostly been applied to test single configurations of these programs. In this work, we first conduct an empirical study to understand how program configurations affect fuzzing performance. We find that limiting a campaign to a single configuration can result in failing to cover a significant amount of code. We also observe that different program configurations contribute differing amounts of code coverage, challenging the idea that each one can be efficiently fuzzed individually. Motivated by these two observations, we propose ConfigFuzz , which can fuzz configurations along with normal inputs. ConfigFuzz transforms the target program to encode its program options within part of the fuzzable input, so existing fuzzers’ mutation operators can be reused to fuzz program configurations. We instantiate ConfigFuzz on six configurable, common fuzzing targets, and integrate their executions in FuzzBench. In our evaluation, ConfigFuzz outperforms two baseline fuzzers in four targets, while the results are mixed in the other targets due to program size and configuration space. We also analyze the options fuzzed by ConfigFuzz and how they affect the performance.