NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The Havoc Paradox in Generator-Based Fuzzing (Registered Report)

https://doi.org/10.1145/3678722.3685529

Li, Ao; Huang, Madonna; Lemieux, Caroline; Padhye, Rohan (September 2024, ACM)

Full Text Available
ExChain: exception dependency analysis for root cause diagnosis

Li, Ao; Lu, Shan; Nath, Suman; Padhye, Rohan; Sekar, Vyas (August 2024, Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI 2024))

Full Text Available
ExChain: Exception Dependency Analysis for Root Cause Diagnosis

Li, Ao; Lu, Shan; Nath; Suman; Padhye, Rohan; Sekar, Vyas (March 2024, USENIX)

Full Text Available
Guiding Greybox Fuzzing with Mutation Testing

https://doi.org/10.1145/3597926.3598107

Vikram, Vasudev; Laybourn, Isabella; Li, Ao; Nair, Nicole; OBrien, Kelton; Sanna, Rafaello; Padhye, Rohan (July 2023, ISSTA 2023: Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis)

Greybox fuzzing and mutation testing are two popular but mostly independent fields of software testing research that have so far had limited overlap. Greybox fuzzing, generally geared towards searching for new bugs, predominantly uses code coverage for selecting inputs to save. Mutation testing is primarily used as a stronger alternative to code coverage in assessing the quality of regression tests; the idea is to evaluate tests for their ability to identify artificially injected faults in the target program. But what if we wanted to use greybox fuzzing to synthesize high-quality regression tests? In this paper, we develop and evaluate Mu2, a Java-based framework for incorporating mutation analysis in the greybox fuzzing loop, with the goal of producing a test-input corpus with a high mutation score. Mu2 makes use of a differential oracle for identifying inputs that exercise interesting program behavior without causing crashes. This paper describes several dynamic optimizations implemented in Mu2 to overcome the high cost of performing mutation analysis with every fuzzer-generated input. These optimizations introduce trade-offs in fuzzing throughput and mutation killing ability, which we evaluate empirically on five real-world Java benchmarks. Overall, variants of Mu2 are able to synthesize test-input corpora with a higher mutation score than state-of-the-art Java fuzzer Zest.
more » « less
Full Text Available
Fuzzing, Symbolic Execution, and Expert Guidance for Better Testing

https://doi.org/10.1109/MS.2023.3237981

Kadron, Ismet Burak; Noller, Yannic; Padhye, Rohan; Bultan, Tevfik; Pasareanu, Corina S.; Sen, Koushik (January 2023, IEEE Software)

Full Text Available
On the Naturalness of Fuzzer-Generated Code

https://doi.org/10.1145/3524842.3527972

Kambhamettu, Rajeswari Hita; Billos, John; Oluwaseun-Apo, Tomi; Gafford, Benjamin; Padhye, Rohan; Hellendoorn, Vincent J. (May 2022, 19th International Conference on Mining Software Repositories)

Compiler fuzzing tools such as Csmith have uncovered many bugs in compilers by randomly sampling programs from a generative model. The success of these tools is often attributed to their ability to generate unexpected corner case inputs that developers tend to overlook during manual testing. At the same time, their chaotic nature makes fuzzer-generated test cases notoriously hard to interpret, which has lead to the creation of input simplification tools such as C-Reduce (for C compiler bugs). In until now unrelated work, researchers have also shown that human-written software tends to be rather repetitive and predictable to language models. Studies show that developers deliberately write more predictable code, whereas code with bugs is relatively unpredictable. In this study, we ask the natural questions of whether this high predictability property of code also, and perhaps counter-intuitively, applies to fuzzer-generated code. That is, we investigate whether fuzzer-generated compiler inputs are deemed unpredictable by a language model built on human-written code and surprisingly conclude that it is not. To the contrary, Csmith fuzzer-generated programs are more predictable on a per-token basis than human-written C programs. Furthermore, bug-triggering tended to be more predictable still than random inputs, and the C-Reduce minimization tool did not substantially increase this predictability. Rather, we find that bug-triggering inputs are unpredictable relative to Csmith's own generative model. This is encouraging; our results suggest promising research directions on incorporating predictability metrics in the fuzzing and reduction tools themselves.
more » « less
Full Text Available
Growing A Test Corpus with Bonsai Fuzzing

https://doi.org/10.1109/ICSE43902.2021.00072

Vikram, Vasudev; Padhye, Rohan; Sen, Koushik (May 2021, 43rd IEEE/ACM International Conference on Software Engineering)
null (Ed.)
Full Text Available
Efficient Fuzz Testing for Apache Spark Using Framework Abstraction

https://doi.org/10.1109/ICSE-Companion52605.2021.00036

Zhang, Qian; Wang, Jiyuan; Gulzar, Muhammad Ali; Padhye, Rohan; Kim, Miryung (May 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion))
null (Ed.)
Full Text Available
BigFuzz: efficient fuzz testing for data analytics using framework abstraction

https://doi.org/10.1145/3324884.3416641

Zhang, Qian; Wang, Jiyuan; Gulzar, Muhammad Ali; Padhye, Rohan; Kim, Miryung (December 2020, The 35th IEEE/ACM International Conference on Automated Software Engineering)
null (Ed.)
As big data analytics become increasingly popular, data-intensive scalable computing (DISC) systems help address the scalability issue of handling large data. However, automated testing for such data-centric applications is challenging, because data is often incomplete, continuously evolving, and hard to know a priori. Fuzz testing has been proven to be highly effective in other domains such as security; however, it is nontrivial to apply such traditional fuzzing to big data analytics directly for three reasons: (1) the long latency of DISC systems prohibits the applicability of fuzzing: naïve fuzzing would spend 98% of the time in setting up a test environment; (2) conventional branch coverage is unlikely to scale to DISC applications because most binary code comes from the framework implementation such as Apache Spark; and (3) random bit or byte level mutations can hardly generate meaningful data, which fails to reveal real-world application bugs. We propose a novel coverage-guided fuzz testing tool for big data analytics, called BigFuzz. The key essence of our approach is that: (a) we focus on exercising application logic as opposed to increasing framework code coverage by abstracting the DISC framework using specifications. BigFuzz performs automated source to source transformations to construct an equivalent DISC application suitable for fast test generation, and (b) we design schema-aware data mutation operators based on our in-depth study of DISC application error types. BigFuzz speeds up the fuzzing time by 78 to 1477X compared to random fuzzing, improves application code coverage by 20% to 271%, and achieves 33% to 157% improvement in detecting application errors. When compared to the state of the art that uses symbolic execution to test big data analytics, BigFuzz is applicable to twice more programs and can find 81% more bugs.
more » « less
Full Text Available
FuzzFactory: domain-specific fuzzing with waypoints

https://doi.org/10.1145/3360600

Padhye, Rohan; Lemieux, Caroline; Sen, Koushik; Simon, Laurent; Vijayakumar, Hayawardh (October 2019, Proceedings of the ACM on Programming Languages)

Full Text Available

« Prev Next »

Search for: All records