

Search for: All records

Award ID contains: 2120955



  1. Greybox fuzzing and mutation testing are two popular fields of software testing research that have so far had limited overlap. Greybox fuzzing, generally geared towards searching for new bugs, predominantly uses code coverage for selecting inputs to save. Mutation testing is primarily used as a stronger alternative to code coverage in assessing the quality of regression tests; the idea is to evaluate tests by their ability to identify artificially injected faults in the target program. But what if we wanted to use greybox fuzzing to synthesize high-quality regression tests? In this paper, we develop and evaluate Mu2, a Java-based framework for incorporating mutation analysis in the greybox fuzzing loop, with the goal of producing a test-input corpus with a high mutation score. Mu2 uses a differential oracle to identify inputs that exercise interesting program behavior without causing crashes. The paper describes several dynamic optimizations implemented in Mu2 to overcome the high cost of performing mutation analysis on every fuzzer-generated input. These optimizations introduce trade-offs between fuzzing throughput and mutation-killing ability, which we evaluate empirically on five real-world Java benchmarks. Overall, variants of Mu2 are able to synthesize test-input corpora with a higher mutation score than the state-of-the-art Java fuzzer Zest. (An illustrative sketch of a mutation-score-guided fuzzing loop appears after this entry.)
    Free, publicly-accessible full text available July 13, 2024
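The following Java sketch illustrates, under stated assumptions, how mutation analysis might be folded into a greybox fuzzing loop: a candidate input is saved only if it kills mutants that the existing corpus does not. The `InputGenerator` and `MutationAnalyzer` interfaces and all names here are hypothetical, not Mu2's actual API; the differential-oracle detail (a mutant counts as killed when its output differs from the original program's on a non-crashing input) is noted only in a comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of a mutation-score-guided fuzzing loop in the spirit of Mu2.
// All interfaces and names are illustrative assumptions, not Mu2's actual API.
public class MutationGuidedFuzzer {

    /** Generates or mutates candidate inputs (e.g., a coverage-guided fuzzer front end). */
    interface InputGenerator {
        byte[] nextInput(List<byte[]> corpus);
    }

    /** Runs the target and reports which injected mutants the input kills.
     *  With a differential oracle, a mutant is "killed" when its output differs
     *  from the original program's output on the same non-crashing input. */
    interface MutationAnalyzer {
        Set<Integer> killedMutants(byte[] input);
    }

    public static List<byte[]> fuzz(InputGenerator gen, MutationAnalyzer analyzer,
                                    int iterations) {
        List<byte[]> corpus = new ArrayList<>();
        Set<Integer> killedSoFar = new java.util.HashSet<>();

        for (int i = 0; i < iterations; i++) {
            byte[] candidate = gen.nextInput(corpus);
            Set<Integer> killed = analyzer.killedMutants(candidate);

            // Save the input only if it kills at least one previously surviving mutant,
            // so the saved corpus keeps increasing its overall mutation score.
            if (!killedSoFar.containsAll(killed)) {
                corpus.add(candidate);
                killedSoFar.addAll(killed);
            }
        }
        return corpus; // a test-input corpus selected for mutation-killing ability
    }
}
```

The save criterion here is the mutation-analysis analogue of coverage-guided saving: instead of "new branch covered", the signal is "new mutant killed". The dynamic optimizations the paper describes (not modeled in this sketch) would reduce how much of the mutation analysis must actually run per generated input.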
  2. Compiler fuzzing tools such as Csmith have uncovered many bugs in compilers by randomly sampling programs from a generative model. The success of these tools is often attributed to their ability to generate unexpected corner-case inputs that developers tend to overlook during manual testing. At the same time, their chaotic nature makes fuzzer-generated test cases notoriously hard to interpret, which has led to the creation of input simplification tools such as C-Reduce (for C compiler bugs). In previously unrelated work, researchers have shown that human-written software tends to be rather repetitive and predictable to language models. Studies show that developers deliberately write more predictable code, whereas code with bugs is relatively unpredictable. In this study, we ask the natural question of whether this high predictability also, perhaps counter-intuitively, applies to fuzzer-generated code. That is, we investigate whether fuzzer-generated compiler inputs are deemed unpredictable by a language model built on human-written code, and we find, surprisingly, that they are not. On the contrary, Csmith-generated programs are more predictable on a per-token basis than human-written C programs. Furthermore, bug-triggering inputs tended to be even more predictable than random inputs, and the C-Reduce minimization tool did not substantially increase this predictability. Rather, we find that bug-triggering inputs are unpredictable relative to Csmith's own generative model. This is encouraging; our results suggest promising research directions on incorporating predictability metrics into the fuzzing and reduction tools themselves. (A toy sketch of measuring per-token predictability with a simple language model follows this entry.)
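As a rough illustration of "per-token predictability", the sketch below trains a simple bigram language model on a tokenized reference corpus and scores a program by its average per-token negative log2-probability (cross-entropy), where lower values mean more predictable. This is an assumption-laden toy stand-in, not the study's actual language model, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch (not the paper's model): how predictable a token stream is
// under a bigram language model trained on a reference corpus, measured as average
// per-token negative log2-probability (cross-entropy). Lower = more predictable.
public class TokenPredictability {

    private final Map<String, Map<String, Integer>> bigramCounts = new HashMap<>();
    private final Map<String, Integer> contextCounts = new HashMap<>();

    /** Train on a tokenized reference corpus (e.g., human-written C code). */
    public void train(List<List<String>> corpus) {
        for (List<String> tokens : corpus) {
            String prev = "<s>";
            for (String tok : tokens) {
                contextCounts.merge(prev, 1, Integer::sum);
                bigramCounts.computeIfAbsent(prev, k -> new HashMap<>())
                            .merge(tok, 1, Integer::sum);
                prev = tok;
            }
        }
    }

    /** Average negative log2 probability per token, with add-one (Laplace) smoothing. */
    public double crossEntropy(List<String> tokens) {
        double total = 0.0;
        String prev = "<s>";
        int vocab = contextCounts.size() + 1; // rough vocabulary-size estimate for smoothing
        for (String tok : tokens) {
            int bigram = bigramCounts.getOrDefault(prev, Map.of()).getOrDefault(tok, 0);
            int context = contextCounts.getOrDefault(prev, 0);
            double p = (bigram + 1.0) / (context + vocab);
            total += -(Math.log(p) / Math.log(2));
            prev = tok;
        }
        return tokens.isEmpty() ? 0.0 : total / tokens.size();
    }
}
```

With a metric of this shape, the study's comparisons amount to scoring Csmith-generated, bug-triggering, C-Reduce-minimized, and human-written programs under models trained on human-written code versus on Csmith's own output, and comparing the resulting cross-entropies.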