skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on October 9, 2026

Title: Fray: An Efficient General-Purpose Concurrency Testing Platform for the JVM
Concurrency bugs are hard to discover and reproduce, even in well-synchronized programs that are free of data races. Thankfully, prior work on controlled concurrency testing (CCT) has developed sophisticated algorithms---such as partial-order based and selectively uniform sampling---to effectively search over the space of thread interleavings. Unfortunately, in practice, these techniques cannot easily be applied to real-world Java programs due to the difficulties of controlling concurrency in the presence of the managed runtime and complex synchronization primitives. So, mature Java projects that make heavy use of concurrency still rely on naive repeated stress testing in a loop. In this paper, we take a first-principles approach for elucidating the requirements and design space to enable CCT on arbitrary real-world JVM applications. We identify practical challenges with classical design choices described in prior work---such as concurrency mocking, VM hacking, and OS-level scheduling---that affect bug-finding effectiveness and/or the scope of target applications that can be easily supported. Based on these insights, we present Fray, a new platform for performing push-button concurrency testing (beyond data races) of JVM programs. The key design principle behind Fray is to orchestrate thread interleavings without replacing existing concurrency primitives, using a concurrency control mechanism called shadow locking for faithfully expressing the set of all possible program behaviors. With full concurrency control, Fray can test applications using a number of search algorithms from a simple random walk to sophisticated techniques like PCT, POS, and SURW. In an empirical evaluation on 53 benchmark programs with known bugs (SCTBench and JaConTeBe), Fray with random walk finds 70% more bugs than JPF and 77% more bugs than RR's chaos mode. We also demonstrate Fray's push-button applicability on 2,664 tests from Apache Kafka, Lucene, and Google Guava. In these mature projects, Fray successfully discovered 18 real-world concurrency bugs that can cause 371 of the existing tests to fail under specific interleavings. We believe that Fray serves as a bridge between classical academic research and industrial practice--- empowering developers with advanced concurrency testing algorithms that demonstrably uncover more bugs, while simultaneously providing researchers a platform for large-scale evaluation of search techniques.  more » « less
Award ID(s):
2453432 2120955 2429384
PAR ID:
10651550
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
Proceedings of the ACM on Programming Languages
Volume:
9
Issue:
OOPSLA2
ISSN:
2475-1421
Page Range / eLocation ID:
4035 to 4063
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Concurrency bugs are extremely difficult to detect. Recently, several dynamic techniques achieve sound analysis. M2 is even complete for two threads. It is designed to decide whether two events can occur consecutively. However, real-world concurrency bugs can involve more events and threads. Some can occur when the order of two or more events can be exchanged even if they occur not consecutively. We propose a new technique SeqCheck to soundly decide whether a sequence of events can occur in a specified order. The ordered sequence represents a potential concurrency bug. And several known forms of concurrency bugs can be easily encoded into event sequences where each represents a way that the bug can occur. To achieve it, SeqCheck explicitly analyzes branch events and includes a set of efficient algorithms. We show that SeqCheck is sound; and it is also complete on traces of two threads. We have implemented SeqCheck to detect three types of concurrency bugs and evaluated it on 51 Java benchmarks producing up to billions of events. Compared with M2 and other three recent sound race detectors, SeqCheck detected 333 races in ~30 minutes; while others detected from 130 to 285 races in ~6 to ~12 hours. SeqCheck detected 20 deadlocks in ~6 seconds. This is only one less than Dirk; but Dirk spent more than one hour. SeqCheck also detected 30 atomicity violations in ~20 minutes. The evaluation shows SeqCheck can significantly outperform existing concurrency bug detectors. 
    more » « less
  2. null (Ed.)
    Tests that modify (i.e., "pollute") the state shared among tests in a test suite are called \polluter tests". Finding these tests is im- portant because they could result in di erent test outcomes based on the order of the tests in the test suite. Prior work has proposed the PolDet technique for nding polluter tests in runs of JUnit tests on a regular Java Virtual Machine (JVM). Given that Java PathFinder (JPF) provides desirable infrastructure support, such as systematically exploring thread schedules, it is a worthwhile attempt to re-implement techniques such as PolDet in JPF. We present a new implementation of PolDet for nding polluter tests in runs of JUnit tests in JPF. We customize the existing state comparison in JPF to support the so-called \common-root iso- morphism" required by PolDet. We find that our implementation is simple, requiring only -200 lines of code, demonstrating that JPF is a sophisticated infrastructure for rapid exploration of re-search ideas on software testing. We evaluate our implementation on 187 test classes from 13 Java projects and nd 26 polluter tests. Our results show that the runtime overhead of PolDet@JPF com- pared to base JPF is relatively low, on average 1.43x. However, our experiments also show some potential challenges with JPF. 
    more » « less
  3. Analyzing multithreaded programs is notoriously hard due to the exponential number of thread interleavings. Although race detectors can help developers find and fix such bugs before the code is deployed, multithreaded code may still be buggy due to memory errors and assertion violations that are not due to race conditions. This paper presents a property directed symbolic execution of multithreaded code. Our approach, named SIFT, differs from previous work on detecting errors in multithreaded code by being property directed and by handling both memory safety and assertion checking that can be further customized by the user. SIFT can detect bugs that may or may not be due to data races, and works in an iterative way. In each step, it explores the state space using selective scheduling based on a set of interleaving points that have been inferred in the previous step. We have developed three partitioning strategies for improved effectiveness and performance. We have implemented SIFT on top of the KLEE symbolic execution engine and applied it to various real-world and academic benchmarks. SIFT could detect more vulnerabilities than a state-of-the-art memory vulnerability detector. 
    more » « less
  4. Automated Program Repair (APR) is one of the most recent advances in automated debugging, and can directly fix buggy programs with minimal human intervention. Although various advanced APR techniques (including search-based or semantic-based ones) have been proposed, they mainly work at the source-code level and it is not clear how bytecode-level APR performs in practice. Also, empirical studies of the existing techniques on bugs beyond what has been reported in the original papers are rather limited. In this paper, we implement the first practical bytecode-level APR technique, PraPR, and present the first extensive study on fixing real-world bugs (e.g., Defects4J bugs) using JVM bytecode mutation. The experimental results show that surprisingly even PraPR with only the basic traditional mutators can produce genuine fixes for 17 bugs; with simple additional commonly used APR mutators, PraPR is able to produce genuine fixes for 43 bugs, significantly outperforming state-of-the-art APR, while being over 10X faster. Furthermore, we performed an extensive study of PraPR and other recent APR tools on a large number of additional real-world bugs, and demonstrated the overfitting problem of recent advanced APR tools for the first time. Lastly, PraPR has also successfully fixed bugs for other JVM languages (e.g., for the popular Kotlin language), indicating PraPR can greatly complement existing source-code-level APR. 
    more » « less
  5. Actor concurrency is becoming increasingly important in the real world and mission-critical software. This requires these applications to be free from actor bugs, that occur in the real world, and have tests that are effective in finding these bugs. Mutation testing is a well-established technique that transforms an application to induce its likely bugs and evaluate the effectiveness of its tests in finding these bugs. Mutation testing is available for a broad spectrum of applications and their bugs, ranging from web to mobile to machine learning, and is used at scale in companies like Google and Facebook. However, there still is no mutation testing for actor concurrency that uses real-world actor bugs. In this paper, we propose 𝜇Akka, a framework for mutation testing of Akka actor concurrency using real actor bugs. Akka is a popular industrial-strength implementation of actor concurrency. To design, implement, and evaluate 𝜇Akka, we take the following major steps: (1) manually analyze a recent set of 186 real Akka bugs from Stack Overflow and GitHub to understand their causes; (2) design a set of 32 mutation operators, with 138 source code changes in Akka API, to emulate these causes and induce their bugs; (3) implement these operators in an Eclipse plugin for Java Akka; (4) use the plugin to generate 11.7k mutants of 10 real GitHub applications, with 446.4k lines of code and 7.9k tests; (5) run these tests on these mutants to measure the quality of mutants and effectiveness of tests; (6) use PIT to generate 26.2k mutants to compare 𝜇Akka and PIT mutant quality and test effectiveness. PIT is a popular mutation testing tool with traditional operators; (7) manually analyze the bug coverage and overlap of 𝜇Akka, PIT, and actor operators in a previous work; and (8) discuss a few implications of our findings. Among others, we find that 𝜇Akka mutants are higher quality, cover more bugs, and tests are less effective in detecting them. 
    more » « less