Go is a young programming language invented to build safe and efficient concurrent programs. It provides goroutines as lightweight threads and channels for inter-goroutine communication. Programmers are encouraged to explicitly pass messages through channels to connect goroutines, with the purpose of reducing the chance of making programming mistakes and introducing concurrency bugs. Go is one of the most beloved programming languages and has already been used to build many critical infrastructure software systems in the data-center environment. However, a recent study shows that channel-related concurrency bugs are still common in Go programs, severely hurting the reliability of the programs. This paper presents GFuzz, a dynamic detector that can effectively pinpoint channel-related concurrency bugs by mutating the processing orders of concurrent messages. We build GFuzz in three steps. We first adopt an effective approach to identify concurrent messages and transform a program to process those messages in any given order. We then take a fuzzing approach to generate new processing orders by mutating exercised ones and rely on execution feedback to prioritize orders close to triggering bugs. Finally, we design a runtime sanitizer to capture triggered bugs that are missed by the Go runtime. We evaluate GFuzz on seven popular Go software systems, including Docker, Kubernetes, and gRPC. GFuzz finds 184 previously unknown bugs and reports a negligible number of false positives. Programmers have already confirmed 124 reports as real bugs and fixed 67 of them based on our reporting. A careful inspection of the detected concurrency bugs from gRPC shows the effectiveness of each component of GFuzz and confirms the components' rationality.
more »
« less
How many mutex bugs can a simple analysis find in Go programs?
In open source software, it is known that there are many concurrency bugs. A previous study in Go revealed that a considerable number of such bugs are simple (for example, 9\% of the bugs are the ones that forget to unlock a mutex,) through a manual program investigation. This paper tries to detect such bugs by applying a simple analysis in order to see how far such a tool can match the manual analysis. We built a simple intraprocedural control flow analysis in Go, and evaluated its performance with respect to the open source programs with concurrency bugs reported in the previous study. Consequently, as for quality, the recall is good at 88\% and the precision is poor at 60\%, and as for analysis time, it can be finished within practical amount of time (for example, 1 second per 5000 LoC).
more »
« less
- Award ID(s):
- 2200343
- PAR ID:
- 10417315
- Date Published:
- Journal Name:
- Annual Conference of the Japanese Society for Software Science and Technology
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Accurately performing date and time calculations in software is non-trivial due to the inherent complexity and variability of temporal concepts such as time zones, daylight saving time (DST) adjustments, leap years and leap seconds, clock drifts, and different calendar systems. Although the challenges are frequently discussed in the grey literature, there has not been any systematic study of date/time issues that have manifested in real software systems. To bridge this gap, we qualitatively study 151 bugs and their associated fixes from open-source Python projects on GitHub to understand: (a) the conceptual categories of date/time computations in which bugs occur, (b) the programmatic operations involved in the buggy computations, and (c) the underlying root causes of these errors. We also analyze metrics such as bug severity and detectability as well as fix size and complexity. Our study produces several interesting findings and actionable insights, such as (1) time-zone-related mistakes are the largest contributing factor to date/time bugs; (2) a majority of the studied bugs involved incorrect construction of date/time values; (3) the root causes of date/time bugs often involve misconceptions about library API behavior, such as default conventions or nuances about edge-case behavior; (4) most bugs occur within a single function and can be patched easily, requiring only a few lines of simple code changes. Our findings indicate that static analysis tools can potentially find common classes of high-impact bugs and that such bugs can potentially be fixed automatically. Based on our insights, we also make concrete recommendations to software developers to harden their software against date/time bugs via automated testing strategies.more » « less
-
Concurrency bugs are hard to discover and reproduce, even in well-synchronized programs that are free of data races. Thankfully, prior work on controlled concurrency testing (CCT) has developed sophisticated algorithms---such as partial-order based and selectively uniform sampling---to effectively search over the space of thread interleavings. Unfortunately, in practice, these techniques cannot easily be applied to real-world Java programs due to the difficulties of controlling concurrency in the presence of the managed runtime and complex synchronization primitives. So, mature Java projects that make heavy use of concurrency still rely on naive repeated stress testing in a loop. In this paper, we take a first-principles approach for elucidating the requirements and design space to enable CCT on arbitrary real-world JVM applications. We identify practical challenges with classical design choices described in prior work---such as concurrency mocking, VM hacking, and OS-level scheduling---that affect bug-finding effectiveness and/or the scope of target applications that can be easily supported. Based on these insights, we present Fray, a new platform for performing push-button concurrency testing (beyond data races) of JVM programs. The key design principle behind Fray is to orchestrate thread interleavings without replacing existing concurrency primitives, using a concurrency control mechanism called shadow locking for faithfully expressing the set of all possible program behaviors. With full concurrency control, Fray can test applications using a number of search algorithms from a simple random walk to sophisticated techniques like PCT, POS, and SURW. In an empirical evaluation on 53 benchmark programs with known bugs (SCTBench and JaConTeBe), Fray with random walk finds 70% more bugs than JPF and 77% more bugs than RR's chaos mode. We also demonstrate Fray's push-button applicability on 2,664 tests from Apache Kafka, Lucene, and Google Guava. In these mature projects, Fray successfully discovered 18 real-world concurrency bugs that can cause 371 of the existing tests to fail under specific interleavings. We believe that Fray serves as a bridge between classical academic research and industrial practice--- empowering developers with advanced concurrency testing algorithms that demonstrably uncover more bugs, while simultaneously providing researchers a platform for large-scale evaluation of search techniques.more » « less
-
Concurrency bugs are extremely difficult to detect. Recently, several dynamic techniques achieve sound analysis. M2 is even complete for two threads. It is designed to decide whether two events can occur consecutively. However, real-world concurrency bugs can involve more events and threads. Some can occur when the order of two or more events can be exchanged even if they occur not consecutively. We propose a new technique SeqCheck to soundly decide whether a sequence of events can occur in a specified order. The ordered sequence represents a potential concurrency bug. And several known forms of concurrency bugs can be easily encoded into event sequences where each represents a way that the bug can occur. To achieve it, SeqCheck explicitly analyzes branch events and includes a set of efficient algorithms. We show that SeqCheck is sound; and it is also complete on traces of two threads. We have implemented SeqCheck to detect three types of concurrency bugs and evaluated it on 51 Java benchmarks producing up to billions of events. Compared with M2 and other three recent sound race detectors, SeqCheck detected 333 races in ~30 minutes; while others detected from 130 to 285 races in ~6 to ~12 hours. SeqCheck detected 20 deadlocks in ~6 seconds. This is only one less than Dirk; but Dirk spent more than one hour. SeqCheck also detected 30 atomicity violations in ~20 minutes. The evaluation shows SeqCheck can significantly outperform existing concurrency bug detectors.more » « less
-
Actor concurrency is becoming increasingly important in the real world and mission-critical software. This requires these applications to be free from actor bugs, that occur in the real world, and have tests that are effective in finding these bugs. Mutation testing is a well-established technique that transforms an application to induce its likely bugs and evaluate the effectiveness of its tests in finding these bugs. Mutation testing is available for a broad spectrum of applications and their bugs, ranging from web to mobile to machine learning, and is used at scale in companies like Google and Facebook. However, there still is no mutation testing for actor concurrency that uses real-world actor bugs. In this paper, we propose 𝜇Akka, a framework for mutation testing of Akka actor concurrency using real actor bugs. Akka is a popular industrial-strength implementation of actor concurrency. To design, implement, and evaluate 𝜇Akka, we take the following major steps: (1) manually analyze a recent set of 186 real Akka bugs from Stack Overflow and GitHub to understand their causes; (2) design a set of 32 mutation operators, with 138 source code changes in Akka API, to emulate these causes and induce their bugs; (3) implement these operators in an Eclipse plugin for Java Akka; (4) use the plugin to generate 11.7k mutants of 10 real GitHub applications, with 446.4k lines of code and 7.9k tests; (5) run these tests on these mutants to measure the quality of mutants and effectiveness of tests; (6) use PIT to generate 26.2k mutants to compare 𝜇Akka and PIT mutant quality and test effectiveness. PIT is a popular mutation testing tool with traditional operators; (7) manually analyze the bug coverage and overlap of 𝜇Akka, PIT, and actor operators in a previous work; and (8) discuss a few implications of our findings. Among others, we find that 𝜇Akka mutants are higher quality, cover more bugs, and tests are less effective in detecting them.more » « less
An official website of the United States government

