Several metrics have been proposed in the past to quantify the effectiveness of a test suite; they are usually based on some measure of coverage because it is sensible to quantify the effectiveness of a test suite by the extent to which it exercises (covers) various syntactic features of the program under test. Though no coverage metric has emerged as the gold standard of test suite effectiveness, mutation coverage is widely perceived as a reliable measure of test suite effectiveness because the ability of a test suite to detect program mutations can be used as an indication of its ability to detect actual faults. In this paper we aim to challenge the superiority of mutation coverage, by showing that the same test suite may have vastly different values of mutation coverage depending on the mutation operators that are used in the estimation.
more »
« less
Detecting Faults vs. Revealing Failures: Exploring the Missing Link
When we quantify the effectiveness of a test suite by its mutation coverage, we are in fact equating test suite effectiveness with fault detection: to the extent that mutations are faithful proxies of actual faults, it is sensible to consider that the effectiveness of a test suite to kill mutants reflects its ability to detect faults. But there is another way to measure the effectiveness of a test suite: by its ability to expose the failures of an incorrect program. The relationship between failures and faults is tenuous at best: a fault is the adjudged or hypothesized cause of a failure. The same failure may be attributed to more than one fault. This raises the question: what is the relationship between detecting faults and exposing failures. In this paper, we discuss an empirical experiment in which we explore this relationship.
more »
« less
- Award ID(s):
- 2043104
- PAR ID:
- 10538829
- Editor(s):
- Ghosh, Sudipto; Troubitsyna, Elena; Chen, Zhenyu
- Publisher / Repository:
- IEEE
- Date Published:
- ISBN:
- 979-8-3503-6563-4
- Subject(s) / Keyword(s):
- fault detection failure exposure test suite effectiveness mutation testing software testing
- Format(s):
- Medium: X
- Location:
- Cambridge, UK
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Debugging a failure usually requires reproducing it first. This can be hard for failures in production distributed systems, where bugs are exposed only by some unusual faulty events. While fault injection testing becomes popular, existing solutions are designed for bug finding. They are ineffective and inefficient to reproduce a specific failure during debugging. We explore a new type of fault injection technique for quickly reproducing a given fault-induced production failure in distributed systems. We present a tool, Anduril, that uses static causal analysis and a novel feedback-driven algorithm to quickly search the enormous fault space for the root-cause fault and timing. We evaluate Anduril on 22 real-world complex fault-induced failures from five large-scale distributed systems. Anduril reproduced all failures by identifying and injecting the root-cause faults at the right time, in a median of 8 minutes.more » « less
-
Continuous Integration (CI) practices encourage developers to frequently integrate code into a shared repository. Each integration is validated by automatic build and testing such that errors are revealed as early as possible. When CI failures or integration errors are reported, existing techniques are insufficient to automatically locate the root causes for two reasons. First, a CI failure may be triggered by faults in source code and/or build scripts, while current approaches consider only source code. Second, a tentative integration can fail because of build failures and/or test failures, while existing tools focus on test failures only. This paper presents UniLoc, the first unified technique to localize faults in both source code and build scripts given a CI failure log, without assuming the failure’s location (source code or build scripts) and nature (a test failure or not). Adopting the information retrieval (IR) strategy, UniLoc locates buggy files by treating source code and build scripts as documents to search and by considering build logs as search queries. However, instead of naïvely applying an off-the-shelf IR technique to these software artifacts, for more accurate fault localization, UniLoc applies various domain-specific heuristics to optimize the search queries, search space, and ranking formulas. To evaluate UniLoc, we gathered 700 CI failure fixes in 72 open-source projects that are built with Gradle. UniLoc could effectively locate bugs with the average MRR (Mean Reciprocal Rank) value as 0.49, MAP (Mean Average Precision) value as 0.36, and NDCG (Normalized Discounted Cumulative Gain) value as 0.54. UniLoc outperformed the state-of-the-art IR-based tool BLUiR and Locus. UniLoc has the potential to help developers diagnose root causes for CI failures more accurately and efficiently.more » « less
-
Large-scale distributed systems must be built to anticipate and mitigate a variety of hardware and software failures. In order to build confidence that fault-tolerant systems are correctly implemented, Netflix (and similar enterprises) regularly run failure drills in which faults are deliberately injected in their production system. The combinatorial space of failure scenarios is too large to explore exhaustively. Existing failure testing approaches either randomly explore the space of potential failures randomly or exploit the "hunches" of domain experts to guide the search. Random strategies waste resources testing "uninteresting" faults, while programmer-guided approaches are only as good as human intuition and only scale with human effort. In this paper, we describe how we adapted and implemented a research prototype called lineage-driven fault injection (LDFI) to automate failure testing at Netflix. Along the way, we describe the challenges that arose adapting the LDFI model to the complex and dynamic realities of the Netflix architecture. We show how we implemented the adapted algorithm as a service atop the existing tracing and fault injection infrastructure, and present early results.more » « less
-
null (Ed.)The use of multicycle tests, with several functional capture cycles between scan operations, contributes significantly to the ability to compact a test set. Multicycle tests have the added benefit that they can contribute to the detection of defects with complex behaviors that are not detected by single-cycle or two-cycle tests. To ensure that this benefit is materialized when test compaction is applied to transition faults, this article suggests to incorporate into the test compaction procedure an additional fault model whose fault coverage increases when multicycle tests are used. To ensure that the computational complexity of test compaction is not increased by a fault model with a large number of faults, or faults with complex behaviors, the added fault model is required to have the same characteristics as the transition fault model. A type of transition fault called unspecified transition fault satisfies these requirements. The article describes a test compaction procedure for transition faults that incorporates unspecified transition faults, and presents experimental results for benchmark circuits to demonstrate the levels of test compaction and fault coverage that can be achieved.more » « less
An official website of the United States government

