skip to main content

Title: Fault detection effectiveness of source test case generation strategies for metamorphic testing
Metamorphic testing is a well known approach to tackle the oracle problem in software testing. This technique requires the use of source test cases that serve as seeds for the generation of follow-up test cases. Systematic design of test cases is crucial for the test quality. Thus, source test case generation strategy can make a big impact on the fault detection effectiveness of metamorphic testing. Most of the previous studies on metamorphic testing have used either random test data or existing test cases as source test cases. There has been limited research done on systematic source test case generation for metamorphic testing. This paper provides a comprehensive evaluation on the impact of source test case generation techniques on the fault finding effectiveness of metamorphic testing. We evaluated the effectiveness of line coverage, branch coverage, weak mutation and random test generation strategies for source test case generation. The experiments are conducted with 77 methods from 4 open source code repositories. Our results show that by systematically creating source test cases, we can significantly increase the fault finding effectiveness of metamorphic testing. Further, in this paper we introduce a simple metamorphic testing tool called "METtester" that we use to conduct metamorphic testing more » on these methods. « less
Authors:
;
Award ID(s):
1656877
Publication Date:
NSF-PAR ID:
10062904
Journal Name:
3rd International Workshop on Metamorphic Testing
Page Range or eLocation-ID:
2 to 9
Sponsoring Org:
National Science Foundation
More Like this
  1. Simulation is widely used for validation of Register-Transfer-Level (RTL) models. While simulating with millions of random or constrained-random tests can cover majority of the functional scenarios, the number of remaining scenarios can still be huge (hundreds or thousands) in case of today's industrial designs. Hard-to-activate branches are one of the major contributors for such remaining/untested scenarios. While directed test generation techniques using formal methods are promising in activating branches, it is infeasible to apply them on large designs due to state space explosion. In this paper, we propose a fully automated and scalable approach to cover the hard-to-activate branches usingmore »concolic testing of RTL models. While application of concolic testing on hardware designs has shown some promising results in improving the overall coverage, they are not designed to activate specific targets such as uncovered corner cases and rare scenarios. This paper makes two important contributions. (1) We propose a directed test generation technique to activate a target by effective utilization of concolic testing on RTL models. (2) We develop efficient learning and clustering techniques to minimize the overlapping searches across targets to drastically reduce the overall test generation effort.« less
  2. Coverage-guided fuzzing's aggressive, high-volume testing has helped reveal tens of thousands of software security flaws. While executing billions of test cases mandates fast code coverage tracing, the nature of binary-only targets leads to reduced tracing performance. A recent advancement in binary fuzzing performance is Coverage-guided Tracing (CGT), which brings orders-of-magnitude gains in throughput by restricting the expense of coverage tracing to only when new coverage is guaranteed. Unfortunately, CGT suits only a basic block coverage granularity---yet most fuzzers require finer-grain coverage metrics: edge coverage and hit counts. It is this limitation which prohibits nearly all of today's state-of-the-art fuzzers frommore »attaining the performance benefits of CGT.This paper tackles the challenges of adapting CGT to fuzzing's most ubiquitous coverage metrics. We introduce and implement a suite of enhancements that expand CGT's introspection to fuzzing's most common code coverage metrics, while maintaining its orders-of-magnitude speedup over conventional always-on coverage tracing. We evaluate their trade-offs with respect to fuzzing performance and effectiveness across 12 diverse real-world binaries (8 open- and 4 closed-source). On average, our coverage-preserving CGT attains near-identical speed to the present block-coverage-only CGT, UnTracer; and outperforms leading binary- and source-level coverage tracers QEMU, Dyninst, RetroWrite, and AFL-Clang by 2--24x, finding more bugs in less time.« less
  3. The feedback provided by current testing education tools about the deficiencies in a student’s test suite either mimics industry code coverage tools or lists specific instructor test cases that are missing from the student’s test suite. While useful in some sense, these types of feedback are akin to revealing the solution to the problem, which can inadvertently encourage students to pursue a trial-and-error approach to testing, rather than using a more systematic approach that encourages learning. In addition to not teaching students why their test suite is inadequate, this type of feedback may motivate students to become dependent on themore »feedback rather than thinking for themselves. To address this deficiency, there is an opportunity to investigate alternative feedback mechanisms that include a positive reinforcement of testing concepts. We argue that using an inquiry-based learning approach is better than simply providing the answers. To facilitate this type of learning, we present Testing Tutor, a web-based assignment submission platform that supports different levels of testing pedagogy via a customizable feedback engine. We evaluated the impact of the different types of feedback through an empirical study in two sophomore-level courses.We use Testing Tutor to provide students with different types of feedback, either traditional detailed code coverage feedback or inquiry-based learning conceptual feedback, and compare the effects. The results show that students that receive conceptual feedback had higher code coverage (by different measures), fewer redundant test cases, and higher programming grades than the students who receive traditional code coverage feedback.« less
  4. In machine learning, supervised classifiers are used to obtain predictions for unlabeled data by inferring prediction functions using labeled data. Supervised classifiers are widely applied in domains such as computational biology, computational physics and healthcare to make critical decisions. However, it is often hard to test supervised classifiers since the expected answers are unknown. This is commonly known as the oracle problem and metamorphic testing (MT) has been used to test such programs. In MT, metamorphic relations (MRs) are developed from intrinsic characteristics of the software under test (SUT). These MRs are used to generate test data and to verifymore »the correctness of the test results without the presence of a test oracle. Effectiveness of MT heavily depends on the MRs used for testing. In this paper we have conducted an extensive empirical study to evaluate the fault detection effectiveness of MRs that have been used in multiple previous studies to test supervised classifiers. Our study uses a total of 709 reachable mutants generated by multiple mutation engines and uses data sets with varying characteristics to test the SUT. Our results reveal that only 14.8% of these mutants are detected using the MRs and that the fault detection effectiveness of these MRs do not scale with the increased number of mutants when compared to what was reported in previous studies.« less
  5. Traditional low cost scan based structural tests no longer suffice for delivering acceptable defect levels in many processor SOCs, especially those targeting low power applications. Expensive functional system level tests (SLTs) have become an additional and necessary final test screen. Efforts to eliminate or minimize the use of SLTs have focused on new fault models and improved test generation methods to improve the effectiveness of scan tests. In this paper we argue that given the limitations of scan timing tests, such an approach may not be sufficient to detect all the low voltage failures caused by circuit timing variability thatmore »appear to dominate SLT fallout. Instead, we propose an alternate approach for meaningful cost savings that adaptively avoids SLT tests for a subset of the manufactured parts. This is achieved by using parametric and scan tests results from earlier in the test flow to identify low delay variability parts that can avoid SLT with minimal impact on DPPM. Extensive SPICE simulations support the viability of our proposed approach. We also show that such an adaptive test flow is also very well suited to real time optimization during the using machine-learning techniques.« less