skip to main content

Title: Fault Detection Effectiveness of Metamorphic Relations Developed for Testing Supervised Classifiers
In machine learning, supervised classifiers are used to obtain predictions for unlabeled data by inferring prediction functions using labeled data. Supervised classifiers are widely applied in domains such as computational biology, computational physics and healthcare to make critical decisions. However, it is often hard to test supervised classifiers since the expected answers are unknown. This is commonly known as the oracle problem and metamorphic testing (MT) has been used to test such programs. In MT, metamorphic relations (MRs) are developed from intrinsic characteristics of the software under test (SUT). These MRs are used to generate test data and to verify the correctness of the test results without the presence of a test oracle. Effectiveness of MT heavily depends on the MRs used for testing. In this paper we have conducted an extensive empirical study to evaluate the fault detection effectiveness of MRs that have been used in multiple previous studies to test supervised classifiers. Our study uses a total of 709 reachable mutants generated by multiple mutation engines and uses data sets with varying characteristics to test the SUT. Our results reveal that only 14.8% of these mutants are detected using the MRs and that the fault detection effectiveness of more » these MRs do not scale with the increased number of mutants when compared to what was reported in previous studies. « less
Award ID(s):
Publication Date:
Journal Name:
2019 IEEE International Conference On Artificial Intelligence Testing (AITest)
Page Range or eLocation-ID:
157 to 164
Sponsoring Org:
National Science Foundation
More Like this
  1. Metamorphic testing is a well known approach to tackle the oracle problem in software testing. This technique requires the use of source test cases that serve as seeds for the generation of follow-up test cases. Systematic design of test cases is crucial for the test quality. Thus, source test case generation strategy can make a big impact on the fault detection effectiveness of metamorphic testing. Most of the previous studies on metamorphic testing have used either random test data or existing test cases as source test cases. There has been limited research done on systematic source test case generation for metamorphic testing. This paper provides a comprehensive evaluation on the impact of source test case generation techniques on the fault finding effectiveness of metamorphic testing. We evaluated the effectiveness of line coverage, branch coverage, weak mutation and random test generation strategies for source test case generation. The experiments are conducted with 77 methods from 4 open source code repositories. Our results show that by systematically creating source test cases, we can significantly increase the fault finding effectiveness of metamorphic testing. Further, in this paper we introduce a simple metamorphic testing tool called "METtester" that we use to conduct metamorphic testingmore »on these methods.« less
  2. Mutation testing is a powerful technique for evaluating the quality of test suite which plays a key role in ensuring software quality. The concept of mutation testing has also been widely used in other software engineering studies, e.g., test generation, fault localization, and program repair. During the process of mutation testing, large number of mutants may be generated and then executed against the test suite to examine whether they can be killed, making the process extremely computational expensive. Several techniques have been proposed to speed up this process, including selective, weakened, and predictive mutation testing. Among those techniques, Predictive Mutation Testing (PMT) tries to build a classification model based on an amount of mutant execution records to predict whether coming new mutants would be killed or alive without mutant execution, and can achieve significant mutation cost reduction. In PMT, each mutant is represented as a list of features related to the mutant itself and the test suite, transforming the mutation testing problem to a binary classification problem. In this paper, we perform an extensive study on the effectiveness and efficiency of the promising PMT technique under the cross-project setting using a total 654 real world projects with more than 4more »Million mutants. Our work also complements the original PMT work by considering more features and the powerful deep learning models. The experimental results show an average of over 0.85 prediction accuracy on 654 projects using cross validation, demonstrating the effectiveness of PMT. Meanwhile, a clear speed up is also observed with an average of 28.7× compared to traditional mutation testing with 5 threads. In addition, we analyze the importance of different groups of features in classification model, which provides important implications for the future research.« less
  3. Software testing is difficult to automate, especially in programs which have no oracle, or method of determining which output is correct. Metamorphic testing is a solution this problem. Metamorphic testing uses metamorphic relations to define test cases and expected outputs. A large amount of time is needed for a domain expert to determine which metamorphic relations can be used to test a given program. Metamorphic relation prediction removes this need for such an expert. We propose a method using semi-supervised machine learning to detect which metamorphic relations are applicable to a given code base. We compare this semi-supervised model with a supervised model, and show that the addition of unlabeled data improves the classification accuracy of the MR prediction model.
  4. Metamorphic testing is an advanced technique to test programs without a true test oracle such as machine learning applications. Because these programs have no general oracle to identify their correctness, traditional testing techniques such as unit testing may not be helpful for developers to detect potential bugs. This paper presents a novel system, KABU, which can dynamically infer properties of methods' states in programs that describe the characteristics of a method before and after transforming its input. These Metamorphic Properties (MPs) are pivotal to detecting potential bugs in programs without test oracles, but most previous work relies solely on human effort to identify them and only considers MPs between input parameters and output result (return value) of a program or method. This paper also proposes a testing concept, Metamorphic Differential Testing (MDT). By detecting different sets of MPs between different versions for the same method, KABU reports potential bugs for human review. We have performed a preliminary evaluation of KABU by comparing the MPs detected by humans with the MPs detected by KABU. Our preliminary results are promising: KABU can find more MPs than human developers, and MDT is effective at detecting function changes in methods.
  5. Water distribution systems (WDSs) face a significant challenge in the form of pipe leaks. Pipe leaks can cause loss of a large amount of treated water, leading to pressure loss, increased energy costs, and contamination risks. Locating pipe leaks has been a constant challenge for water utilities and stakeholders due to the underground location of the pipes. Physical methods to detect leaks are expensive, intrusive, and heavily localized. Computational approaches provide an economical alternative to physical methods. Data-driven machine learning-based computational approaches have garnered growing interest in recent years to address the challenge of detecting pipe leaks in WDSs. While several studies have applied machine learning models for leak detection on single pipes and small test networks, their applicability to the real-world WDSs is unclear. Most of these studies simplify the leak characteristics and ignore modeling and measuring device uncertainties, which makes the scalability of their approaches questionable to real-world WDSs. Our study addresses this issue by devising four study cases that account for the realistic leak characteristics (multiple, multi-size, and randomly located leaks) and incorporating noise in the input data to account for the model- and measuring device- related uncertainties. A machine learning-based approach that uses simulated pressure asmore »input to predict both location and size of leaks is proposed. Two different machine learning models: Multilayer Perceptron (MLP) and Convolutional Neural Network (CNN), are trained and tested for the four study cases, and their performances are compared. The precision and recall results for the L-Town network indicate good accuracies for both the models for all study cases, with CNN generally outperforming MLP.« less