skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on January 1, 2026

Title: Subsumption, correctness and relative correctness: Implications for software testing
Context. Several Research areas emerged and have been proceeding independently when in fact they have much in common. These include: mutant subsumption and mutant set minimization; relative correctness and the semantic definition of faults; differentiator sets and their application to test diversity; generate-and--validate methods of program repair; test suite coverage metrics. Objective. Highlight their analogies, commonalities and overlaps; explore their potential for synergy and shared research goals; unify several disparate concepts around a minimal set of artifacts. Method. Introduce and analyze a minimal set of concepts that enable us to model these disparate research efforts, and explore how these models may enable us to share insights between different research directions, and advance their respective goals. Results. Capturing absolute (total and partial) correctness and relative (total and partial) correctness with a single concept: Detector sets. Using the same concept to quantify the effectiveness of test suites, and prove that the proposed measure satisfies appealing monotonicity properties. Using the measure of test suite effectiveness to model mutant set minimization as an optimization problem, characterized by an objective function and a constraint. Generalizing the concept of mutant subsumption using the concept of differentiator sets. Identifying analogies between detector sets and differentiator sets, and inferring relationships between subsumption and relative correctness. Conclusion. This paper does not aim to answer any pressing research question as much as it aims to raise research questions that use the insights gained from one research venue to gain a fresh perspective on a related research issue. mutant subsumption; mutant set minimization; relative correctness; absolute correctness; total correctness; partial correctness; program fault; program repair; differentiator set; detector set.  more » « less
Award ID(s):
2043104
PAR ID:
10538648
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
Science of Computer Programming
Volume:
239
Issue:
C
ISSN:
0167-6423
Page Range / eLocation ID:
103177
Subject(s) / Keyword(s):
absolute correctness relative correctness mutant subsumption mutation testing semantic coverage test suite effectiveness mutant set effectivenes.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. To repair a program does not mean to make it (absolutely) correct; it only means to make it more-correct than it was originally. This is not a mundane academic distinction: Given that programs typically have about a dozen faults per KLOC, it is important for program repair methods and tools to be designed in such a way that they map an incorrect program into a more-correct, albeit still potentially incorrect, program. Yet in the absence of a concept of relative correctness, many program repair methods and tools resort to approximations of absolute correctness; since these methods and tools are often validated against programs with a single fault, making them absolutely correct is indistinguishable from making them more-correct; this has contributed to conceal/ obscure the absence of (and the need for) relative correctness. In this paper we propose a theory of program repair based on a concept of relative correctness. We aspire to encourage researchers in program repair to explicitly specify what concept of relative correctness their method or tool is based upon; and to validate their method or tool by proving that it does enhance relative correctness, as defined. 
    more » « less
  2. Mutation testing is a powerful technique for evaluating the quality of test suite which plays a key role in ensuring software quality. The concept of mutation testing has also been widely used in other software engineering studies, e.g., test generation, fault localization, and program repair. During the process of mutation testing, large number of mutants may be generated and then executed against the test suite to examine whether they can be killed, making the process extremely computational expensive. Several techniques have been proposed to speed up this process, including selective, weakened, and predictive mutation testing. Among those techniques, Predictive Mutation Testing (PMT) tries to build a classification model based on an amount of mutant execution records to predict whether coming new mutants would be killed or alive without mutant execution, and can achieve significant mutation cost reduction. In PMT, each mutant is represented as a list of features related to the mutant itself and the test suite, transforming the mutation testing problem to a binary classification problem. In this paper, we perform an extensive study on the effectiveness and efficiency of the promising PMT technique under the cross-project setting using a total 654 real world projects with more than 4 Million mutants. Our work also complements the original PMT work by considering more features and the powerful deep learning models. The experimental results show an average of over 0.85 prediction accuracy on 654 projects using cross validation, demonstrating the effectiveness of PMT. Meanwhile, a clear speed up is also observed with an average of 28.7× compared to traditional mutation testing with 5 threads. In addition, we analyze the importance of different groups of features in classification model, which provides important implications for the future research. 
    more » « less
  3. null (Ed.)
    In this paper we argue for using many partial test suites instead of one full test suite during program repair. This may provide a pool of simpler, yet correct patches, addressing both the overfitting and poor repair quality problem. To support this idea, we present some insight obtained running APR partial test suites on the well studied triangle program. 
    more » « less
  4. To reduce the cost of mutation testing, researchers have sought to find minimal mutant sets. As an optimization problem, mutant set minimization is defined by two parameters: the objective function that we must optimize; and the constraint under which the optimization is carried out. Whereas the objective function of this optimization problem is clear (minimizing the cardinality of the mutant set), the constraint under which this optimization is attempted has not been clearly articulated in the literature. In this paper, we propose a formal definition of this constraint and discuss in what sense, and to what extent, published algorithms of mutant set minimization comply with this constraint. 
    more » « less
  5. This paper studies the problem of arguing program correctness for logic programs with aggregates in the context of Answer Set Programming. Cabalar, Fandinno, and Lierler (2020) championed a modular methodology for arguing program correctness. We show how a recently proposed many-sorted semantics for logic programs with aggregates allows us to apply their methodology to this type of program. This is illustrated using well-known encodings for the Graph Coloring and Traveling Salesman problems. In particular, we showcase how this modular approach allows us to reuse the proof of correctness of a Hamiltonian Cycle encoding studied in a previous publication when considering the Traveling Salesman program. 
    more » « less