skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Ensuring the Observability of Structural Test Obligations
Test adequacy criteria are widely used to guide test creation. However, many of these criteria are sensitive to statement structure or the choice of test oracle. This is because such criteria ensure that execution reaches the element of interest, but impose no constraints on the execution path after this point. We are not guaranteed to observe a failure just because a fault is triggered. To address this issue, we have proposed the concept of observability—an extension to coverage criteria based on Boolean expressions that combines the obligations of a host criterion with an additional path condition that increases the likelihood that a fault encountered will propagate to a monitored variable. Our study, conducted over five industrial systems and an additional forty open-source systems, has revealed that adding observability tends to improve efficacy over satisfaction of the traditional criteria, with average improvements of 125.98% in mutation detection with the common output-only test oracle and per-model improvements of up to 1760.52%. Ultimately, there is merit to our hypothesis—observability reduces sensitivity to the choice of oracle and to the program structure.  more » « less
Award ID(s):
1657299
PAR ID:
10082000
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
IEEE Transactions on Software Engineering
ISSN:
0098-5589
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We present the design and implementation of GVM, the first system for executing Java bytecode entirely on GPUs. GVM is ideal for applications that execute a large number of short-living tasks, which share a significant fraction of their codebase and have similar execution time. GVM uses novel algorithms, scheduling, and data layout techniques to adapt to the massively parallel programming and execution model of GPUs. We apply GVM to generate and execute tests for Java projects. First, we implement a sequence-based test generation on top of GVM and design novel algorithms to avoid redundant test sequences. Second, we use GVM to execute randomly generated test cases. We evaluate GVM by comparing it with two existing Java bytecode interpreters (Oracle JVM and Java Pathfinder), as well as with the Oracle JVM with just-in-time (JIT) compiler, which has been engineered and optimized for over twenty years. Our evaluation shows that sequence-based test generation on GVM outperforms both Java Pathfinder and Oracle JVM interpreter. Additionally, our results show that GVM performs as well as running our parallel sequence-based test generation algorithm using JVM with JIT with many CPU threads. Furthermore, our evaluation on several classes from open-source projects shows that executing randomly generated tests on GVM outperforms sequential execution on JVM interpreter and JVM with JIT. 
    more » « less
  2. The paradigm shift of deploying applications to the cloud has introduced both opportunities and challenges. Although clouds use elasticity to scale resource usage at runtime to help meet an application’s performance requirements, developers are still challenged by unpredictable performance, little control of execution environment, and differences among cloud service providers, all while being charged for their cloud usages. Application performance stability is particularly affected by multi-tenancy in which the hardware is shared among varying applications and virtual machines. Developers porting their applications need to meet performance requirements, but testing on the cloud under the effects of performance uncertainty is difficult and expensive, due to high cloud usage costs. This paper presents a first approach to testing an application with typical inputs for how its performance will be affected by performance uncertainty, without incurring undue costs of bruteforce testing in the cloud. We specify cloud uncertainty testing criteria, design a test-based strategy to characterize the blackbox cloud’s performance distributions using these testing criteria, and support execution of tests to characterize the resource usage and cloud baseline performance of the application to be deployed. Importantly, we developed a smart test oracle that estimates the application’s performance with certain confidence levels using the above characterization test results and determines whether it will meet its performance requirements. We evaluated our testing approach on both the Chameleon cloud and Amazon web services; results indicate that this testing strategy shows promise as a cost-effective approach to test for performance effects of cloud uncertainty when porting an application to the cloud. 
    more » « less
  3. Dozens of criteria have been proposed to judge testing adequacy. Such criteria are important, as they guide automated generation efforts. Yet, the current use of such criteria in automated generation contrasts how such criteria are used by humans. For a human, coverage is part of a multifaceted combination of testing strategies. In automated generation, coverage is typically the goal, and a single fitness function is applied at one time. We propose that the key to improving the fault detection efficacy of search-based test generation approaches lies in a targeted, multifaceted approach pairing primary fitness functions that effectively explore the structure of the class under test with lightweight supporting fitness functions that target particular scenarios likely to trigger an observable failure. This report summarizes our findings to date, details the hypothesis of primary and supporting fitness functions, and identifies outstanding research challenges related to multifaceted test suite generation. We hope to inspire new advances in search-based test generation that could benefit our software-powered society. 
    more » « less
  4. Real-time systems are susceptible to adversarial factors such as faults and attacks, leading to severe consequences. This paper presents an optimal checkpoint scheme to bolster fault resilience in real-time systems, addressing both logical consistency and timing correctness. First, we partition message-passing processes into a directed acyclic graph (DAG) based on their dependencies, ensuring checkpoint logical consistency. Then, we identify the DAG’s critical path, representing the longest sequential path, and analyze the optimal checkpoint strategy along this path to minimize overall execution time, including checkpointing overhead. Upon fault detection, the system rolls back to the nearest valid checkpoints for recovery. Our algorithm derives the optimal checkpoint count and intervals, and we evaluate its performance through extensive simulations and a case study. Results show a 99.97% and 67.86% reduction in execution time compared to checkpoint-free systems in simulations and the case study, respectively. Moreover, our proposed strategy outperforms prior work and baseline methods, increasing deadline achievement rates by 31.41% and 2.92% for small-scale tasks and 78.53% and 4.15% for large-scale tasks. 
    more » « less
  5. In machine learning, supervised classifiers are used to obtain predictions for unlabeled data by inferring prediction functions using labeled data. Supervised classifiers are widely applied in domains such as computational biology, computational physics and healthcare to make critical decisions. However, it is often hard to test supervised classifiers since the expected answers are unknown. This is commonly known as the oracle problem and metamorphic testing (MT) has been used to test such programs. In MT, metamorphic relations (MRs) are developed from intrinsic characteristics of the software under test (SUT). These MRs are used to generate test data and to verify the correctness of the test results without the presence of a test oracle. Effectiveness of MT heavily depends on the MRs used for testing. In this paper we have conducted an extensive empirical study to evaluate the fault detection effectiveness of MRs that have been used in multiple previous studies to test supervised classifiers. Our study uses a total of 709 reachable mutants generated by multiple mutation engines and uses data sets with varying characteristics to test the SUT. Our results reveal that only 14.8% of these mutants are detected using the MRs and that the fault detection effectiveness of these MRs do not scale with the increased number of mutants when compared to what was reported in previous studies. 
    more » « less