skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Finding Polluter Tests Using Java PathFinder
Tests that modify (i.e., "pollute") the state shared among tests in a test suite are called \polluter tests". Finding these tests is im- portant because they could result in di erent test outcomes based on the order of the tests in the test suite. Prior work has proposed the PolDet technique for nding polluter tests in runs of JUnit tests on a regular Java Virtual Machine (JVM). Given that Java PathFinder (JPF) provides desirable infrastructure support, such as systematically exploring thread schedules, it is a worthwhile attempt to re-implement techniques such as PolDet in JPF. We present a new implementation of PolDet for nding polluter tests in runs of JUnit tests in JPF. We customize the existing state comparison in JPF to support the so-called \common-root iso- morphism" required by PolDet. We find that our implementation is simple, requiring only -200 lines of code, demonstrating that JPF is a sophisticated infrastructure for rapid exploration of re-search ideas on software testing. We evaluate our implementation on 187 test classes from 13 Java projects and nd 26 polluter tests. Our results show that the runtime overhead of PolDet@JPF com- pared to base JPF is relatively low, on average 1.43x. However, our experiments also show some potential challenges with JPF.  more » « less
Award ID(s):
1816615
PAR ID:
10299902
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
ACM SIGSOFT Software Engineering Notes
Volume:
46
Issue:
3
ISSN:
0163-5948
Page Range / eLocation ID:
37 to 41
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The compilation scheme for Volatile accesses in the OpenJDK 9 HotSpot Java Virtual Machine has a major problem that persists despite a recent bug report and a long discussion. One of the suggested fixes is to let Java compile Volatile accesses in the same way as C/C++11. However, we show that this approach is invalid for Java. Indeed, we show a set of optimizations that is valid for C/C++11 but invalid for Java, while the compilation scheme is similar. We prove the correctness of the compilation scheme to Power and x86 and a suite of valid optimizations in Java. Our proofs are based on a language model that we validate by proving key properties such as the DRF-SC theorem and by running litmus tests via our implementation of Java in Herd7. 
    more » « less
  2. Mutation testing is widely used in research as a metric for evaluating the quality of test suites. Mutation testing runs the test suite on generated mutants (variants of the code under test), where a test suite kills a mutant if any of the tests fail when run on the mutant. Mutation testing implicitly assumes that tests exhibit deterministic behavior, in terms of their coverage and the outcome of a test (not) killing a certain mutant. Such an assumption does not hold in the presence of flaky tests, whose outcomes can non-deterministically differ even when run on the same code under test. Without reliable test outcomes, mutation testing can result in unreliable results, e.g., in our experiments, mutation scores vary by four percentage points on average between repeated executions, and 9% of mutant-test pairs have an unknown status. Many modern software projects suffer from flaky tests. We propose techniques that manage flakiness throughout the mutation testing process, largely based on strategically re-running tests. We implement our techniques by modifying the open-source mutation testing tool, PIT. Our evaluation on 30 projects shows that our techniques reduce the number of "unknown" (flaky) mutants by 79.4%. 
    more » « less
  3. Regression testing is increasingly important with the wide use of continuous integration. A desirable requirement for regression testing is that a test failure reliably indicates a problem in the code under test and not a false alarm from the test code or the testing infrastructure. However, some test failures are unreliable, stemming from flaky tests that can non- deterministically pass or fail for the same code under test. There are many types of flaky tests, with order-dependent tests being a prominent type. To help advance research on flaky tests, we present (1) a framework, iDFlakies, to detect and partially classify flaky tests; (2) a dataset of flaky tests in open-source projects; and (3) a study with our dataset. iDFlakies automates experimentation with our tool for Maven-based Java projects. Using iDFlakies, we build a dataset of 422 flaky tests, with 50.5% order-dependent and 49.5% not. Our study of these flaky tests finds the prevalence of two types of flaky tests, probability of a test-suite run to have at least one failure due to flaky tests, and how different test reorderings affect the number of detected flaky tests. We envision that our work can spur research to alleviate the problem of flaky tests. 
    more » « less
  4. We present the design and implementation of GVM, the first system for executing Java bytecode entirely on GPUs. GVM is ideal for applications that execute a large number of short-living tasks, which share a significant fraction of their codebase and have similar execution time. GVM uses novel algorithms, scheduling, and data layout techniques to adapt to the massively parallel programming and execution model of GPUs. We apply GVM to generate and execute tests for Java projects. First, we implement a sequence-based test generation on top of GVM and design novel algorithms to avoid redundant test sequences. Second, we use GVM to execute randomly generated test cases. We evaluate GVM by comparing it with two existing Java bytecode interpreters (Oracle JVM and Java Pathfinder), as well as with the Oracle JVM with just-in-time (JIT) compiler, which has been engineered and optimized for over twenty years. Our evaluation shows that sequence-based test generation on GVM outperforms both Java Pathfinder and Oracle JVM interpreter. Additionally, our results show that GVM performs as well as running our parallel sequence-based test generation algorithm using JVM with JIT with many CPU threads. Furthermore, our evaluation on several classes from open-source projects shows that executing randomly generated tests on GVM outperforms sequential execution on JVM interpreter and JVM with JIT. 
    more » « less
  5. When developers make changes to their code, they typically run regression tests to detect if their recent changes (re)introduce any bugs. However, many tests are flaky, and their outcomes can change non-deterministically, failing without apparent cause. Flaky tests are a significant nuisance in the development process, since they make it more difficult for developers to trust the outcome of their tests. The traditional approach to identify flaky tests is to rerun them multiple times: if a test is observed both passing and failing on the same code, it is definitely flaky. We conducted a very large empirical study looking for flaky tests by rerunning the test suites of 24 projects 10,000 times each, and found that even with this many reruns, some flaky tests were still not detected. We propose FlakeFlagger, a novel approach that collects a set of features describing the behavior of each test, and then predicts tests that are likely to be flaky based on similar behavioral features. We found that FlakeFlagger correctly labeled at least as many tests as flaky as a state-of-the-art flaky test classifier, but that FlakeFlagger reported far fewer false positives (an increase in precision from just 11% to 60%). This lower false positive rate translates directly to saved time for researchers and developers who use the classification result to guide more expensive flaky test detection processes. By investigating the information gain of each feature, we conclude that test execution time, overall test coverage, coverage of recently changed lines and usage of third party libraries are effective predictors of test flakiness. We did not find any keywords or tokens in the source code of tests that were effective in predicting test flakiness, and did not find the presence of test smells to be effective in predicting test flakiness. This archive contains the dataset that we collected of flaky tests, along with the features that we collected from each test. Contents: Project_Info.csv: List of projects and their revisions studied build-logs-<project-slug>.tgz: An archive of all of the maven build logs from each of the 10,000 runs of that project's test suite.  failing-test-reports-<project-slug>.tgz An archive of all of the surefire XML reports for each failing test of each build of each project. test_results.csv: Summary of the number of passing and failing runs for each test in each project.  "Run ID" is a key into the <project-slug>.tgz archive also in this artifact, which refers to the run that we observed the test fail on. test_features.csv: Summary of the features that each test had, as per our feature detectors described in the paper flakeflagger-code.zip: All scripts used to generate and process these results. These scripts are also located at https://github.com/AlshammariA/FlakeFlagger 
    more » « less