NSF PAR Search | NSF Public Access Repository

FlakeFlagger: Predicting Flakiness Without Rerunning Tests

https://doi.org/10.1109/ICSE43902.2021.00140

Alshammari, Abdulrahman; Morris, Christopher; Hilton, Michael; Bell, Jonathan (May 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE))
Full Text Available
A large-scale longitudinal study of flaky tests

https://doi.org/10.1145/3428270

Lam, Wing; Winter, Stefan; Wei, Anjiang; Xie, Tao; Marinov, Darko; Bell, Jonathan (November 2020, Proceedings of the ACM on Programming Languages)

Flaky tests are tests that can non-deterministically pass or fail for the same code version. These tests undermine regression testing efficiency, because developers cannot easily identify whether a test fails due to their recent changes or due to flakiness. Ideally, one would detect flaky tests right when flakiness is introduced, so that developers can then immediately remove the flakiness. Some software organizations, e.g., Mozilla and Netflix, run tools called detectors to detect flaky tests as soon as possible. However, detecting flaky tests is costly due to their inherent non-determinism, so even state-of-the-art detectors are often impractical to use on all tests for each project change. To combat the high cost of applying detectors, these organizations typically run a detector solely on newly added or directly modified tests, i.e., not on unmodified tests or when other changes occur (including changes to the test suite, the code under test, and library dependencies). However, it is unclear how many flaky tests can be detected or missed by applying detectors in only these limited circumstances. To better understand this problem, we conduct a large-scale longitudinal study of flaky tests to determine when flaky tests become flaky and what changes cause them to become flaky. We apply two state-of-the-art detectors to 55 Java projects, identifying a total of 245 flaky tests that can be compiled and run in the code version where each test was added. We find that 75% of flaky tests (184 out of 245) are flaky when added, indicating substantial potential value for developers to run detectors specifically on newly added tests. However, running detectors solely on newly added tests would still miss detecting 25% of flaky tests. The percentage of flaky tests that can be detected does increase to 85% when detectors are run on newly added or directly modified tests. The remaining 15% of flaky tests become flaky due to other changes and can be detected only when detectors are always applied to all tests. Our study is the first to empirically evaluate when tests become flaky and to recommend guidelines for applying detectors in the future.
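The three detector-scheduling policies compared in this abstract (newly added tests only, newly added or directly modified tests, and all tests) can be illustrated with a minimal sketch. The names `Policy`, `ChangedTest`, and `select_tests` are hypothetical and do not come from the paper or from any detector's API; the sketch only shows which tests each policy would hand to a detector after a commit.

```python
from dataclasses import dataclass
from enum import Enum


class Policy(Enum):
    """Hypothetical names for the three scheduling policies in the abstract."""
    NEW_ONLY = "new"
    NEW_OR_MODIFIED = "new_or_modified"
    ALL_TESTS = "all"


@dataclass(frozen=True)
class ChangedTest:
    """A test plus how the current commit touched it (illustrative type)."""
    name: str
    added: bool = False
    modified: bool = False


def select_tests(tests, policy):
    """Return the names of the tests a detector would be run on."""
    if policy is Policy.ALL_TESTS:
        return [t.name for t in tests]
    if policy is Policy.NEW_OR_MODIFIED:
        return [t.name for t in tests if t.added or t.modified]
    return [t.name for t in tests if t.added]


# Example commit: one new test, one edited test, one untouched test.
commit = [
    ChangedTest("testLogin", added=True),
    ChangedTest("testCache", modified=True),
    ChangedTest("testExport"),
]

new_only = select_tests(commit, Policy.NEW_ONLY)
new_or_modified = select_tests(commit, Policy.NEW_OR_MODIFIED)
everything = select_tests(commit, Policy.ALL_TESTS)

# Share of flaky tests that are already flaky when added, as reported above.
flaky_when_added_pct = round(184 / 245 * 100)
```

According to the abstract, the first policy would catch about 75% of the studied flaky tests and the second about 85%; only the third, which is the most expensive, would catch the rest.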
Revealing Injection Vulnerabilities by Leveraging Existing Tests

https://doi.org/10.1145/3377811.3380326

Hough, Katherin; Welearegai, Gere; Hammer, Christian; Bell, Jonathan (May 2020, Proceedings of the International Conference on Software Engineering)

Full Text Available
Mitigating the effects of flaky tests on mutation testing

https://doi.org/10.1145/3293882.3330568

Shi, August; Bell, Jonathan; Marinov, Darko (July 2019, ISSTA 2019 Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis)

Mutation testing is widely used in research as a metric for evaluating the quality of test suites. Mutation testing runs the test suite on generated mutants (variants of the code under test), where a test suite kills a mutant if any of the tests fail when run on the mutant. Mutation testing implicitly assumes that tests exhibit deterministic behavior, in terms of their coverage and the outcome of a test (not) killing a certain mutant. Such an assumption does not hold in the presence of flaky tests, whose outcomes can non-deterministically differ even when run on the same code under test. Without reliable test outcomes, mutation testing can result in unreliable results, e.g., in our experiments, mutation scores vary by four percentage points on average between repeated executions, and 9% of mutant-test pairs have an unknown status. Many modern software projects suffer from flaky tests. We propose techniques that manage flakiness throughout the mutation testing process, largely based on strategically re-running tests. We implement our techniques by modifying the open-source mutation testing tool, PIT. Our evaluation on 30 projects shows that our techniques reduce the number of "unknown" (flaky) mutants by 79.4%.
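To make the idea of an "unknown" mutant-test pair concrete, here is a minimal sketch of one possible rerun rule. It is an assumption for illustration only: it is not PIT's API and not necessarily the technique the authors implemented. The function names `classify_pair` and `mutation_score` are hypothetical. A test is run several times against a mutant, and the pair is decided only if every run agrees.

```python
from typing import Callable, Iterable


def classify_pair(run_test: Callable[[], bool], reruns: int = 3) -> str:
    """Classify one mutant-test pair by rerunning the test on the mutant.

    run_test returns True when the test passes on the mutant.
    Assumed rule: every run fails -> "killed", every run passes ->
    "survived", mixed outcomes -> "unknown" (flaky on this mutant).
    """
    if reruns < 1:
        raise ValueError("reruns must be at least 1")
    outcomes = {run_test() for _ in range(reruns)}
    if outcomes == {False}:
        return "killed"
    if outcomes == {True}:
        return "survived"
    return "unknown"


def mutation_score(statuses: Iterable[str]) -> float:
    """Fraction of pairs classified as killed (unknown pairs count as not killed)."""
    statuses = list(statuses)
    if not statuses:
        return 0.0
    return sum(s == "killed" for s in statuses) / len(statuses)


# Deterministic stand-ins for real test executions.
flaky_outcomes = iter([True, False, True])
statuses = [
    classify_pair(lambda: False),                 # always fails on the mutant
    classify_pair(lambda: True),                  # always passes on the mutant
    classify_pair(lambda: next(flaky_outcomes)),  # mixed outcomes
]
score = mutation_score(statuses)
```

Treating unknown pairs as not killed is only one possible choice; the abstract's observation that scores vary between repeated executions is exactly why such pairs need explicit handling.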
Full Text Available
