Automatically Reproducing Timing-Dependent Flaky-Test Failures

Shanto Rahman, Aaron Massey

doi:10.5281/zenodo.7041599

This artifact contains the source code for FlakeRake, a tool for automatically reproducing timing-dependent flaky-test failures. It also includes raw and processed results produced in the evaluation of FlakeRake Contents: Timing-related APIs that FlakeRake considers adding sleeps at: timing-related-apis Anonymized code for FlakeRake (not runnable in its anonymized state, but included for reference; we will publicly release the non-anonymized code under an open source license pending double-blind review): flakerake.tgz Failure messages extracted from the FlakeFlagger dataset: 10k_reruns_failures_by_test.csv.gz Output from running isolated reruns on each flaky test in the FlakeFlager dataset: 10k_isolated_reruns_all_results.csv.gz (all test results summarized into a CSV), 10k_isolated_reruns_failures_by_test.csv.gz (CSV including just test failures, including failure messages), 10k_isolated_reruns_raw_results.tgz (includes all raw results from reruns, including the XML files output by maven) Output from running the FlakeFlagger replication study (non-isolated 10k reruns):flakeFlaggerReplResults.csv.gz (all test results summarized into a CSV), 10k_reruns_failures_by_test.csv.gz (CSV including just failures, including failure messages), flakeFlaggerRepl_raw_results.tgz (includes all raw results from reruns, including the XML files output by maven - this file is markedly larger than the 10k isolated reruns results because we ran *all* tests in this experiment, whereas the 10k isolated rerun experiment only re-ran the tests that were known to be flaky from the FlakeFlagger dataset). Output from running FlakeRake on each flaky test in the FlakeFlagger dataset: For bisection mode: results-bis.tgz For one-by-one mode: results-obo.tgz Scripts used to execute FlakeRake using an HPC cluster: execution-scripts.tgz Scripts used to execute rerun experiments using an HPC cluster: flakeFlaggerReplScripts.tgz Scripts used to parse the "raw" maven test result XML files in this artifact into the CSV files contained in this artifact: parseSurefireXMLs.tgz Output from running FlakeRake in “reproduction” mode, attempting to reproduce each of the failures that matched the FlakeFlagger dataset (collected for bisection mode only): results-repro-bis.tgz Analysis of timing-dependent API calls in the failure inducing configurations that matched FlakeFlagger failures: bis-sleepyline.cause-to-matched-fail-configs-found.csv

More Like this