NSF PAR Search | NSF Public Access Repository

Domain-Specific Fixes for Flaky Tests with Wrong Assumptions on Underdetermined Specifications

https://doi.org/10.1109/ICSE43902.2021.00018

Zhang, Peilun; Jiang, Yanjie; Wei, Anjiang; Stodden, Victoria; Marinov, Darko; Shi, August (May 2021, 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE))
Understanding Reproducibility and Characteristics of Flaky Tests Through Test Reruns in Java Projects

https://doi.org/10.1109/ISSRE5003.2020.00045

Lam, Wing; Winter, Stefan; Astorga, Angello; Stodden, Victoria; Marinov, Darko (October 2020, 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE))
Empirically revisiting and enhancing IR-based test-case prioritization

https://doi.org/10.1145/3395363.3397383

Peng, Qianyang; Shi, August; Zhang, Lingming (July 2020, ACM SIGSOFT International Symposium on Software Testing and Analysis)

Test-case prioritization (TCP) aims to detect regression bugs faster by reordering the tests to be run. While TCP has been studied for over 20 years, it was almost always evaluated using seeded faults/mutants rather than real test failures. In this work, we study the recent change-aware information retrieval (IR) technique for TCP. Prior work has shown it performing better than traditional coverage-based TCP techniques, but it was only evaluated on a small-scale dataset with a cost-unaware metric based on seeded faults/mutants. We extend the prior work by conducting a much larger and more realistic evaluation as well as proposing enhancements that substantially improve the performance. In particular, we evaluate the original technique on a large-scale, real-world software-evolution dataset with real failures using both cost-aware and cost-unaware metrics under various configurations. Also, we design and evaluate hybrid techniques combining the IR features, historical test execution time, and test failure frequencies. Our results show that the change-aware IR technique outperforms state-of-the-art coverage-based techniques in this real-world setting, and our hybrid techniques improve even further upon the original IR technique. Moreover, we show that flaky tests have a substantial impact on evaluating the change-aware TCP techniques based on real test failures.
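The abstract combines IR features with historical execution time and failure frequency, and it scores test orderings with cost-aware and cost-unaware metrics. The Python sketch below illustrates both ideas under stated assumptions: the weighted-sum scoring, the `CandidateTest` fields, and the default weights are hypothetical and are not the paper's hybrid technique. The metric shown is APFD (Average Percentage of Faults Detected), a common cost-unaware TCP metric, used here as a representative example rather than as the paper's exact evaluation setup.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class CandidateTest:
    name: str
    ir_score: float   # hypothetical: textual similarity of the test to the code change
    fail_rate: float  # hypothetical: fraction of past CI runs in which the test failed
    exec_time: float  # hypothetical: historical execution time in seconds


def hybrid_order(tests: List[CandidateTest], w_ir: float = 1.0,
                 w_fail: float = 1.0, w_time: float = 1.0) -> List[str]:
    """Order tests by a weighted sum of three signals (illustrative weights)."""
    max_time = max(t.exec_time for t in tests) or 1.0  # avoid dividing by zero

    def score(t: CandidateTest) -> float:
        # Faster tests score higher: time is scaled to [0, 1] and inverted.
        return (w_ir * t.ir_score + w_fail * t.fail_rate
                + w_time * (1.0 - t.exec_time / max_time))

    return [t.name for t in sorted(tests, key=score, reverse=True)]


def apfd(order: List[str], faults: Dict[str, Set[str]]) -> float:
    """APFD = 1 - sum(TF_i) / (n * m) + 1 / (2n), where TF_i is the 1-based
    position of the first test in `order` that detects fault i."""
    n, m = len(order), len(faults)
    position = {name: i + 1 for i, name in enumerate(order)}
    first_hits = [min(position[t] for t in detecting) for detecting in faults.values()]
    return 1 - sum(first_hits) / (n * m) + 1 / (2 * n)


tests = [
    CandidateTest("testA", ir_score=0.1, fail_rate=0.0, exec_time=2.0),
    CandidateTest("testB", ir_score=0.9, fail_rate=0.2, exec_time=1.0),
    CandidateTest("testC", ir_score=0.4, fail_rate=0.5, exec_time=4.0),
]
order = hybrid_order(tests)
print(order)                                     # ['testB', 'testC', 'testA']
print(round(apfd(order, {"f1": {"testC"}}), 3))  # 0.5
```

A weighted sum is only one way to combine the signals; APFD ignores test cost, which is why cost-aware variants also exist.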
Detecting flaky tests in probabilistic and machine learning applications

https://doi.org/10.1145/3395363.3397366

Dutta, Saikat; Shi, August; Choudhary, Rutvik; Zhang, Zhekun; Jain, Aryaman; Misailovic, Sasa (July 2020, ACM International Symposium on Software Testing and Analysis (ISSTA 2020))

Dependent-test-aware regression testing techniques

https://doi.org/10.1145/3395363.3397364

Lam, Wing; Shi, August; Oei, Reed; Zhang, Sai; Ernst, Michael D.; Xie, Tao (July 2020, ACM International Symposium on Software Testing and Analysis (ISSTA 2020))

Building a Vision for Reproducibility in the Cyberinfrastructure Ecosystem: Leveraging Community Efforts

https://doi.org/10.14529/jsfi200106

Chapp, D.; Stodden, V.; Taufer, M. (March 2020, Supercomputing Frontiers and Innovations)

The scientific computing community has long taken a leadership role in understanding and assessing the relationship of reproducibility to cyberinfrastructure, ensuring that computational results, such as those from simulations, are "reproducible", that is, the same results are obtained when one reuses the same input data, methods, software, and analysis conditions. Starting almost a decade ago, the community has regularly published and advocated for advances in this area. In this article, we trace this thinking and relate it to current national efforts, including the 2019 National Academies of Sciences, Engineering, and Medicine report "Reproducibility and Replicability in Science". To this end, this work considers high-performance computing (HPC) workflows that combine traditional simulations (e.g., molecular dynamics simulations) with in situ analytics. We leverage an analysis of such workflows to (a) contextualize the report's recommendations in the HPC setting and (b) envision a path forward in the tradition of community-driven approaches to reproducibility and the acceleration of science and discovery. The work also articulates avenues for future research at the intersection of transparency, reproducibility, and computational infrastructure that supports scientific discovery.
Understanding and Improving Regression Test Selection in Continuous Integration

https://doi.org/10.1109/ISSRE.2019.00031

Shi, August; Zhao, Peiyuan; Marinov, Darko (October 2019, 30th IEEE International Symposium on Software Reliability Engineering (ISSRE 2019))

Scientific Tests and Continuous Integration Strategies to Enhance Reproducibility in the Scientific Software Context

https://doi.org/10.1145/3322790.3330595

Krafczyk, Matthew; Shi, August; Bhaskar, Adhithya; Marinov, Darko; Stodden, Victoria (June 2019, 2nd International Workshop on Practical Reproducible Evaluation of Computer Systems (P-RECS’19))

Continuous integration (CI) is a well-established technique in commercial and open-source software projects, although it is not routinely used in scientific publishing. In the scientific software context, CI can serve two functions to increase reproducibility of scientific results: providing an established platform for testing the reproducibility of these results, and demonstrating to other scientists how the code and data generate the published results. We explore scientific software testing and CI strategies using two articles published in the areas of applied mathematics and computational physics. We discuss lessons learned from reproducing these articles as well as examine and discuss existing tests. We introduce the notion of a scientific test as one that produces computational results from a published article. We then consider full result reproduction within a CI environment. If authors find their work too time- or resource-intensive to easily adapt to a CI context, we recommend the inclusion of results from reduced versions of their work (e.g., run at lower resolution, with shorter time scales, with smaller data sets) alongside their primary results within their article. While these smaller versions may be less interesting scientifically, they can serve to verify that published code and data are working properly. We demonstrate such reduction tests on the two articles studied.
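The reduction-test recommendation can be illustrated with a toy Python example. Everything in it is an assumption for illustration: `simulate_decay` stands in for an expensive published simulation, and `PUBLISHED_REDUCED` plays the role of a reduced-resolution result that authors would report alongside their primary result. The check uses a relative tolerance rather than exact equality because floating-point results can differ slightly across platforms; the tolerance chosen here is arbitrary.

```python
import math


def simulate_decay(steps: int, t_end: float = 1.0, k: float = 1.0) -> float:
    """Hypothetical stand-in for a published computation: explicit Euler
    integration of dy/dt = -k * y with y(0) = 1, returning y(t_end)."""
    dt = t_end / steps
    y = 1.0
    for _ in range(steps):
        y += dt * (-k * y)
    return y


def scientific_test(steps: int, published: float, rel_tol: float) -> bool:
    """Recompute a result and check it against the value reported in the article."""
    return math.isclose(simulate_decay(steps), published, rel_tol=rel_tol)


# Hypothetical reduced result the authors would report next to their main
# (finer-resolution) result, rounded to four significant digits.
PUBLISHED_REDUCED = 0.3677

# A CI job reruns only the cheap, reduced configuration (1,000 steps).
assert scientific_test(1_000, PUBLISHED_REDUCED, rel_tol=1e-4)

# The reduced run is less accurate than the exact value exp(-1), but it still
# confirms that the published code reproduces the published number.
print(simulate_decay(1_000), math.exp(-1))
```

The design choice mirrors the abstract: the reduced run is not meant to be scientifically interesting, only cheap enough for CI and precise enough to catch broken code or data.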
iDFlakies: A Framework for Detecting and Partially Classifying Flaky Tests

https://doi.org/10.1109/ICST.2019.00038

Lam, Wing; Oei, Reed; Shi, August; Marinov, Darko; Xie, Tao (April 2019, Proc. of the 12th IEEE International Conference on Software Testing, Verification and Validation (ICST 2019))

Regression testing is increasingly important with the wide use of continuous integration. A desirable requirement for regression testing is that a test failure reliably indicates a problem in the code under test and not a false alarm from the test code or the testing infrastructure. However, some test failures are unreliable, stemming from flaky tests that can nondeterministically pass or fail for the same code under test. There are many types of flaky tests, with order-dependent tests being a prominent type. To help advance research on flaky tests, we present (1) a framework, iDFlakies, to detect and partially classify flaky tests; (2) a dataset of flaky tests in open-source projects; and (3) a study with our dataset. iDFlakies automates experimentation with our tool for Maven-based Java projects. Using iDFlakies, we build a dataset of 422 flaky tests, with 50.5% order-dependent and 49.5% not. Our study of these flaky tests examines the prevalence of the two types of flaky tests, the probability that a test-suite run has at least one failure due to flaky tests, and how different test reorderings affect the number of detected flaky tests. We envision that our work can spur research to alleviate the problem of flaky tests.
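The detect-by-reordering idea can be sketched in a few lines of Python. This is a simplified toy, not iDFlakies itself (which targets Maven-based Java projects): the in-process test runner, the `run_order`/`detect_flaky` names, the choice of the reversed order plus random shuffles, and the two-rerun classification rule are all assumptions made for illustration. A test whose outcome flips between orders is labeled order-dependent (OD) when both orders reproduce their own outcomes on rerun, and non-order-dependent (NOD) otherwise. Because each order is rerun only once, a nondeterministic test could be mislabeled OD by chance.

```python
import random
from typing import Callable, Dict, List

# A "test" takes the shared state of one suite run and returns True (pass) or False (fail).
TestFn = Callable[[dict], bool]


def run_order(tests: Dict[str, TestFn], order: List[str]) -> Dict[str, bool]:
    """Run tests in the given order against fresh shared state (like a fresh JVM)."""
    state: dict = {}
    return {name: tests[name](state) for name in order}


def detect_flaky(tests: Dict[str, TestFn], rounds: int = 5, seed: int = 0) -> Dict[str, str]:
    """Label tests whose outcome changes across orders as 'OD' or 'NOD' (simplified)."""
    rng = random.Random(seed)
    original = list(tests)
    baseline = run_order(tests, original)
    # Candidate orders: the reversed original order, then random shuffles.
    orders = [original[::-1]]
    for _ in range(rounds):
        shuffled = original[:]
        rng.shuffle(shuffled)
        orders.append(shuffled)
    labels: Dict[str, str] = {}
    for order in orders:
        result = run_order(tests, order)
        for name, passed in result.items():
            if name in labels or passed == baseline[name]:
                continue
            # The outcome differs from the original order. Rerun both orders: if each
            # order reproduces its own outcome, the flip is attributed to test order.
            same_order_again = run_order(tests, order)[name] == passed
            original_again = run_order(tests, original)[name] == baseline[name]
            labels[name] = "OD" if same_order_again and original_again else "NOD"
    return labels


def victim(state: dict) -> bool:
    # Passes only if no earlier test has left "cache" in the shared state.
    return "cache" not in state


def polluter(state: dict) -> bool:
    # Always passes, but pollutes the shared state for later tests.
    state["cache"] = 1
    return True


print(detect_flaky({"victim": victim, "polluter": polluter}))  # {'victim': 'OD'}
```

Starting each suite run from fresh state models the usual setup in which state leaks only between tests within one run, which is exactly what makes the victim's outcome depend on order.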
iFixFlakies: a framework for automatically fixing order-dependent flaky tests

https://doi.org/10.1145/3338906.3338925

Shi, August; Lam, Wing; Oei, Reed; Xie, Tao; Marinov, Darko (January 2019, 7th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering)
