Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Android User Interface (UI) testing has emerged as an important and prevalent research topic due to the ubiquity of apps and the unique challenges faced by developers in this software domain. One popular topic of research that aims to facilitate both manual and automated UI testing and debugging processes is record and replay (R&R) tools. These tools allow for the recording of UI actions to facilitate the execution of test scenarios and the replay of various types of bugs. R&R tools typically support three main settings: (i) UI regression testing via R&R of feature-based execution scenarios, (ii) R&R of non- crashing functional bugs (e.g., in crowdsourced settings), and (iii) R&R of crashing bugs. Despite the progress made in research related to R&R tools, prior work examined only the effectiveness of these tools in disparate or fragmented settings. As such, the research community currently lacks a comprehensive examination of the effectiveness of existing tools across their common use cases and the potential key limitations that emerge. We address this current gap in knowledge by conducting a thorough empirical study on using R&R tools to manually record and replay feature-based user scenarios, non-crashing failures, and crashing bugs. Additionally, we explore the possibility of using R&R tools in conjunction with automated input genera- tion (AIG) tools to automatically record and replay crashing bugs. Our study context includes one industrial and three academic R&R tools, 34 user scenarios from 17 apps, 90 non-crashing failures from 42 Android apps, and 31 crashing bugs from 17 Android apps. Our results illustrate that 17% of user scenarios, 38% of non-crashing failures, and 44% of crashing bugs are not able to be reliably recorded and replayed, with the most prevalent reasons for non-replayability being action interval resolution, incompatibility related to APIs, and limitations in Android tooling. Our findings reveal important research directions for R&R tools to facilitate their practical application and adoption.more » « lessFree, publicly-accessible full text available April 4, 2026
-
Free, publicly-accessible full text available October 28, 2025
-
This artifact contains resources for reproducing and extending the work "The Effects of Computational Resources on Flaky Tests" Contents: Analysis and Processed Test Results.tgz: An archive that contains information about the projects analyzed, summarized test results per-throttling configuration per-run, and a Jupyter notebook that detects RAFT (generating all tables and figures in the article). A README in this archive provides further guidance on its contents js-results.tar, java-results.tar, python-results.tar: The raw results produced by the test runner when executing each JavaScript, Java and Python project 300 times in each of the throttling configurations. See also: We have published docker containers that include each project that we studied, along with all of the dependenices for running the tests. These containers can be used to reproduce our results, or to extend our work by running additional tests. The containers are available at https://hub.docker.com/r/jonbell/raft/tagsmore » « less
-
null (Ed.)Due to the importance of Android app quality assurance, many Android UI testing tools have been developed by researchers over the years. However, recent studies show that these tools typically achieve low code coverage on popular industrial apps. In fact, given a reasonable amount of run time, most state-of-the-art tools cannot even outperform a simple tool, Monkey, on popular industrial apps with large codebases and sophisticated functionalities. Our motivating study finds that these tools perform two types of operations, UI Hierarchy Capturing (capturing information about the contents on the screen) and UI Event Execution (executing UI events, such as clicks), often inefficiently using UIAutomator, a component of the Android framework. In total, these two types of operations use on average 70% of the given test time. Based on this finding, to improve the effectiveness of Android testing tools, we propose TOLLER, a tool consisting of infrastructure enhancements to the Android operating system. TOLLER injects itself into the same virtual machine as the app under test, giving TOLLER direct access to the app’s runtime memory. TOLLER is thus able to directly (1) access UI data structures, and thus capture contents on the screen without the overhead of invoking the Android framework services or remote procedure calls (RPCs), and (2) invoke UI event handlers without needing to execute the UI events. Compared with the often-used UIAutomator, TOLLER reduces average time usage of UI Hierarchy Capturing and UI Event Execution operations by up to 97% and 95%, respectively. We integrate TOLLER with existing state-of-the-art/practice Android UI testing tools and achieve the range of 11.8% to 70.1% relative code coverage improvement on average. We also find that TOLLER-enhanced tools are able to trigger 1.4x to 3.6x distinct crashes compared with their original versions without TOLLER enhancement. These improvements are so substantial that they also change the relative competitiveness of the tools under empirical comparison. Our findings highlight the practicality of TOLLER as well as raising the community awareness of infrastructure support’s significance beyond the community’s existing heavy focus on algorithms.more » « less
-
null (Ed.)Tests that modify (i.e., "pollute") the state shared among tests in a test suite are called \polluter tests". Finding these tests is im- portant because they could result in di erent test outcomes based on the order of the tests in the test suite. Prior work has proposed the PolDet technique for nding polluter tests in runs of JUnit tests on a regular Java Virtual Machine (JVM). Given that Java PathFinder (JPF) provides desirable infrastructure support, such as systematically exploring thread schedules, it is a worthwhile attempt to re-implement techniques such as PolDet in JPF. We present a new implementation of PolDet for nding polluter tests in runs of JUnit tests in JPF. We customize the existing state comparison in JPF to support the so-called \common-root iso- morphism" required by PolDet. We find that our implementation is simple, requiring only -200 lines of code, demonstrating that JPF is a sophisticated infrastructure for rapid exploration of re-search ideas on software testing. We evaluate our implementation on 187 test classes from 13 Java projects and nd 26 polluter tests. Our results show that the runtime overhead of PolDet@JPF com- pared to base JPF is relatively low, on average 1.43x. However, our experiments also show some potential challenges with JPF.more » « less