Title: Route: Roads Not Taken in UI Testing

Core features (functionalities) of an app can often be accessed and invoked in several ways, i.e., through alternative sequences of user-interface (UI) interactions. Given the manual effort of writing tests, developers often only consider the typical way of invoking features when creating the tests (i.e., the "sunny day scenario"). However, the alternative ways of invoking a feature are just as likely to be faulty, and these faults would go undetected without proper tests. To reduce the manual effort of creating UI tests and help developers more thoroughly examine the features of apps, we present Route, an automated tool for feature-based UI test augmentation for Android apps. Route first takes a UI test and the app under test as input. It then applies novel heuristics to find additional high-quality UI tests, consisting of both inputs and assertions, that verify the same feature as the original test in alternative ways. Application of Route on several dozen tests for popular apps on Google Play shows that for 96% of the existing tests, Route was able to generate at least one alternative test. Moreover, the fault detection effectiveness of augmented test suites in our experiments showed substantial improvements of up to 39% over the original test suites.
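As a concrete illustration of the idea (not an excerpt from Route or its evaluation), the following Espresso sketch shows a hypothetical "sunny day" test for a note-saving feature alongside the kind of alternative-path variant, with matching assertions, that Route aims to generate; all view IDs, strings, and class names here are invented.

    import static androidx.test.core.app.ApplicationProvider.getApplicationContext;
    import static androidx.test.espresso.Espresso.onView;
    import static androidx.test.espresso.Espresso.openActionBarOverflowOrOptionsMenu;
    import static androidx.test.espresso.action.ViewActions.click;
    import static androidx.test.espresso.action.ViewActions.typeText;
    import static androidx.test.espresso.assertion.ViewAssertions.matches;
    import static androidx.test.espresso.matcher.ViewMatchers.isDisplayed;
    import static androidx.test.espresso.matcher.ViewMatchers.withId;
    import static androidx.test.espresso.matcher.ViewMatchers.withText;

    import org.junit.Test;

    public class SaveNoteTests {

        // Hypothetical original test: the typical ("sunny day") route,
        // saving a note through the toolbar button.
        @Test
        public void saveNote_viaToolbarButton() {
            onView(withId(R.id.note_editor)).perform(typeText("groceries"));
            onView(withId(R.id.toolbar_save)).perform(click());
            onView(withText("Saved")).check(matches(isDisplayed()));
        }

        // An alternative route to the same feature, through the overflow
        // menu, with the same assertion: the kind of variant a tool like
        // Route searches for automatically.
        @Test
        public void saveNote_viaOverflowMenu() {
            onView(withId(R.id.note_editor)).perform(typeText("groceries"));
            openActionBarOverflowOrOptionsMenu(getApplicationContext());
            onView(withText("Save")).perform(click());
            onView(withText("Saved")).check(matches(isDisplayed()));
        }
    }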

 
Award ID(s): 2107125, 2106306
NSF-PAR ID: 10468355
Author(s) / Creator(s): ; ;
Publisher / Repository: ACM
Date Published:
Journal Name: ACM Transactions on Software Engineering and Methodology
Volume: 32
Issue: 3
ISSN: 1049-331X
Page Range / eLocation ID: 1 to 25
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The test suites of an Android app should take advantage of different types of tests including end-to-end tests, which validate user flows, and unit tests, which provide focused executions for debugging. App developers have two main options when creating unit tests: create unit tests that run on a device (either physical or emulated) or create unit tests that run on a development machine’s Java Virtual Machine (JVM). Unit tests that run on a device are not really focused, as they use the full implementation of the Android framework. Moreover, they are fairly slow to execute, requiring the Android system as the runtime. Unit tests that run on the JVM, instead, are more focused and run more efficiently but require developers to suitably handle the coupling between the app under test and the Android framework. To help developers in creating focused unit tests that run on the JVM, we propose a novel technique called ARTISAN based on the idea of test carving. The technique (i) traces the app execution during end-to-end testing on Android devices, (ii) identifies focal methods to test, (iii) carves the necessary preconditions for testing those methods, (iv) creates suitable test doubles for the Android framework, and (v) synthesizes executable unit tests that can run on the JVM. We evaluated ARTISAN using 152 end-to-end tests from five apps and observed that ARTISAN can generate unit tests that cover a significant portion of the code exercised by the end-to-end tests (i.e., 45% of the starting statement coverage on average) and does so in a few minutes. 
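    To make the carving idea concrete, here is a minimal sketch (not ARTISAN's actual output; the focal class, recorded values, and use of Mockito are assumptions for illustration) of a JVM unit test whose inputs, preconditions, and expected value would come from tracing an end-to-end run, with the Android framework dependency replaced by a test double:

        import static org.junit.Assert.assertEquals;
        import static org.mockito.Mockito.mock;
        import static org.mockito.Mockito.when;

        import android.content.SharedPreferences;
        import org.junit.Test;

        public class CarvedSearchTest {

            // Invented stand-in for an app class whose method was exercised
            // during an end-to-end run and identified as focal.
            static class SearchController {
                private final SharedPreferences prefs;

                SearchController(SharedPreferences prefs) {
                    this.prefs = prefs;
                }

                String normalize(String query) {
                    String q = query.trim().replaceAll("\\s+", " ");
                    return prefs.getBoolean("case_sensitive", false) ? q : q.toLowerCase();
                }
            }

            @Test
            public void normalize_reproducesValueObservedInE2eRun() {
                // Test double for the Android framework dependency, so the
                // test runs on a plain JVM instead of a device.
                SharedPreferences prefs = mock(SharedPreferences.class);
                when(prefs.getBoolean("case_sensitive", false)).thenReturn(false);

                SearchController controller = new SearchController(prefs);

                // Input and expected output as recorded from the traced run.
                assertEquals("coffee shop", controller.normalize("  Coffee  Shop "));
            }
        }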
  2. Due to the importance of Android app quality assurance, many Android UI testing tools have been developed by researchers over the years. However, recent studies show that these tools typically achieve low code coverage on popular industrial apps. In fact, given a reasonable amount of run time, most state-of-the-art tools cannot even outperform a simple tool, Monkey, on popular industrial apps with large codebases and sophisticated functionalities. Our motivating study finds that these tools perform two types of operations, UI Hierarchy Capturing (capturing information about the contents on the screen) and UI Event Execution (executing UI events, such as clicks), often inefficiently, using UIAutomator, a component of the Android framework. In total, these two types of operations use on average 70% of the given test time. Based on this finding, to improve the effectiveness of Android testing tools, we propose TOLLER, a tool consisting of infrastructure enhancements to the Android operating system. TOLLER injects itself into the same virtual machine as the app under test, giving TOLLER direct access to the app's runtime memory. TOLLER is thus able to directly (1) access UI data structures, and thus capture contents on the screen without the overhead of invoking Android framework services or remote procedure calls (RPCs), and (2) invoke UI event handlers without needing to execute the UI events. Compared with the often-used UIAutomator, TOLLER reduces the average time usage of UI Hierarchy Capturing and UI Event Execution operations by up to 97% and 95%, respectively. We integrate TOLLER with existing state-of-the-art/practice Android UI testing tools and achieve relative code coverage improvements of 11.8% to 70.1% on average. We also find that TOLLER-enhanced tools trigger 1.4x to 3.6x as many distinct crashes as their original versions without TOLLER enhancement. These improvements are substantial enough to change the relative competitiveness of the tools under empirical comparison. Our findings highlight the practicality of TOLLER and raise the community's awareness of the significance of infrastructure support, beyond the community's existing heavy focus on algorithms.
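    The following sketch illustrates the underlying idea only (it is not TOLLER's implementation, and the class here is invented): code running inside the app's own process can read UI data structures directly and invoke event handlers such as click listeners without dispatching input events:

        import android.view.View;
        import android.view.ViewGroup;

        public final class InProcessUiAccess {

            // Walk the view hierarchy directly from a root View already held
            // in the app's memory; no framework service or RPC is involved.
            public static void dump(View view, int depth, StringBuilder out) {
                for (int i = 0; i < depth; i++) {
                    out.append("  ");
                }
                out.append(view.getClass().getSimpleName()).append('\n');
                if (view instanceof ViewGroup) {
                    ViewGroup group = (ViewGroup) view;
                    for (int i = 0; i < group.getChildCount(); i++) {
                        dump(group.getChildAt(i), depth + 1, out);
                    }
                }
            }

            // Trigger a widget's click handler directly instead of injecting
            // and dispatching a touch event through the input pipeline.
            public static void clickDirectly(View view) {
                view.performClick();
            }
        }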
  3. Writing and maintaining UI tests for mobile apps is a time-consuming and tedious task. While decades of research have produced automated approaches for UI test generation, these approaches typically focus on testing for crashes or maximizing code coverage. By contrast, recent research has shown that developers prefer usage-based tests, which center around specific uses of app features, to help support activities such as regression testing. Very few existing techniques support the generation of such tests, as doing so requires automating the difficult task of understanding the semantics of UI screens and user inputs. In this paper, we introduce Avgust, which automates key steps of generating usage-based tests. Avgust uses neural models for image understanding to process video recordings of app uses and synthesize an app-agnostic state-machine encoding of those uses. Then, Avgust uses this encoding to synthesize test cases for a new target app. We evaluate Avgust on 374 videos of common uses of 18 popular apps and show that 69% of the tests Avgust generates successfully execute the desired usage, and that Avgust's classifiers outperform the state of the art.
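    As a rough illustration of what an app-agnostic state-machine encoding might look like (a plain-Java sketch; Avgust's actual learned encoding is not described in the abstract, and all state and action labels here are invented):

        import java.util.ArrayList;
        import java.util.HashMap;
        import java.util.List;
        import java.util.Map;

        final class UsageModel {

            // state -> (action -> next state); screens are abstract labels
            // rather than app-specific widgets. Example (invented labels):
            //   model.addTransition("launch", "tap_sign_in", "credentials_form");
            //   model.addTransition("credentials_form", "submit", "home");
            private final Map<String, Map<String, String>> transitions = new HashMap<>();

            void addTransition(String fromState, String action, String toState) {
                transitions.computeIfAbsent(fromState, k -> new HashMap<>()).put(action, toState);
            }

            // Replay a sequence of abstract actions from a starting state,
            // producing a plan to be grounded in a target app's concrete UI.
            List<String> planFrom(String state, List<String> actions) {
                List<String> plan = new ArrayList<>();
                for (String action : actions) {
                    String next = transitions.get(state).get(action);
                    plan.add(state + " --" + action + "--> " + next);
                    state = next;
                }
                return plan;
            }
        }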
  4. Android is a highly fragmented platform with a diverse set of devices and users. To support the deployment of apps in such a heterogeneous setting, Android has introduced dynamic delivery, a new model of software deployment in which optional, device- or user-specific functionalities of an app, called Dynamic Feature Modules (DFMs), can be installed, as needed, after the app's initial installation. This model of app deployment, however, has exacerbated the challenges of properly testing Android apps. In this article, we first describe the results of an extensive study in which we formalized a defect model representing the various conditions under which DFM installations may fail. We then present DeltaDroid, a tool aimed at assisting developers with validating dynamic delivery behavior in their apps by augmenting their existing test suite. Our experimental evaluation using real-world apps corroborates DeltaDroid's ability to detect many crashes and unexpected behaviors that existing automated testing tools cannot reveal.
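    For context on the deployment model being tested, here is a minimal sketch of requesting a DFM installation through the Play Core split-install API and observing failures, the kind of condition a defect model for dynamic delivery would cover; the module name is invented, and this is not DeltaDroid's code:

        import android.content.Context;
        import android.util.Log;

        import com.google.android.play.core.splitinstall.SplitInstallManager;
        import com.google.android.play.core.splitinstall.SplitInstallManagerFactory;
        import com.google.android.play.core.splitinstall.SplitInstallRequest;

        public final class DfmInstallProbe {

            // Request on-demand installation of a dynamic feature module and
            // log the outcome; installation can fail for device-, user-, or
            // connectivity-specific reasons after the app's initial install.
            public static void requestInstall(Context context) {
                SplitInstallManager manager = SplitInstallManagerFactory.create(context);
                SplitInstallRequest request = SplitInstallRequest.newBuilder()
                        .addModule("premium_filters") // hypothetical DFM name
                        .build();
                manager.startInstall(request)
                        .addOnSuccessListener(sessionId ->
                                Log.i("DfmProbe", "install started, session " + sessionId))
                        .addOnFailureListener(error ->
                                Log.e("DfmProbe", "install request failed", error));
            }
        }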
  5. Background: While there are thousands of behavioral health apps available to consumers, users often quickly discontinue their use, which limits their therapeutic value. By varying the types and number of ways that users can interact with behavioral health mobile health apps, developers may be able to support greater therapeutic engagement and increase app stickiness.
    Objective: The main objective of this analysis was to systematically characterize the types of user interactions available in behavioral health apps and then examine whether greater interactivity was associated with greater user satisfaction, as measured by app metrics.
    Methods: Using a modified PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology, we searched several app clearinghouse websites and identified 76 behavioral health apps that included some type of interactivity. We then filtered the results to ensure we were examining behavioral health apps and further refined our search to apps whose descriptions included one or more of the following terms: peer or therapist forum, discussion, feedback, professional, licensed, buddy, friend, artificial intelligence, chatbot, counselor, therapist, provider, mentor, bot, coach, message, comment, chat room, community, games, care team, connect, share, and support. In the final group of 34 apps, we examined the presence of 6 types of human-machine interactivity: human-to-human with peers, human-to-human with providers, human-to-artificial intelligence, human-to-algorithms, human-to-data, and novel interactive smartphone modalities. We also downloaded information on app user ratings and visibility, and reviewed other key app features.
    Results: We found that, on average, the 34 apps reviewed included 2.53 (SD 1.05; range 1-5) interactivity features. The most common type was human-to-data (n=34, 100%), followed by human-to-algorithm (n=15, 44.2%). The least common was human-to-artificial intelligence (n=7, 20.5%). There were no significant associations between the total number of interactivity features and user ratings or app visibility. We also found that the full range of therapeutic interactivity features was not being used in behavioral health apps.
    Conclusions: App developers would do well to include more interactivity features in behavioral health apps in order to fully use the capabilities of smartphone technologies and increase app stickiness. Theoretically, increased user engagement would occur through the use of multiple types of user interactivity, maximizing the benefits a person receives when using a mobile health app.