Title: Configurations in Android testing: They Matter
Android has rocketed to the top of the mobile market thanks in large part to its open source model. Vendors use Android for their devices for free, and companies make customizations to suit their needs. This has resulted in a myriad of configurations that are extant in the user space today. In this paper, we show that differences in configurations, if ignored, can lead to differences in test outputs and code coverage. Consequently, researchers who develop new testing techniques and evaluate them on only one or two configurations are missing a necessary dimension in their experiments, and developers who ignore this may release buggy software. In a large study on 18 apps across 88 configurations, we show that only one of the 18 apps studied showed no variation at all. The rest showed variation in code coverage, test results, or both. 15% of the 2,000-plus test cases across all of the apps vary, and some of the variation is subtle, i.e., not just a test crash. Our results suggest that configurations in Android testing do matter and that developers need to test using configuration-aware techniques.
Award ID(s):
1901543 1745775
NSF-PAR ID:
10097971
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 1st International Workshop on Advances in Mobile App Analysis - A-Mobile 2018
Page Range / eLocation ID:
1 to 6
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Writing and maintaining UI tests for mobile apps is a time-consuming and tedious task. While decades of research have produced automated approaches for UI test generation, these approaches typically focus on testing for crashes or maximizing code coverage. By contrast, recent research has shown that developers prefer usage-based tests, which center around specific uses of app features, to help support activities such as regression testing. Very few existing techniques support the generation of such tests, as doing so requires automating the difficult task of understanding the semantics of UI screens and user inputs. In this paper, we introduce Avgust, which automates key steps of generating usage-based tests. Avgust uses neural models for image understanding to process video recordings of app uses to synthesize an app-agnostic state-machine encoding of those uses. Then, Avgust uses this encoding to synthesize test cases for a new target app. We evaluate Avgust on 374 videos of common uses of 18 popular apps and show that 69% of the tests Avgust generates successfully execute the desired usage, and that Avgust’s classifiers outperform the state of the art.
  2. The test suites of an Android app should take advantage of different types of tests including end-to-end tests, which validate user flows, and unit tests, which provide focused executions for debugging. App developers have two main options when creating unit tests: create unit tests that run on a device (either physical or emulated) or create unit tests that run on a development machine’s Java Virtual Machine (JVM). Unit tests that run on a device are not really focused, as they use the full implementation of the Android framework. Moreover, they are fairly slow to execute, requiring the Android system as the runtime. Unit tests that run on the JVM, instead, are more focused and run more efficiently but require developers to suitably handle the coupling between the app under test and the Android framework. To help developers in creating focused unit tests that run on the JVM, we propose a novel technique called ARTISAN based on the idea of test carving. The technique (i) traces the app execution during end-to-end testing on Android devices, (ii) identifies focal methods to test, (iii) carves the necessary preconditions for testing those methods, (iv) creates suitable test doubles for the Android framework, and (v) synthesizes executable unit tests that can run on the JVM. We evaluated ARTISAN using 152 end-to-end tests from five apps and observed that ARTISAN can generate unit tests that cover a significant portion of the code exercised by the end-to-end tests (i.e., 45% of the starting statement coverage on average) and does so in a few minutes. 
  3. Due to the importance of Android app quality assurance, many Android UI testing tools have been developed by researchers over the years. However, recent studies show that these tools typically achieve low code coverage on popular industrial apps. In fact, given a reasonable amount of run time, most state-of-the-art tools cannot even outperform a simple tool, Monkey, on popular industrial apps with large codebases and sophisticated functionalities. Our motivating study finds that these tools perform two types of operations, UI Hierarchy Capturing (capturing information about the contents on the screen) and UI Event Execution (executing UI events, such as clicks), often inefficiently using UIAutomator, a component of the Android framework. In total, these two types of operations use on average 70% of the given test time. Based on this finding, to improve the effectiveness of Android testing tools, we propose TOLLER, a tool consisting of infrastructure enhancements to the Android operating system. TOLLER injects itself into the same virtual machine as the app under test, giving TOLLER direct access to the app’s runtime memory. TOLLER is thus able to directly (1) access UI data structures, and thus capture contents on the screen without the overhead of invoking the Android framework services or remote procedure calls (RPCs), and (2) invoke UI event handlers without needing to execute the UI events. Compared with the often-used UIAutomator, TOLLER reduces average time usage of UI Hierarchy Capturing and UI Event Execution operations by up to 97% and 95%, respectively. We integrate TOLLER with existing state-of-the-art/practice Android UI testing tools and achieve relative code coverage improvements ranging from 11.8% to 70.1% on average. We also find that TOLLER-enhanced tools are able to trigger 1.4x to 3.6x as many distinct crashes as their original versions without TOLLER enhancement. These improvements are so substantial that they also change the relative competitiveness of the tools under empirical comparison. Our findings highlight the practicality of TOLLER and raise the community’s awareness of the significance of infrastructure support beyond the community’s existing heavy focus on algorithms.
  4. Mobile devices are becoming a more common part of the education experience. Students can access their devices at any time to perform assignments or review material. Mobile apps can have the added advantage of being able to automatically grade student work and provide instantaneous feedback. However, numerous challenges remain in implementing effective mobile educational apps. One challenge is the small screen size of smartphones, which was a concern for a spatial visualization training app where students sketch isometric and orthographic drawings. This app was originally developed for iPads, but the wide prevalence of smartphones led to porting the software to iPhone and Android phones. The sketching assignments on a smartphone screen required more frequent zooming and panning, and one of the hypotheses of this study was that the educational effectiveness on smartphones was the same as on the larger screen sizes using iPad tablets. The spatial visualization mobile sketching app was implemented in a college freshman engineering graphics course to teach students how to sketch orthographic and isometric assignments. The app provides automatic grading and hint feedback to help students when they are stuck. Students in this pilot were assigned sketching problems as homework using their personal devices. Students were administered a pre- and post- spatial visualization test (PSVT:R, a reliable, well-validated instrument) to assess learning gains. The trial analysis focuses on students who entered the course with limited spatial visualization experience, identified by a score of ≤70% on the PSVT:R, since students entering college with low PSVT:R scores are at higher risk of dropping out of STEM majors. Among these low-performing students, those who used the app showed significant progress: 71% raised their test scores above 70%, bringing them out of the at-risk range for dropping out of engineering.
    While the PSVT:R test has been well validated, there are benefits to developing alternative methods of assessing spatial visualization skills. We developed an assembly pre- and post- test based upon a timed Lego™ exercise. At the start of the quarter, students were timed to see how long it would take them to build small Lego sets using only visual instructions. Students were timed again on a different Lego set after completion of the spatial visualization app. One benefit of the test was that it illustrated to the engineering students a skill that could be perceived as more relevant to their careers, and thus possibly increased their motivation for spatial visualization training. In addition, it may be possible to adapt the assembly test to elementary school grade levels where the PSVT:R test would not be suitable. Preliminary results show that the average Lego build times decreased significantly after using the mobile app, indicating an improvement in students’ spatial reasoning skills. A comparison will also be done between normalized completion times on the assembly test and the PSVT:R tests in order to see how the assembly test compares to the “gold standard”. In addition to the PSVT:R instrument, a survey was conducted to evaluate student usage and their impressions of the app. Students found the app engaging, easy to use, and something they would do whenever they had “a free moment”. 95% of the students would recommend the app to a friend who is struggling with spatial visualization skills. This paper will describe the implementation of the mobile spatial visualization sketching app in a large college classroom, and highlight the app’s impact in increasing self-efficacy in spatial visualization and sketching.
  5. Despite over a decade of research, it is still challenging for mobile UI testing tools to achieve satisfactory effectiveness, especially on industrial apps with rich features and large code bases. Our experiences suggest that existing mobile UI testing tools are prone to exploration tarpits, where the tools get stuck with a small fraction of app functionalities for an extensive amount of time. For example, a tool logs out of an app at an early stage without being able to log back in, and from then on the tool is stuck exploring the app's pre-login functionalities (i.e., exploration tarpits) instead of its main functionalities. While tool vendors/users can manually hardcode rules for the tools to avoid specific exploration tarpits, these rules can hardly generalize, being fragile in the face of diverted testing environments and fast app iterations. To identify and resolve exploration tarpits, we propose VET, a general approach including a supporting system for the given specific Android UI testing tool on the given specific app under test (AUT). VET runs the tool on the AUT for some time and records UI traces, based on which VET identifies exploration tarpits by recognizing their patterns in the UI traces. VET then pinpoints the actions (e.g., clicking logout) or the screens that lead to or exhibit exploration tarpits. In subsequent test runs, VET guides the testing tool to prevent or recover from exploration tarpits. From our evaluation with state-of-the-art Android UI testing tools on popular industrial apps, VET identifies exploration tarpits that cost up to 98.6% of the testing time budget. These exploration tarpits reveal not only limitations in UI exploration strategies but also defects in tool implementations. VET automatically addresses the identified exploration tarpits, enabling each evaluated tool to achieve higher code coverage and improve crash-triggering capabilities.