

Title: Action-Based Test Carving for Android Apps
The test suites of an Android app should take advantage of different types of tests, including end-to-end tests, which validate user flows, and unit tests, which provide focused executions for debugging. App developers have two main options when creating unit tests: create unit tests that run on a device (either physical or emulated) or create unit tests that run on a development machine's Java Virtual Machine (JVM). Unit tests that run on a device are not truly focused, as they exercise the full implementation of the Android framework, and they are fairly slow to execute, since they require the Android system as their runtime. Unit tests that run on the JVM, in contrast, are more focused and run more efficiently, but they require developers to suitably handle the coupling between the app under test and the Android framework. To help developers create focused unit tests that run on the JVM, we propose a novel technique called ARTISAN, based on the idea of test carving. The technique (i) traces the app execution during end-to-end testing on Android devices, (ii) identifies focal methods to test, (iii) carves the necessary preconditions for testing those methods, (iv) creates suitable test doubles for the Android framework, and (v) synthesizes executable unit tests that can run on the JVM. We evaluated ARTISAN using 152 end-to-end tests from five apps and observed that ARTISAN can generate unit tests that cover a significant portion of the code exercised by the end-to-end tests (i.e., 45% of the starting statement coverage on average) and does so in a few minutes.
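To make the five steps concrete, the following is a minimal sketch of the kind of JVM-runnable unit test that carving could produce. The app class (ShoppingCart), its methods, and the recorded values are hypothetical, and Mockito is used here only as a stand-in for the test doubles the technique synthesizes for the Android framework.

```java
// Hypothetical sketch of a carved, JVM-runnable unit test; ShoppingCart and
// all recorded values are illustrative, not taken from the paper.
import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import android.content.Context;
import android.content.SharedPreferences;

import org.junit.Test;

public class CarvedCartTest {

    @Test
    public void computeTotal() {
        // (iv) Test doubles replace the Android framework so the test runs on the JVM.
        Context context = mock(Context.class);
        SharedPreferences prefs = mock(SharedPreferences.class);
        when(context.getSharedPreferences("cart", Context.MODE_PRIVATE)).thenReturn(prefs);
        when(prefs.getFloat("tax_rate", 0f)).thenReturn(0.07f);

        // (iii) Carved preconditions: state observed during the end-to-end run.
        ShoppingCart cart = new ShoppingCart(context);
        cart.addItem("book", 12.50);
        cart.addItem("pen", 1.50);

        // (ii) The focal method, checked against the value recorded at carving time.
        assertEquals(14.98, cart.computeTotal(), 0.001);
    }
}
```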
Award ID(s):
2150217
NSF-PAR ID:
10480452
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Page Range / eLocation ID:
107 to 116
Format(s):
Medium: X
Location:
Dublin, Ireland
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    With the increasing use of battery-powered devices comes the need to test mobile applications for energy consumption and energy issues. Unfortunately, energy testing is expensive because it is a manual, labor-intensive process that often requires multiple separate energy-measuring devices to collect energy-usage data. The high costs of energy testing can negatively affect the planning process of application evolution. For example, developers might be limited in the number of changes they can include in a release because they must conservatively plan to conduct energy testing after each change.

    In this paper, we present a new approach that provides developers with feedback on executing or skipping energy tests for proposed code changes. Our technique leverages change impact analysis and precomputed API energy-usage information. More specifically, for a proposed change, the technique predicts whether energy testing will be required and, if so, which energy tests will need to be run. Such information can allow developers to avoid spending unnecessary time on energy testing and to develop an effective application-evolution timeline. To investigate the feasibility of our technique, we implemented a prototype for Android applications and conducted three case studies, at different granularity levels, on 10 Android applications.
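As a rough illustration of the decision described above (not the paper's actual implementation), the sketch below selects the energy tests to re-run for a change: a test is selected if it covers an impacted method that uses an API from a precomputed table of energy-greedy APIs. All names and the choice of energy-greedy APIs are assumptions made for the example.

```java
// Illustrative sketch of selecting energy tests from change impact information.
import java.util.*;

public class EnergyTestSelector {

    // Precomputed set of energy-greedy Android APIs (hypothetical selection).
    private static final Set<String> ENERGY_GREEDY_APIS = Set.of(
            "android.location.LocationManager#requestLocationUpdates",
            "android.hardware.camera2.CameraManager#openCamera");

    /**
     * A test is selected if it covers a method impacted by the change and that
     * method uses at least one energy-greedy API.
     */
    public static Set<String> select(Map<String, Set<String>> testToCoveredMethods,
                                     Set<String> impactedMethods,
                                     Map<String, Set<String>> methodToApis) {
        Set<String> selected = new HashSet<>();
        for (Map.Entry<String, Set<String>> e : testToCoveredMethods.entrySet()) {
            for (String method : e.getValue()) {
                if (impactedMethods.contains(method)
                        && !Collections.disjoint(
                                methodToApis.getOrDefault(method, Set.of()),
                                ENERGY_GREEDY_APIS)) {
                    selected.add(e.getKey()); // this change can shift energy behavior
                    break;
                }
            }
        }
        return selected; // empty set => energy testing can be skipped for this change
    }
}
```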

  2. Regression testing (running available tests after each project change) is widely practiced in industry. Despite its widespread use and importance, regression testing is a costly activity. Regression test selection (RTS) optimizes regression testing by selecting only the tests affected by project changes. RTS has been extensively studied, and several tools have been deployed in large projects. However, work on RTS over the last decade has mostly focused on languages with abstract computing machines (e.g., the JVM). Meanwhile, development practices (e.g., frequency of commits, testing frameworks, compilers) in C++ projects have changed dramatically, and it is unknown how RTS tools for C++ should be designed and implemented, or what benefits they would bring. We present the design and implementation of an RTS technique, dubbed RTS++, that targets projects written in C++ which compile to LLVM IR and use the Google Test testing framework. RTS++ uses static analysis of the function call graph to select tests. RTS++ integrates with many existing build systems, including AutoMake, CMake, and Make. We evaluated RTS++ on 11 large open-source projects, totaling 3,811,916 lines of code. To the best of our knowledge, this is the largest evaluation of an RTS technique for C++. We measured the benefits of RTS++ compared to running all available tests (i.e., retest-all). Our results show that RTS++ reduces the number of executed tests and end-to-end testing time by 88% and 61% on average, respectively.
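The heart of such a technique can be sketched as a reachability query over the static call graph: a test must be re-run if its entry function can reach a function whose body changed. The sketch below is written in Java for consistency with the surrounding examples, although RTS++ itself operates on LLVM IR; all names are illustrative.

```java
// Minimal call-graph-based regression test selection (illustrative sketch).
import java.util.*;

public class CallGraphRts {

    private final Map<String, Set<String>> callGraph; // caller -> callees

    public CallGraphRts(Map<String, Set<String>> callGraph) {
        this.callGraph = callGraph;
    }

    /** A test must re-run iff its entry function can reach a changed function. */
    public boolean mustRun(String testEntry, Set<String> changedFunctions) {
        Deque<String> worklist = new ArrayDeque<>();
        Set<String> visited = new HashSet<>();
        worklist.push(testEntry);
        while (!worklist.isEmpty()) {
            String fn = worklist.pop();
            if (!visited.add(fn)) continue;                 // already explored
            if (changedFunctions.contains(fn)) return true; // affected: rerun
            worklist.addAll(callGraph.getOrDefault(fn, Set.of()));
        }
        return false; // no changed function reachable: safe to skip
    }
}
```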
  3. Core features (functionalities) of an app can often be accessed and invoked in several ways, i.e., through alternative sequences of user-interface (UI) interactions. Given the manual effort of writing tests, developers often only consider the typical way of invoking features when creating the tests (i.e., the "sunny day scenario"). However, the alternative ways of invoking a feature are just as likely to be faulty. These faults would go undetected without proper tests. To reduce the manual effort of creating UI tests and help developers more thoroughly examine the features of apps, we present Route, an automated tool for feature-based UI test augmentation for Android apps. Route first takes a UI test and the app under test as input. It then applies novel heuristics to find additional high-quality UI tests, consisting of both inputs and assertions, that verify the same feature as the original test in alternative ways. Application of Route on several dozen tests for popular apps on Google Play shows that for 96% of the existing tests, Route was able to generate at least one alternative test. Moreover, the fault detection effectiveness of the augmented test suites in our experiments showed substantial improvements of up to 39% over the original test suites.
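As a hypothetical illustration of what an augmented test might look like (not Route's actual output), the Espresso sketch below exercises the same "Settings" feature through two different UI paths while keeping the same assertion; the activity class, view descriptions, and strings are made up for the example.

```java
// Hypothetical Espresso tests; MainActivity and all UI strings are illustrative.
import static androidx.test.espresso.Espresso.onView;
import static androidx.test.espresso.Espresso.openActionBarOverflowOrOptionsMenu;
import static androidx.test.espresso.action.ViewActions.click;
import static androidx.test.espresso.assertion.ViewAssertions.matches;
import static androidx.test.espresso.matcher.ViewMatchers.*;

import androidx.test.ext.junit.rules.ActivityScenarioRule;
import androidx.test.platform.app.InstrumentationRegistry;

import org.junit.Rule;
import org.junit.Test;

public class SettingsFeatureTest {

    @Rule
    public ActivityScenarioRule<MainActivity> rule =
            new ActivityScenarioRule<>(MainActivity.class);

    @Test
    public void openSettingsViaOverflowMenu() { // original "sunny day" path
        openActionBarOverflowOrOptionsMenu(
                InstrumentationRegistry.getInstrumentation().getTargetContext());
        onView(withText("Settings")).perform(click());
        onView(withText("Notifications")).check(matches(isDisplayed()));
    }

    @Test
    public void openSettingsViaNavigationDrawer() { // augmented alternative path
        onView(withContentDescription("Open navigation drawer")).perform(click());
        onView(withText("Settings")).perform(click());
        // Same assertion as the original test: the feature must behave identically.
        onView(withText("Notifications")).check(matches(isDisplayed()));
    }
}
```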

     
  4. Android is a highly fragmented platform with a diverse set of devices and users. To support the deployment of apps in such a heterogeneous setting, Android has introduced dynamic delivery, a new model of software deployment in which optional, device- or user-specific functionalities of an app, called Dynamic Feature Modules (DFMs), can be installed, as needed, after the app's initial installation. This model of app deployment, however, has exacerbated the challenges of properly testing Android apps. In this article, we first describe the results of an extensive study in which we formalized a defect model representing the various conditions under which DFM installations may fail. We then present DeltaDroid, a tool aimed at assisting developers with validating dynamic-delivery behavior in their apps by augmenting their existing test suite. Our experimental evaluation using real-world apps corroborates DeltaDroid's ability to detect many crashes and unexpected behaviors that existing automated testing tools cannot reveal.
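For context, DFMs are installed at run time through Google's Play Core split-install API, and each step of that flow can fail in device- or user-specific ways. The sketch below, with a hypothetical module name and deliberately simplified error handling, shows the kind of installation flow whose failure conditions such tests must exercise.

```java
// Sketch of a dynamic feature module installation via the Play Core
// split-install API; the "camera" module name is hypothetical.
import android.content.Context;

import com.google.android.play.core.splitinstall.SplitInstallManager;
import com.google.android.play.core.splitinstall.SplitInstallManagerFactory;
import com.google.android.play.core.splitinstall.SplitInstallRequest;
import com.google.android.play.core.splitinstall.model.SplitInstallSessionStatus;

public class DfmInstallCheck {

    /** Starts installing the "camera" feature module and surfaces failures. */
    public static void installCameraModule(Context context) {
        SplitInstallManager manager = SplitInstallManagerFactory.create(context);
        SplitInstallRequest request = SplitInstallRequest.newBuilder()
                .addModule("camera") // hypothetical DFM name
                .build();

        // Installation is asynchronous; a listener observes state transitions.
        manager.registerListener(state -> {
            if (state.status() == SplitInstallSessionStatus.FAILED) {
                // One of the defect conditions a DFM test must detect.
                throw new AssertionError("DFM install failed: " + state.errorCode());
            }
        });
        manager.startInstall(request)
               .addOnFailureListener(e -> { throw new AssertionError(e); });
    }
}
```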

     
  5. Unit testing focuses on verifying the functions of individual units of a software system. It is challenging due to the high interdependencies among software units. Developers address this by mocking, that is, replacing a dependency with a "fake" object. Despite the existence of powerful, dedicated mocking frameworks, developers often turn to a "hand-rolled" approach: inheritance. That is, they create a subclass of the dependent class and mock its behavior through method overriding. However, this requires tedious implementation and compromises the design quality of unit tests. This work contributes a fully automated refactoring framework to identify and replace the use of inheritance with Mockito, a well-received mocking framework. Our approach is built upon empirical experience from five open-source projects that use inheritance for mocking. We evaluate our approach on nine other projects. Results show that our framework is efficient, generally applicable to new datasets, mostly preserves test-case behaviors in detecting defects (in the form of mutants), and decouples test code from production code. A qualitative evaluation by experienced developers suggests that the auto-refactoring solutions generated by our framework improve the quality of the unit test cases in various aspects, such as making test conditions more explicit and improving the cohesion, readability, understandability, and maintainability of test cases. Finally, we submitted 23 pull requests containing our refactoring solutions to the open-source projects: 9 were accepted/merged, 6 were rejected, and the remainder are pending (5 requests), failed with unexpected exceptions (2 requests), or undecided (1 request). Among the 21 open-source developers involved in the reviewing process, 81% gave positive votes, indicating that our refactoring solutions were quite well received by the open-source projects and developers.
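A before/after sketch of the refactoring this framework automates, using hypothetical production classes (PaymentGateway, Checkout): the hand-rolled subclass on top is replaced by an equivalent, more explicit Mockito stub.

```java
// Illustrative before/after of inheritance-based mocking vs. Mockito.
import static org.junit.Assert.assertTrue;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

// Before: "hand-rolled" mocking through inheritance and method overriding.
class FakePaymentGateway extends PaymentGateway {
    @Override
    public boolean charge(double amount) {
        return true; // hard-coded stub behavior buried in a subclass
    }
}

// After: the same stub expressed declaratively with Mockito, no subclass needed.
public class CheckoutTest {

    @Test
    public void purchaseSucceedsWhenGatewayAccepts() {
        PaymentGateway gateway = mock(PaymentGateway.class);
        when(gateway.charge(19.99)).thenReturn(true); // explicit test condition

        Checkout checkout = new Checkout(gateway);
        assertTrue(checkout.buy("book", 19.99));
    }
}
```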