Title: UI Screens Identification and Extraction from Mobile Programming Screencasts
Demand for mobile applications is on the rise, leading more programmers to learn how to develop, or to maintain, this kind of program. Developers often turn to online resources for inspiration or for answers to questions about mobile programming topics, and screencasts are a popular such resource. However, given the multitude of screencasts available, it can be difficult to quickly determine which of the many videos is relevant to one's needs. We propose a novel approach, called UIScreens, which detects, extracts, and presents the most representative user interface (UI) screens embedded in mobile development screencasts. This could help developers quickly comprehend what an app displayed in a video is about, thereby saving time when searching for useful videos. UIScreens has been evaluated in two empirical studies on iOS and Android programming screencasts. The first study investigates the accuracy of our UI extraction and shows that our approach is able to detect and extract UI screens with an accuracy of 94%. The second is a user study with mobile app developers, who evaluated both the accuracy and the usefulness of UIScreens. They agreed that UIScreens is accurate and extracts representative UI screens from videos. They also considered the extracted UI screens useful for understanding what a video is about and whether it is relevant to a search. Our approach has been implemented as a free online tool.
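As a rough illustration of the general idea only, not the authors' actual pipeline, the Python sketch below samples frames from a screencast and keeps those that remain visually stable across consecutive samples, a crude proxy for a fully rendered UI screen. The OpenCV usage and the thresholds are assumptions for illustration:

    import cv2
    import numpy as np

    # Sample frames at a fixed interval and keep the ones that barely
    # change between consecutive samples: a stable frame is a rough
    # proxy for a fully rendered UI screen (illustrative heuristic only).
    def candidate_ui_frames(video_path, sample_every_s=1.0, stability_thresh=2.0):
        cap = cv2.VideoCapture(video_path)
        fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
        step = max(1, int(fps * sample_every_s))
        candidates, prev_gray, idx = [], None, 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if idx % step == 0:
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
                if prev_gray is not None:
                    # Mean absolute pixel difference between samples.
                    if np.mean(cv2.absdiff(gray, prev_gray)) < stability_thresh:
                        candidates.append((idx, frame))
                prev_gray = gray
            idx += 1
        cap.release()
        return candidates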
Award ID(s):
1846142
PAR ID:
10220036
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 28th International Conference on Program Comprehension
Page Range / eLocation ID:
319 to 330
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Mobile apps are one of the most widely used types of software systems in existence today, and more programmers and students learn how to develop them every day. One of the most popular resources for learning mobile programming is videos hosted on social platforms such as YouTube. While useful, this type of resource also has its limitations, especially when developers are looking for user interface (UI) designs for mobile applications, since these are hard to search for and locate in videos. We propose UIScreens, a web-based analysis and search engine that analyzes the visual contents of mobile programming video tutorials, then identifies and extracts the UI screens displayed in the videos. Our tool offers features such as searching for UI screens in videos, displaying an overview of the UI screens identified in a video under each search result, and navigating to the part of a video where a particular UI screen is being displayed and discussed. In a user study, participants agreed that UIScreens is usable and useful for quickly skimming through videos, and that the UI screens it extracts can help developers further determine the relevance of videos to a search topic.
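    One plausible way such a search feature could be backed (the tool's real implementation is not described at this level of detail) is to OCR the text visible in the extracted screens and build a keyword index over it. The sketch below assumes pytesseract purely for illustration:

        from collections import defaultdict
        from PIL import Image
        import pytesseract  # assumed OCR backend, not the tool's real one

        def build_screen_index(screen_paths):
            """Map lowercase tokens -> set of screen image paths."""
            index = defaultdict(set)
            for path in screen_paths:
                text = pytesseract.image_to_string(Image.open(path))
                for token in text.lower().split():
                    index[token].add(path)
            return index

        def search(index, query):
            """Return screens whose OCR'd text contains every query term."""
            hits = [index.get(t, set()) for t in query.lower().split()]
            return set.intersection(*hits) if hits else set()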
  2. The need for mobile applications and mobile programming is increasing due to the continuous rise in the pervasiveness of mobile devices. Developers often refer to video programming tutorials to learn more about mobile programming topics. To find the right video to watch, developers typically skim over several videos, looking at their titles, descriptions, and content in order to determine whether they are relevant to their information needs. Unfortunately, the title and description do not always provide an accurate overview, and skimming over videos is time-consuming and can lead to missing important information. We propose a novel approach that locates and extracts the GUI screens showcased in a video tutorial, then selects and displays the most representative ones to provide a GUI-focused overview of the video. We believe this overview can be used by developers as an additional source of information for determining if a video contains the information they need. To evaluate our approach, we performed an empirical study on iOS and Android programming screencasts which investigates the accuracy of our automated GUI extraction. The results reveal that our approach can detect and extract GUI screens with an accuracy of 94%.
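    A hedged sketch of one standard way to select "most representative" screens, which the paper may or may not use: collapse near-duplicate frames with a perceptual hash and keep one frame per group. The ImageHash library and the distance threshold are assumptions:

        from PIL import Image
        import imagehash  # pip install ImageHash; an assumed dependency

        def representative_screens(frame_paths, max_distance=6):
            """Keep one frame per cluster of visually similar frames."""
            kept = []  # (perceptual hash, path) for frames kept so far
            for path in frame_paths:
                h = imagehash.phash(Image.open(path))
                # Hamming distance between perceptual hashes approximates
                # visual similarity; a small distance means near-duplicate.
                if all(h - kept_hash > max_distance for kept_hash, _ in kept):
                    kept.append((h, path))
            return [path for _, path in kept]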
  3. Writing and maintaining UI tests for mobile apps is a time-consuming and tedious task. While decades of research have produced automated approaches for UI test generation, these approaches typically focus on testing for crashes or maximizing code coverage. By contrast, recent research has shown that developers prefer usage-based tests, which center around specific uses of app features, to help support activities such as regression testing. Very few existing techniques support the generation of such tests, as doing so requires automating the difficult task of understanding the semantics of UI screens and user inputs. In this paper, we introduce Avgust, which automates key steps of generating usage-based tests. Avgust uses neural models for image understanding to process video recordings of app uses to synthesize an app-agnostic state-machine encoding of those uses. Then, Avgust uses this encoding to synthesize test cases for a new target app. We evaluate Avgust on 374 videos of common uses of 18 popular apps and show that 69% of the tests Avgust generates successfully execute the desired usage, and that Avgust’s classifiers outperform the state of the art.
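    The following sketch illustrates what an app-agnostic state-machine encoding of a usage could look like: states are abstract screen labels and transitions are abstract user actions, which a test generator could later map onto a new target app's concrete widgets. All labels and the structure here are illustrative, not Avgust's actual encoding:

        from dataclasses import dataclass, field

        @dataclass(frozen=True)
        class Action:
            kind: str       # e.g., "tap", "type" (labels are illustrative)
            target: str     # abstract widget label, e.g., "username_field"
            text: str = ""  # input text for "type" actions

        @dataclass
        class UsageStateMachine:
            # (state, Action) -> next state
            transitions: dict = field(default_factory=dict)

            def add(self, src, action, dst):
                self.transitions[(src, action)] = dst

            def trace(self, start, actions):
                """Replay a recorded usage; returns the visited states."""
                states, cur = [start], start
                for a in actions:
                    cur = self.transitions[(cur, a)]
                    states.append(cur)
                return states

        # A "sign in" usage encoded once; generating a test for a new app
        # would map these abstract labels to that app's concrete widgets.
        m = UsageStateMachine()
        m.add("login", Action("type", "username_field", "alice"), "login")
        m.add("login", Action("tap", "sign_in_button"), "home")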
  4. As mobile apps are used extensively in our daily lives, their responsiveness has become an important factor in the user experience. The long response time of a mobile app can be caused by a variety of reasons, including soft hang bugs or prolonged user interface APIs (UI-APIs). While hang bugs have been researched extensively before, our investigation of UI-APIs in today’s mobile OS finds that the recursive construction of the UI view hierarchy can often be time-consuming, due to the complexity of today’s UI views. To accelerate UI processing, such complex views can be pre-processed and cached before the user even visits them. However, pre-caching every view in a mobile app is infeasible due to the incurred overheads in time, energy, and cache space. In this paper, we propose MAPP, a framework for Mobile App Predictive Pre-caching. MAPP has two main modules, 1) UI view prediction based on deep learning and 2) UI-API pre-caching, which coordinate to improve the responsiveness of mobile apps. MAPP adopts a per-user and per-app prediction model that is tailored based on the analysis of collected user traces, such as location, time, or the sequence of previously visited views. A dynamic feature ranking and model selection algorithm is designed to judiciously filter out less relevant features, improving prediction accuracy with less computation overhead. MAPP is evaluated with 61 real-world traces from 18 volunteers over 30 days and is shown to shorten the response time of mobile apps by 59.84% on average, with an average cache hit rate of 92.55%.
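    A minimal sketch of the predict-then-pre-cache idea, with a stand-in predictor interface and a bounded cache; MAPP's real per-user deep-learning models and feature-ranking logic are far richer and are not reproduced here:

        from collections import OrderedDict

        class PreCache:
            """Bounded cache of pre-built UI views (illustrative only)."""
            def __init__(self, capacity=3):
                self.capacity = capacity
                self.views = OrderedDict()  # view name -> pre-built view

            def prefetch(self, predictions, build_view):
                """predictions: (view_name, score) pairs, best first."""
                for name, _score in predictions[: self.capacity]:
                    if name not in self.views:
                        if len(self.views) >= self.capacity:
                            self.views.popitem(last=False)  # evict oldest
                        # Do the expensive view construction ahead of time.
                        self.views[name] = build_view(name)

            def open(self, name, build_view):
                """A cache hit skips the slow view-building UI-API calls."""
                view = self.views.pop(name, None)
                return view if view is not None else build_view(name)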
  5. Programming screencasts can be a rich source of documentation for developers. However, despite the availability of such videos, the information contained in them, and especially the source code being displayed, is not easy to find, search, or reuse by programmers. Recent work has identified this challenge and proposed solutions that identify and extract source code from video tutorials in order to make it readily available to developers or other tools. A crucial component in these approaches is the Optical Character Recognition (OCR) engine used to transcribe the source code shown on screen. Previous work has simply chosen one OCR engine, without considering its accuracy, or that of other engines, on source code recognition. In this paper, we present an empirical study on the accuracy of six OCR engines for the extraction of source code from screencasts and code images. Our results show that transcription accuracy varies greatly from one OCR engine to another and that the most widely chosen OCR engine in previous studies is far from the best choice. We also show how other factors, such as font type and size, can impact the results of some of the engines. We conclude by offering guidelines for programming screencast creators on which fonts to use to enable better OCR recognition of their source code, as well as advice on OCR engine choice for researchers aiming to analyze source code in screencasts.
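    For context, transcription accuracy in studies like this is commonly scored with a normalized edit-distance similarity between the OCR output and the ground-truth code; the paper's exact metric is not reproduced here. A minimal sketch of that standard stand-in:

        def levenshtein(a: str, b: str) -> int:
            """Minimum number of single-character edits turning a into b."""
            prev = list(range(len(b) + 1))
            for i, ca in enumerate(a, 1):
                cur = [i]
                for j, cb in enumerate(b, 1):
                    cur.append(min(prev[j] + 1,                 # deletion
                                   cur[j - 1] + 1,              # insertion
                                   prev[j - 1] + (ca != cb)))   # substitution
                prev = cur
            return prev[-1]

        def ocr_accuracy(ocr_text: str, ground_truth: str) -> float:
            """1.0 = perfect transcription, 0.0 = nothing recovered."""
            if not ground_truth:
                return 1.0 if not ocr_text else 0.0
            dist = levenshtein(ocr_text, ground_truth)
            return max(0.0, 1.0 - dist / max(len(ocr_text), len(ground_truth)))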