skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Screen2Vec: Semantic Embedding of GUI Screens and GUI Components
Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation efforts. This paper presents Screen2Vec, a new self-supervised technique for generating representations in embedding vectors of GUI screens and components that encode all of the above GUI features without requiring manual annotation using the context of user interaction traces. Screen2Vec is inspired by the word embedding method Word2Vec, but uses a new two-layer pipeline informed by the structure of GUIs and interaction traces and incorporates screen- and app-specific metadata. Through several sample downstream tasks, we demonstrate Screen2Vec’s key useful properties: representing between-screen similarity through nearest neighbors, composability, and capability to represent user tasks.  more » « less
Award ID(s):
1814472
PAR ID:
10302228
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2021)
Page Range / eLocation ID:
1 to 15
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Many computing tasks, such as comparison shopping, two-factor authentication, and checking movie reviews, require using multiple apps together. On large screens, "windows, icons, menus, pointer" (WIMP) graphical user interfaces (GUIs) support easy sharing of content and context between multiple apps. So, it is straightforward to see the content from one application and write something relevant in another application, such as looking at the map around a place and typing walking instructions into an email. However, although today's smartphones also use GUIs, they have small screens and limited windowing support, making it hard to switch contexts and exchange data between apps. We introduce DoThisHere, a multimodal interaction technique that streamlines cross-app tasks and reduces the burden these tasks impose on users. Users can use voice to refer to information or app features that are off-screen and touch to specify where the relevant information should be inserted or is displayed. With DoThisHere, users can access information from or carry information to other apps with less context switching. We conducted a survey to find out what cross-app tasks people are currently performing or wish to perform on their smartphones. Among the 125 tasks that we collected from 75 participants, we found that 59 of these tasks are not well supported currently. DoThisHere is helpful in completing 95% of these unsupported tasks. A user study, where users are shown the list of supported voice commands when performing a representative sample of such tasks, suggests that DoThisHere may reduce expert users' cognitive load; the Query action, in particular, can help users reduce task completion time. 
    more » « less
  2. null (Ed.)
    The need for mobile applications and mobile programming is increasing due to the continuous rise in the pervasiveness of mobile devices. Developers often refer to video programming tutorials to learn more about mobile programming topics. To find the right video to watch, developers typically skim over several videos, looking at their title, description, and video content in order to determine if they are relevant to their information needs. Unfortunately, the title and description do not always provide an accurate overview, and skimming over videos is time-consuming and can lead to missing important information. We propose a novel approach that locates and extracts the GUI screens showcased in a video tutorial, then selects and displays the most representative ones to provide a GUI-focused overview of the video. We believe this overview can be used by developers as an additional source of information for determining if a video contains the information they need. To evaluate our approach, we performed an empirical study on iOS and Android programming screencasts which investigates the accuracy of our automated GUI extraction. The results reveal that our approach can detect and extract GUI screens with an accuracy of 94%. 
    more » « less
  3. One of the most important tasks related to managing bug reports is localizing the fault so that a fix can be applied. As such, prior work has aimed to automate this task of bug localization by formulating it as an information retrieval problem, where potentially buggy files are retrieved and ranked according to their textual similarity with a given bug report. However, there is often a notable semantic gap between the information contained in bug reports and identifiers or natural language contained within source code files. For user-facing software, there is currently a key source of information that could aid in bug localization, but has not been thoroughly investigated - information from the GUI. We investigate the hypothesis that, for end user-facing applications, connecting information in a bug report with information from the GUI, and using this to aid in retrieving potentially buggy files, can improve upon existing techniques for bug localization. To examine this phenomenon, we conduct a comprehensive empirical study that augments four baseline techniques for bug localization with GUI interaction information from a reproduction scenario to (i) filter out potentially irrelevant files, (ii) boost potentially relevant files, and (iii) reformulate text-retrieval queries. To carry out our study, we source the current largest dataset of fully-localized and reproducible real bugs for Android apps, with corresponding bug reports, consisting of 80 bug reports from 39 popular open-source apps. Our results illustrate that augmenting traditional techniques with GUI information leads to a marked increase in effectiveness across multiple metrics, including a relative increase in Hits@10 of 13-18%. Additionally, through further analysis, we find that our studied augmentations largely complement existing techniques. 
    more » « less
  4. null (Ed.)
    We summarize our past five years of work on designing, building, and studying Sugilite, an interactive task learning agent that can learn new tasks and relevant associated concepts interactively from the user’s natural language instructions and demonstrations leveraging the graphical user interfaces (GUIs) of third-party mobile apps. Through its multi-modal and mixed-initiative approaches for Human- AI interaction, Sugilite made important contributions in improving the usability, applicability, generalizability, flexibility, robustness, and shareability of interactive task learning agents. Sugilite also represents a new human-AI interaction paradigm for interactive task learning, where it uses existing app GUIs as a medium for users to communicate their intents with an AI agent instead of the interfaces for users to interact with the underlying computing services. In this chapter, we describe the Sugilite system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android. 
    more » « less
  5. Li, Yang; Hilliges, Otmar (Ed.)
    We summarize our past five years of work on designing, building, and studying Sugilite, an interactive task learning agent that can learn new tasks and relevant associated concepts interactively from the user’s natural language instructions and demonstrations leveraging the graphical user interfaces (GUIs) of third-party mobile apps. Through its multi-modal and mixed-initiative approaches for Human-AI interaction, Sugilite made important contributions in improving the usability, applicability, generalizability, flexibility, robustness, and shareability of interactive task learning agents. Sugilite also represents a new human-AI interaction paradigm for interactive task learning, where it uses existing app GUIs as a medium for users to communicate their intents with an AI agent instead of the interfaces for users to interact with the underlying computing services. In this chapter, we describe the Sugilite system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android. 
    more » « less