NSF PAR Search | NSF Public Access Repository

A Need-finding Study for Understanding Text Entry in Smartphone App Usage

Li, Toby Jia-Jun; Myers, Brad A (November 2021, arXiv.org)
Text entry makes up about one-fourth of the smartphone interaction events, and is known to be challenging and difficult. However, there has been little study about the characteristics of text entry in the context of smartphone app usage. In this paper, we present a mixed-method in-situ study conducted in 2016 with 17 active smartphone users to better understand text entry in smartphone app usage. Our results show 80% of text was entered into communication apps, with different apps exhibiting distinct usage patterns. We found that structured data such as URLs and email addresses are rarely typed but instead are auto-completed or replaced with search, copy-and-paste is rarely used, and sessions of smartphone usage with text entry involve more apps and last longer. We conclude with a discussion about the implications on the development of systems to better support mobile interaction.
Full Text Available
Demonstration + Natural Language: Multimodal Interfaces for GUI-based Interactive Task Learning Agents

https://doi.org/10.1007/978-3-030-82681-9_15

Li, Toby Jia-Jun; Mitchell, Tom M.; Myers, Brad A (May 2021, Artificial Intelligence for Human Computer Interaction: A Modern Approach)
Li, Yang; Hilliges, Otmar (Ed.)
We summarize our past five years of work on designing, building, and studying Sugilite, an interactive task learning agent that can learn new tasks and relevant associated concepts interactively from the user’s natural language instructions and demonstrations leveraging the graphical user interfaces (GUIs) of third-party mobile apps. Through its multi-modal and mixed-initiative approaches for Human-AI interaction, Sugilite made important contributions in improving the usability, applicability, generalizability, flexibility, robustness, and shareability of interactive task learning agents. Sugilite also represents a new human-AI interaction paradigm for interactive task learning, where it uses existing app GUIs as a medium for users to communicate their intents with an AI agent instead of the interfaces for users to interact with the underlying computing services. In this chapter, we describe the Sugilite system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android.
Full Text Available
Screen2Vec: Semantic Embedding of GUI Screens and GUI Components

https://doi.org/10.1145/3411764.3445049

Li, Toby Jia-Jun; Popowski, Lindsay; Mitchell, Tom; Myers, Brad A (May 2021, Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2021))
Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation efforts. This paper presents Screen2Vec, a new self-supervised technique for generating representations in embedding vectors of GUI screens and components that encode all of the above GUI features without requiring manual annotation using the context of user interaction traces. Screen2Vec is inspired by the word embedding method Word2Vec, but uses a new two-layer pipeline informed by the structure of GUIs and interaction traces and incorporates screen- and app-specific metadata. Through several sample downstream tasks, we demonstrate Screen2Vec’s key useful properties: representing between-screen similarity through nearest neighbors, composability, and capability to represent user tasks.
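The nearest-neighbor property mentioned in the abstract above can be illustrated with a minimal sketch. This is not the Screen2Vec model or its API: the vectors, the screen names, and the helper functions `cosine_similarity` and `nearest_screens` are all hypothetical, and cosine similarity is assumed here as the distance measure only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_screens(query, embeddings, k=2):
    """Return the names of the k screens whose vectors are most similar to `query`."""
    scored = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in scored[:k]]


# Hypothetical toy vectors; real screen embeddings would come from a trained model.
TOY_EMBEDDINGS = {
    "login_a": [0.9, 0.1, 0.0],
    "login_b": [0.8, 0.2, 0.1],
    "settings": [0.0, 0.1, 0.9],
    "checkout": [0.1, 0.9, 0.2],
}

print(nearest_screens("login_a", TOY_EMBEDDINGS, k=2))
```

With these toy vectors, the two login screens are each other's closest neighbors, which is the kind of between-screen similarity the abstract refers to.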
Full Text Available
Multi-Modal Repairs of Conversational Breakdowns in Task-Oriented Dialogs

https://doi.org/10.1145/3379337.3415820

Li, Toby Jia-Jun; Chen, Jingya; Xia, Haijun; Mitchell, Tom M.; Myers, Brad A. (October 2020, ACM Symposium on User Interface Software and Technology)
A major problem in task-oriented conversational agents is the lack of support for the repair of conversational breakdowns. Prior studies have shown that current repair strategies for these kinds of errors are often ineffective due to: (1) the lack of transparency about the state of the system's understanding of the user's utterance; and (2) the system's limited capabilities to understand the user's verbal attempts to repair natural language understanding errors. This paper introduces SOVITE, a new multi-modal speech plus direct manipulation interface that helps users discover, identify the causes of, and recover from conversational breakdowns using the resources of existing mobile app GUIs for grounding. SOVITE displays the system's understanding of user intents using GUI screenshots, allows users to refer to third-party apps and their GUI screens in conversations as inputs for intent disambiguation, and enables users to repair breakdowns using direct manipulation on these screenshots. The results from a remote user study with 10 users using SOVITE in 7 scenarios suggested that SOVITE's approach is usable and effective.
Full Text Available
Interactive Task Learning from GUI-Grounded Natural Language Instructions and Demonstrations

https://doi.org/10.18653/v1/2020.acl-demos.25

Li, Toby Jia-Jun; Mitchell, Tom; Myers, Brad (July 2020, The AAAI-20 Workshop on Intelligent Process Automation (IPA-20))
We summarize our past five years of work on designing, building, and studying Sugilite, an interactive task learning agent that can learn new tasks and relevant associated concepts interactively from the user’s natural language instructions and demonstrations leveraging the graphical user interfaces (GUIs) of third-party mobile apps. Through its multi-modal and mixed-initiative approaches for Human-AI interaction, Sugilite made important contributions in improving the usability, applicability, generalizability, flexibility, robustness, and shareability of interactive task learning agents. Sugilite also represents a new human-AI interaction paradigm for interactive task learning, where it uses existing app GUIs as a medium for users to communicate their intents with an AI agent instead of the interfaces for users to interact with the underlying computing services. In this chapter, we describe the Sugilite system, explain the design and implementation of its key features, and show a prototype in the form of a conversational assistant on Android.
Full Text Available
Privacy-Preserving Script Sharing in GUI-based Programming-by-Demonstration Systems

https://doi.org/10.1145/3392869

Li, Toby Jia-Jun; Chen, Jingya; Canfield, Brandon; Myers, Brad A. (May 2020, Proceedings of the ACM on Human-Computer Interaction)

Full Text Available
Towards Effective Human-AI Collaboration in GUI-Based Interactive Task Learning Agents

https://doi.org/arXiv:2003.02622

Li, Toby Jia-Jun; Chen, Jingya; Mitchell, Tom; Myers, Brad (April 2020, CHI 2020 Workshop on Artificial Intelligence for HCI: A Modern Approach (AI4HCI))

We argue that a key challenge in enabling usable and useful interactive task learning for intelligent agents is to facilitate effective Human-AI collaboration. We reflect on our past 5 years of efforts on designing, developing and studying the SUGILITE system, discuss the issues on incorporating recent advances in AI with HCI principles in mixed-initiative interactions and multimodal interactions, and summarize the lessons we learned. Lastly, we identify several challenges and opportunities, and describe our ongoing work.
Full Text Available
Interactive Task and Concept Learning from Natural Language Instructions and GUI Demonstrations

Li, Toby Jia-Jun; Radensky, Marissa; Jia, Justin; Singarajah, Kirielle; Mitchell, Tom M.; Myers, Brad A. (February 2020, The AAAI-20 Workshop on Intelligent Process Automation (IPA-20))

Natural language programming is a promising approach to enable end users to instruct new tasks for intelligent agents. However, our formative study found that end users would often use unclear, ambiguous or vague concepts when naturally instructing tasks in natural language, especially when specifying conditionals. Existing systems have limited support for letting the user teach agents new concepts or explaining unclear concepts. In this paper, we describe a new multimodal domain-independent approach that combines natural language programming and programming-by-demonstration to allow users to first naturally describe tasks and associated conditions at a high level, and then collaborate with the agent to recursively resolve any ambiguities or vagueness through conversations and demonstrations. Users can also define new procedures and concepts by demonstrating and referring to contents within GUIs of existing mobile apps. We demonstrate this approach in PUMICE, an end-user programmable agent that implements this approach. A lab study with 10 users showed its usability.
Full Text Available
Look-up and Adapt: A One-shot Semantic Parser

https://doi.org/10.18653/v1/D19-1104

Lu, Zhichu; Arabshahi, Forough; Labutov, Igor; Mitchell, Tom (November 2019, 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))

Computing devices have recently become capable of interacting with their end users via natural language. However, they can only operate within a limited “supported” domain of discourse and fail drastically when faced with an out-of-domain utterance, mainly due to the limitations of their semantic parser. In this paper, we propose a semantic parser that generalizes to out-of-domain examples by learning a general strategy for parsing an unseen utterance through adapting the logical forms of seen utterances, instead of learning to generate a logical form from scratch. Our parser maintains a memory consisting of a representative subset of the seen utterances paired with their logical forms. Given an unseen utterance, our parser works by looking up a similar utterance from the memory and adapting its logical form until it fits the unseen utterance. Moreover, we present a data generation strategy for constructing utterance-logical form pairs from different domains. Our results show an improvement of up to 68.8% on one-shot parsing under two different evaluation settings compared to the baselines.
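The look-up step described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the paper's parser learns its look-up and adaptation strategy, whereas this sketch substitutes a simple token-overlap (Jaccard) similarity and omits adaptation entirely. The memory contents, the logical-form strings, and the functions `jaccard` and `look_up` are hypothetical.

```python
def jaccard(a, b):
    """Token-overlap similarity between two utterances (stand-in for a learned similarity)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def look_up(utterance, memory):
    """Return the (utterance, logical_form) pair in memory most similar to `utterance`."""
    return max(memory, key=lambda pair: jaccard(utterance, pair[0]))


# Hypothetical memory of seen utterances paired with made-up logical forms.
MEMORY = [
    ("set an alarm for 7 am", "set_alarm(time='7 am')"),
    ("play some jazz music", "play_music(genre='jazz')"),
]

seen_utterance, logical_form = look_up("set an alarm for 9 pm", MEMORY)
print(seen_utterance, "->", logical_form)
```

In the approach the abstract describes, the retrieved logical form would then be adapted until it fits the unseen utterance; that step is not modeled here.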
Full Text Available