Title: Multi-Modal Repairs of Conversational Breakdowns in Task-Oriented Dialogs
A major problem in task-oriented conversational agents is the lack of support for the repair of conversational breakdowns. Prior studies have shown that current repair strategies for these kinds of errors are often ineffective due to: (1) the lack of transparency about the state of the system's understanding of the user's utterance; and (2) the system's limited capabilities to understand the user's verbal attempts to repair natural language understanding errors. This paper introduces SOVITE, a new multi-modal speech plus direct manipulation interface that helps users discover, identify the causes of, and recover from conversational breakdowns using the resources of existing mobile app GUIs for grounding. SOVITE displays the system's understanding of user intents using GUI screenshots, allows users to refer to third-party apps and their GUI screens in conversations as inputs for intent disambiguation, and enables users to repair breakdowns using direct manipulation on these screenshots. The results from a remote user study with 10 users using SOVITE in 7 scenarios suggested that SOVITE's approach is usable and effective.
Award ID(s):
1814472
PAR ID:
10302227
Journal Name:
ACM Symposium on User Interface Software and Technology
Page Range / eLocation ID:
1094 to 1107
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In software development, many documents (e.g., tutorials for tools and mobile application websites) contain screenshots of graphical user interfaces (GUIs) to illustrate functionalities. Although screenshots are critical in such documents, they can become outdated, especially if document developers forget to update them. Outdated screenshots can mislead users and diminish the credibility of documentation. Identifying outdated screenshots manually is tedious and error-prone, especially when documents are numerous. However, no existing tools detect outdated screenshots in GUI documents. To reduce manual effort, we propose DOSUD, a novel approach for detecting outdated screenshots. Identifying outdated screenshots is challenging because the differences are subtle and only specific areas are useful for identification. To address these challenges, DOSUD automatically extracts and labels screenshots and trains a classification model to identify outdated screenshots. As a first exploration, we focus on Android applications and the most popular IDE, VS Code. We evaluated DOSUD on a benchmark comprising 10 popular applications, achieving high F1-scores. When applied in the wild, DOSUD identified 20 outdated screenshots across 50 Android application websites and 17 outdated screenshots in VS Code documentation. VS Code developers have confirmed and fixed all our bug reports.
  2. We investigate direct manipulation of graphical encodings as a method for interacting with visualizations. There is an increasing interest in developing visualization tools that enable users to perform operations by directly manipulating graphical encodings rather than external widgets such as checkboxes and sliders. Designers of such tools must decide which direct manipulation operations should be supported, and identify how each operation can be invoked. However, we lack empirical guidelines for how people convey their intended operations using direct manipulation of graphical encodings. We address this issue by conducting a qualitative study that examines how participants perform 15 operations using direct manipulation of standard graphical encodings. From this study, we 1) identify a list of strategies people employ to perform each operation, 2) observe commonalities in strategies across operations, and 3) derive implications to help designers leverage direct manipulation of graphical encoding as a method for user interaction. 
  3. Embodied conversational agents (ECAs) provide an interface modality on smartphones that may be particularly effective for tasks with significant social, affective, reflective, and narrative aspects, such as health education and behavior change counseling. However, the conversational medium is significantly slower than conventional graphical user interfaces (GUIs) for brief, time-sensitive tasks. We conducted a randomized experiment to determine user preferences in performing two kinds of health-related tasks (one affective and narrative in nature and one transactional) and gave participants a choice of a conventional GUI or a functionally equivalent ECA on a smartphone to complete the task. We found significant main effects of task type and user preference on user choice of modality, with participants choosing the conventional GUI more often for transactional and time-sensitive tasks.
  4. A search trail is an interactive visualization of how a previous searcher approached a related task. Using search trails to assist users requires understanding aspects of the task, user, and trails. In this paper, we examine two questions. First, what are task characteristics that influence a user's ability to gain benefits from others' trails? Second, what is the impact of a "mismatch" between a current user's task and previous user's task which originated the trail? We report on a study that investigated the influence of two factors on participants' perceptions and behaviors while using search trails to complete tasks. Our first factor, task scope, focused on the scope of the task assigned to the participant (broad to narrow). Our manipulation of this factor involved varying the number of constraints associated with tasks. Our second factor, trail scope, focused on the scope of the task that originated the search trails given to participants. We investigated how task scope and trail scope affected participants' (RQ1) pre-task perceptions, (RQ2) post-task perceptions, and (RQ3) search behaviors. We discuss implications of our results for systems that use search trails to provide assistance. 
  5. Insects maintain remarkable agility after incurring severe injuries or wounds. Although robots driven by rigid actuators have demonstrated agile locomotion and manipulation, most of them lack animal-like robustness against unexpected damage. Dielectric elastomer actuators (DEAs) are a class of muscle-like soft transducers that have enabled nimble aerial, terrestrial, and aquatic robotic locomotion comparable to that of rigid actuators. However, unlike muscles, DEAs suffer local dielectric breakdowns that often cause global device failure. These local defects severely limit DEA performance, lifetime, and size scalability. We developed DEAs that can endure more than 100 punctures while maintaining high bandwidth (>400 hertz) and power density (>700 watts per kilogram), sufficient for supporting energetically expensive locomotion such as flight. We fabricated electroluminescent DEAs for visualizing electrode connectivity under actuator damage. When the DEA suffered severe dielectric breakdowns that caused device failure, we demonstrated a laser-assisted repair method for isolating the critical defects and recovering performance. These results culminate in an aerial robot that can endure critical actuator and wing damage while maintaining similar accuracy in hovering flight. Our work highlights that soft robotic systems can embody animal-like agility and resilience, a critical biomimetic capability for future robots to interact with challenging environments.