Title: A Large Inclusive Study of Human Listening Rates
Abstract: As conversational agents and digital assistants become increasingly pervasive, understanding their synthetic speech becomes increasingly important. Simultaneously, speech synthesis is becoming more sophisticated and manipulable, providing the opportunity to optimize speech rate to save users time. However, little is known about people’s abilities to understand fast speech. In this work, we provide the first large-scale study on human listening rates. Run on LabintheWild, it used volunteer participants, was screen reader accessible, and measured listening rate by accuracy at answering questions spoken by a screen reader at various rates. Our results show that blind and low-vision people, who often rely on audio cues and access text aurally, generally have higher listening rates than sighted people. The findings also suggest a need to expand the range of rates available on personal devices. These results demonstrate the potential for users to learn to listen to faster rates, expanding the possibilities for human-conversational agent interaction.
Award ID(s):
1702751
PAR ID:
10063856
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 2018)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
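
The abstract above measures listening rate as accuracy at answering questions spoken at various rates. The paper's analysis code is not reproduced here; the following is a minimal sketch of one way such data could be summarized, finding the fastest tested rate at which a participant stays above an accuracy threshold. The rates, accuracy values, and the 0.75 threshold are illustrative assumptions, not figures from the study.

```python
# Hypothetical sketch: estimate the fastest speech rate a listener can
# follow, given comprehension accuracy measured at several TTS rates.
# All numbers below are illustrative, not values from the paper.

def max_intelligible_rate(rates_wpm, accuracies, threshold=0.75):
    """Return the highest tested rate (words per minute) at which
    question-answering accuracy stays at or above `threshold`."""
    passing = [r for r, a in zip(rates_wpm, accuracies) if a >= threshold]
    return max(passing) if passing else None

# Example: one participant's accuracy at five screen-reader rates.
rates = [200, 300, 400, 500, 600]          # words per minute
accuracy = [0.95, 0.90, 0.82, 0.70, 0.55]  # fraction of questions correct

print(max_intelligible_rate(rates, accuracy))  # -> 400
```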
More Like this
  1. Collaborative writing tools have been used widely in professional and academic organizations for many years. Yet, there has not been much work to improve screen reader access in mainstream collaborative writing tools. This severely affects the way people with vision impairments collaborate in ability-diverse teams. As a step toward addressing this issue, the present article aims to improve screen reader representation of collaborative features such as comments and track changes (i.e., suggested edits). Building on our formative interviews with 20 academics and professionals with vision impairments, we developed auditory representations that indicate comments and edits using non-speech audio (e.g., earcons, tone overlay), multiple text-to-speech voices, and contextual presentation techniques. We then performed a systematic evaluation study with 48 screen reader users, which indicated that non-speech audio, changing voices, and contextual presentation can potentially improve writers’ collaboration awareness. We discuss implications of these results for the design of accessible collaborative systems.
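The article above marks comments and edits with non-speech audio such as earcons. As a rough illustration of what a minimal earcon could be, the sketch below synthesizes a short sine tone with a fade-out using only Python's standard library; the pitch, duration, and filename are assumptions for illustration, not the study's actual cues.

```python
# Minimal sketch of generating a short "earcon" tone of the kind used to
# mark comments and edits. The 880 Hz pitch, 120 ms duration, and
# filename are illustrative assumptions, not values from the study.
import math
import struct
import wave

RATE = 44100   # samples per second
FREQ = 880.0   # pitch of the earcon (Hz)
DUR = 0.12     # duration in seconds

n_samples = int(RATE * DUR)
frames = bytearray()
for i in range(n_samples):
    # Sine tone with a linear fade-out so the cue doesn't click.
    fade = 1.0 - i / n_samples
    sample = 0.5 * fade * math.sin(2 * math.pi * FREQ * i / RATE)
    frames += struct.pack("<h", int(sample * 32767))

with wave.open("comment_earcon.wav", "wb") as f:
    f.setnchannels(1)   # mono
    f.setsampwidth(2)   # 16-bit samples
    f.setframerate(RATE)
    f.writeframes(bytes(frames))
```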
  2. Backchanneling behaviors on a robot, such as nodding, can make talking to a robot feel more natural and engaging by giving a sense that the robot is actively listening. For backchanneling to be effective, it is important that the timing of such cues is appropriate given the humans’ conversational behaviors. Recent progress has shown that these behaviors can be learned from datasets of human-human conversations. However, recent data-driven methods tend to overfit to the human speakers that are seen in training data and fail to generalize well to previously unseen speakers. In this paper, we explore the use of data augmentation for effective nodding behavior in a robot. We show that, by augmenting the input speech and visual features, we can produce data-driven models that are more robust to unseen features without collecting additional data. We analyze the efficacy of data-driven backchanneling in a realistic human-robot conversational setting with a user study, showing that users perceived the data-driven model to be better at listening as compared to rule-based and random baselines. 
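The paper above augments input speech and visual features so the backchanneling model generalizes beyond the speakers seen in training. The exact augmentations are not specified in the abstract; the sketch below shows a generic feature-level jitter (random per-dimension rescaling plus additive Gaussian noise) of the kind commonly used for this purpose. The feature shapes, noise scale, and number of synthetic copies are illustrative assumptions.

```python
# Generic feature-level augmentation sketch: perturb speech/visual
# features so a model sees more speaker variation than the raw data
# contains. All shapes and scales here are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def augment(features, noise_std=0.05, scale_range=(0.9, 1.1)):
    """Return a jittered copy of a (frames x dims) feature matrix:
    per-dimension random rescaling plus additive Gaussian noise."""
    scale = rng.uniform(*scale_range, size=features.shape[1])
    noise = rng.normal(0.0, noise_std, size=features.shape)
    return features * scale + noise

# Example: 100 frames of 40-dim speech features from one speaker,
# expanded into several synthetic variants.
speech = rng.normal(size=(100, 40))
augmented_set = [augment(speech) for _ in range(4)]
print(len(augmented_set), augmented_set[0].shape)  # 4 (100, 40)
```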
  3. Accessible onscreen keyboards require people who are blind to keep their phone out at all times to search for visual affordances they cannot see. Is it possible to re-imagine text entry without a reference screen? To explore this question, we introduce screenless keyboards as aural flows (keyflows): rapid auditory streams of Text-To-Speech (TTS) characters controllable by hand gestures. In a study, 20 screen-reader users experienced keyflows to perform initial text entry. Typing took considerably longer than with current screen-based keyboards, but most participants preferred screen-free text entry to current methods, especially for short messages on the go. We model the navigation strategies that participants enacted to aurally browse entirely auditory keyboards and discuss their limitations and benefits for daily access. Our work points to trade-offs in user performance and user experience in situations where blind users may trade typing speed for the benefit of being untethered from the screen.
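A keyflow streams TTS characters at a fixed rate and lets a hand gesture select whichever character is currently playing. The toy sketch below models only the timing logic: given the stream rate and a gesture timestamp, it computes which character was selected. The 2-characters-per-second rate and the gesture times are illustrative assumptions; a real keyflow would involve actual TTS output and gesture recognition.

```python
# Toy timing model of a "keyflow": characters stream aurally at a fixed
# rate, and a gesture selects the character playing at that moment.
# The rate and gesture timestamps are illustrative assumptions.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
CHARS_PER_SEC = 2.0   # how fast the TTS stream reads characters

def char_at(t_seconds):
    """Character being spoken `t_seconds` after the flow starts,
    with the alphabet looping continuously."""
    index = int(t_seconds * CHARS_PER_SEC) % len(ALPHABET)
    return ALPHABET[index]

# A user "typing" the word "hi" with two gestures:
gesture_times = [3.6, 4.2]   # seconds; land on 'h' then 'i'
print("".join(char_at(t) for t in gesture_times))  # -> "hi"
```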
  4. Online data visualizations play an important role in informing public opinion but are often inaccessible to screen reader users. To address the need for accessible data representations on the web that provide direct, multimodal, and up-to-date access to the data, we investigate audio data narratives, which combine textual descriptions and sonification (the mapping of data to non-speech sounds). We conduct two co-design workshops with screen reader users to define design principles that guide the structure, content, and duration of a data narrative. Based on these principles and relevant auditory processing characteristics, we propose a dynamic programming approach to automatically generate an audio data narrative from a given dataset. We evaluate our approach with 16 screen reader users. Findings show that, with audio narratives, users gain significantly more insights from the data. Users described that data narratives helped them better extract and comprehend the information in both the sonification and the description.
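The abstract above mentions a dynamic-programming approach to assembling an audio data narrative, but its exact formulation is not given. Below is a generic DP sketch for one plausible subproblem: splitting a data series into k segments (each of which could then be sonified and described) so that values within each segment are as homogeneous as possible. The cost function and the choice of k are illustrative assumptions, not the paper's method.

```python
# Generic dynamic-programming segmentation sketch: split a series into
# k contiguous segments minimizing total within-segment variance. The
# cost function and k are illustrative assumptions.
def sse(values):
    """Sum of squared deviations from the mean: low when a span is flat."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def segment(data, k):
    """Return segment end indices minimizing total within-segment SSE."""
    n = len(data)
    INF = float("inf")
    # best[j][i]: min cost of covering data[:i] with j segments.
    best = [[INF] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for j in range(1, k + 1):
        for i in range(j, n + 1):
            for s in range(j - 1, i):
                cost = best[j - 1][s] + sse(data[s:i])
                if cost < best[j][i]:
                    best[j][i], back[j][i] = cost, s
    # Recover segment boundaries by walking the backpointers.
    cuts, i = [], n
    for j in range(k, 0, -1):
        cuts.append(i)
        i = back[j][i]
    return sorted(cuts)

series = [1, 1, 2, 9, 10, 9, 4, 4, 5]
print(segment(series, 3))  # -> [3, 6, 9]: flat, peak, mid-level spans
```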
  5. Purpose: The goal of this study was to assess the listening behavior and social engagement of cochlear implant (CI) users and normal-hearing (NH) adults in daily life and relate these actions to objective hearing outcomes. Method: Ecological momentary assessments (EMAs) collected using a smartphone app were used to probe patterns of listening behavior in CI users and age-matched NH adults and to detect differences in social engagement and listening behavior in daily life. Participants completed very short surveys every 2 hr to provide snapshots of typical, everyday listening and socializing, as well as longer, reflective surveys at the end of the day to assess listening strategies and coping behavior. Speech perception testing, with accompanying ratings of task difficulty, was also performed in a lab setting to uncover possible correlations between objective and subjective listening behavior. Results: Comparisons between speech intelligibility testing and EMA responses showed that poorer-performing CI users spent more time at home and less time conversing with others than higher-performing CI users and their NH peers. Perception of listening difficulty also differed markedly between CI users and NH listeners, with CI users reporting little difficulty despite poor speech perception performance. However, both CI users and NH listeners spent most of their time in listening environments they considered “not difficult.” CI users also reported using several compensatory listening strategies, such as visual cues, whereas NH listeners did not. Conclusion: Overall, the data indicate systematic differences in how individual CI users and NH adults navigate and manipulate listening and social environments in everyday life.