Caption text conveys salient auditory information to deaf or hard-of-hearing (DHH) viewers. However, captions typically do not convey the emotional information carried in speech. We developed three emotive captioning schemas that map the output of audio-based emotion detection models to expressive caption text that can convey underlying emotions. The three schemas used typographic changes to the text, color changes, or both. Next, we designed a Unity framework to implement these schemas and used it to generate stimulus videos. In an experimental evaluation with 28 DHH viewers, we compared participants' ability to understand emotions and their subjective judgments across the three captioning schemas. We found no significant difference in participants' ability to understand the emotion based on the captions or in their subjective preference ratings. Open-ended feedback revealed factors contributing to individual differences in preferences among participants, as well as challenges with automatically generated emotive captions that motivate future work.
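As a minimal illustrative sketch of how such a schema-driven mapping could work (the emotion labels, style values, and schema names below are assumptions for illustration, not the schemas evaluated in the paper), a detected emotion label might be translated into a caption-rendering specification as follows:

```python
# Illustrative sketch: map an emotion label from an audio-based emotion
# classifier to caption styling. Labels, styles, and schema names are
# assumptions for illustration, not the paper's actual schemas.

TYPOGRAPHY = {
    "angry":   {"weight": "bold",   "size": 1.2},
    "sad":     {"weight": "light",  "size": 0.9},
    "happy":   {"weight": "normal", "size": 1.1},
    "neutral": {"weight": "normal", "size": 1.0},
}

COLOR = {
    "angry":   "#d62828",
    "sad":     "#457b9d",
    "happy":   "#f4a261",
    "neutral": "#ffffff",
}


def style_caption(text: str, emotion: str, schema: str) -> dict:
    """Return a caption-rendering spec under a given schema
    ('typography', 'color', or 'both')."""
    spec = {"text": text, "weight": "normal", "size": 1.0, "color": "#ffffff"}
    if schema in ("typography", "both"):
        spec.update(TYPOGRAPHY.get(emotion, TYPOGRAPHY["neutral"]))
    if schema in ("color", "both"):
        spec["color"] = COLOR.get(emotion, COLOR["neutral"])
    return spec


# Example: a line detected as 'angry', rendered under the combined schema.
print(style_caption("I told you to stop!", "angry", "both"))
```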
Who is speaking: Unpacking In-text Speaker Identification Preference of Viewers who are Deaf and Hard of Hearing while Watching Live Captioned Television Program
Live TV news and interviews often include multiple individuals speaking, with rapid turn-taking, which makes it difficult for viewers who are Deaf and Hard of Hearing (DHH) to follow who is speaking when reading captions. Prior research has proposed several methods of indicating who is speaking. While recent studies
have observed various preferences among DHH viewers for speaker identification methods for videos with different numbers of onscreen speakers, no study has yet systematically explored whether there is a formal relationship between the number of people onscreen and DHH viewers' preferences for how to indicate the speaker in captions. We conducted an empirical study, followed by a semi-structured interview, with 17 DHH participants to record their preferences among various speaker-identifier types for videos that vary in the number of speakers onscreen. We observed an interaction effect between DHH viewers' preference for speaker identification and the number of speakers in a video. An analysis of open-ended feedback from participants revealed several factors that influenced their preferences. Our findings can guide broadcasters and captioners in selecting speaker-identification methods for captioned videos.
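For illustration, a few common in-text speaker-identifier formats can be sketched as simple text transformations; the specific identifier types evaluated in the study may differ from these assumed formats:

```python
# Illustrative sketch of in-text speaker-identifier formats for live
# captions; formats shown here are assumptions, not the study's stimuli.

def identify_speaker(caption: str, speaker: str, method: str) -> str:
    """Annotate a caption line to indicate who is speaking."""
    if method == "name":       # e.g. "JANE DOE: Markets fell today."
        return f"{speaker.upper()}: {caption}"
    if method == "chevron":    # generic new-speaker marker
        return f">> {caption}"
    if method == "initials":   # compact identifier for rapid turn-taking
        initials = "".join(word[0] for word in speaker.split()).upper()
        return f"[{initials}] {caption}"
    return caption


print(identify_speaker("Markets fell sharply today.", "Jane Doe", "initials"))
```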
- Award ID(s): 2150429
- PAR ID: 10486732
- Publisher / Repository: ACM
- Date Published:
- Journal Name: W4A '23: Proceedings of the 20th International Web for All Conference
- ISBN: 9798400707483
- Page Range / eLocation ID: 44 to 53
- Format(s): Medium: X
- Location: Austin, TX, USA
- Sponsoring Org: National Science Foundation
More Like this
People who are Deaf or Hard of Hearing (DHH) benefit from text captioning to understand audio, yet captions alone are often insufficient for the complex environment of a panel presentation, with rapid and unpredictable turn-taking among multiple speakers. It is challenging and tiring for DHH individuals to view captioned panel presentations, leading to feelings of misunderstanding and exclusion. In this work, we investigate the potential of Mixed Reality (MR) head-mounted displays for providing captioning with visual cues to indicate which person on the panel is speaking. For consistency in our experimental study, we simulate a panel presentation in virtual reality (VR) with various types of MR visual cues; in a study with 18 DHH participants, visual cues made it easier to identify speakers.
Recent research has investigated automatic methods for identifying how important each word in a text is for the overall message, in the context of people who are Deaf and Hard of Hearing (DHH) viewing video with captions. We examine whether DHH users report benefits from visual highlighting of important words in video captions. In formative interview and prototype studies, users indicated a preference for underlining 5%-15% of words in a caption text to indicate that they are important, and they expressed an interest in such text markup in the context of educational lecture videos. In a subsequent user study, 30 DHH participants viewed lecture videos in two forms: with and without such visual markup. Users indicated that the videos with captions containing highlighted words were easier to read and follow, with lower perceived task-load ratings, compared to the videos without highlighting. This study motivates future research on caption highlighting in online educational videos, and it provides a foundation for how to evaluate the efficacy of such systems with users.
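A minimal sketch of such highlighting, assuming per-word importance scores are already available from a word-importance model and that underlining is expressed with simple markup (both are assumptions for illustration):

```python
# Sketch: underline the top fraction of words in a caption by importance.
# The scoring source and the <u>...</u> markup are illustrative assumptions.
import math


def highlight_caption(words, scores, fraction=0.10):
    """Wrap the highest-scoring words in underline markup."""
    k = max(1, math.ceil(len(words) * fraction))
    top = set(sorted(range(len(words)), key=lambda i: scores[i], reverse=True)[:k])
    return " ".join(f"<u>{w}</u>" if i in top else w for i, w in enumerate(words))


words = "the mitochondria is the powerhouse of the cell".split()
scores = [0.1, 0.9, 0.2, 0.1, 0.8, 0.2, 0.1, 0.7]
print(highlight_caption(words, scores, fraction=0.15))
```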
Antona, M.; Stephanidis, C. (Eds.): Environmental sounds can provide important information about surrounding activity, yet recognizing sounds can be challenging for Deaf and Hard-of-Hearing (DHH) individuals. Prior work has examined the preferences of DHH users for various sound-awareness methods. However, these preferences have been observed to vary along some demographic factors. Thus, in this study we investigate the preferences of a specific group of DHH users: current users of assistive listening devices. Through a survey of 38 participants, we investigated their challenges and requirements for sound-awareness applications, as well as which types of sounds and what aspects of those sounds are of importance to them. We found that users of assistive listening devices still often miss sounds and rely on other people to obtain information about them. Participants indicated that the importance of awareness of different types of sounds varied according to the environment and the form factor of the sound-awareness technology. Congruent with prior work, participants reported that the location and urgency of a sound were of importance, as well as the confidence of the technology in its identification of that sound.
Various technologies mediate synchronous audio-visual one-on-one communication (SAVOC) between Deaf and Hard-of-Hearing (DHH) and hearing colleagues, including automatic-captioning smartphone apps for in-person settings and the text-chat features of videoconferencing software in remote settings. Speech and non-verbal behaviors of hearing speakers, e.g., speaking too quietly, can make SAVOC difficult for DHH users, but prior work had not examined technology-mediated contexts. In an in-person study (N=20) with an automatic-captioning smartphone app, variations in a hearing actor's enunciation and intonation dynamics affected DHH users' satisfaction. In a remote study (N=23) using a videoconferencing platform with text chat, variations in speech rate, voice intensity, enunciation, intonation dynamics, and eye contact similarly affected satisfaction. This work contributes empirical evidence that specific behaviors of hearing speakers affect the accessibility of technology-mediated SAVOC for DHH users, providing motivation for future work on detecting or encouraging useful communication behaviors among hearing individuals.