Captions play a major role in making educational videos accessible to all and are known to benefit a wide range of learners. However, many educational videos either lack captions or have inaccurate ones. Prior work has shown the benefits of using crowdsourcing to obtain accurate captions cost-efficiently, but little is known about how learners edit the captions of educational videos, either individually or collaboratively. In this work, we conducted a user study in which 58 learners (from a course of 387 learners) edited the captions of 89 lecture videos; the captions had been generated by Automatic Speech Recognition (ASR). For each video, different learners conducted two rounds of editing. Based on the editing logs, we created a taxonomy of errors in educational video captions (e.g., Discipline-Specific, General, Equations). From interviews, we identified individual and collaborative error-editing strategies. We further demonstrated the feasibility of applying machine learning models to assist learners in editing. Our work provides practical implications for advancing video-based learning and for educational video caption editing.
Evaluating the Benefit of Highlighting Key Words in Captions for People who are Deaf or Hard of Hearing
Recent research has investigated automatic methods for identifying how important each word in a text is for the overall message, in the context of people who are Deaf and Hard of Hearing (DHH) viewing video with captions. We examine whether DHH users report benefits from visual highlighting of important words in video captions. In formative interview and prototype studies, users indicated a preference for underlining roughly 5%-15% of the words in a caption to mark them as important, and they expressed an interest in such text markup in the context of educational lecture videos. In a subsequent user study, 30 DHH participants viewed lecture videos in two forms: with and without such visual markup. Users indicated that the videos with captions containing highlighted words were easier to read and follow, and they reported lower perceived task-load ratings, compared to the videos without highlighting. This study motivates future research on caption highlighting in online educational videos, and it provides a foundation for how to evaluate the efficacy of such systems with users.
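As a rough illustration of the kind of markup described above, the following Python sketch underlines the top-scoring ~10% of words in a caption line. The per-word importance scores and the `highlight_caption` helper are hypothetical stand-ins supplied for illustration; the word-importance model used in the study is not specified here.

```python
# Minimal sketch of marking up the most important words in a caption line.
# The per-word importance scores are assumed to come from some external
# word-importance model (not the one used in the paper); here they are
# simply passed in by the caller.

def highlight_caption(words, scores, fraction=0.10):
    """Return the caption with roughly `fraction` of words wrapped in <u> tags."""
    if not words:
        return ""
    k = max(1, round(len(words) * fraction))            # number of words to underline
    top = set(sorted(range(len(words)), key=lambda i: scores[i], reverse=True)[:k])
    return " ".join(f"<u>{w}</u>" if i in top else w for i, w in enumerate(words))

# Example with made-up scores: underline ~10% of the words (at least one).
caption = "the derivative measures the instantaneous rate of change of a function".split()
scores  = [0.1, 0.9, 0.4, 0.1, 0.6, 0.5, 0.2, 0.7, 0.1, 0.2, 0.8]
print(highlight_caption(caption, scores))
```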
- PAR ID: 10170034
- Date Published:
- Journal Name: The 21st International ACM SIGACCESS Conference on Computers and Accessibility
- Page Range / eLocation ID: 43 to 55
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Caption text conveys salient auditory information to deaf or hard-of-hearing (DHH) viewers, but the emotional information within the speech is not captured. We developed three emotive captioning schemas that map the output of audio-based emotion detection models to expressive caption text that can convey the underlying emotions. The three schemas used typographic changes to the text, color changes, or both. Next, we designed a Unity framework to implement these schemas and used it to generate stimulus videos. In an experimental evaluation with 28 DHH viewers, we compared viewers' ability to understand emotions and their subjective judgments across the three captioning schemas. We found no significant difference in participants' ability to understand the emotion conveyed by the captions, nor in their subjective preference ratings. Open-ended feedback revealed factors contributing to individual differences in preference among participants, as well as challenges with automatically generated emotive captions that motivate future work. (A sketch of one possible emotion-to-style mapping appears after this list.)
- Deaf and hard of hearing (DHH) viewers watch multimedia with captions on devices with widely varying widths. We investigated the impact of caption width on viewers' preferences. Previous research has shown that presenting one-word lines allows viewers to read much more quickly than traditional reading, while other work has found that the optimal caption width is six words per line. Our study showed that DHH viewers had no difference in preference between 6-word and 12-word lines. Furthermore, they significantly preferred 6-word and 12-word lines over single-word lines, due to the need to split attention between the captions and the video.
- Few VR applications and games implement captioning of speech and audio cues, which inhibits or prevents access to these applications by deaf or hard of hearing (DHH) users, new language learners, and other caption users. Additionally, few guidelines exist on how to implement live captioning on VR headsets and how it may differ from traditional television captioning. To help fill this gap in understanding user preferences for different VR captioning styles, we conducted a study with eight DHH participants to test three caption movement behaviors (head-locked, lag, and appear-locked) while they watched live-captioned, single-speaker presentations in VR. Participants answered a series of Likert-scale and open-ended questions about their experience. Participants' preferences were split, but most reported feeling comfortable with using live captions in VR and enjoyed the experience. When participants ranked the caption behaviors, the votes were almost equally divided among the three types tested. IPQ results indicated that each behavior had similar immersion ratings; however, participants found head-locked and lag captions more user-friendly than appear-locked captions. We suggest that caption preference may vary depending on how participants use captions, and that providing opportunities for caption customization is best. (A simplified sketch of these three movement behaviors appears after this list.)
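Regarding the emotive captioning schemas above (typographic changes, color changes, or both), the following is a minimal, framework-agnostic Python sketch of how an (emotion, confidence) output from an audio-based emotion classifier might be mapped to caption styling. The label set, colors, thresholds, and typographic rules are illustrative assumptions, not the schemas evaluated in that study, and the study's actual Unity implementation is not reproduced here.

```python
# Minimal, framework-agnostic sketch of an emotive captioning schema:
# map an (emotion, confidence) pair from an audio-based emotion classifier
# to caption styling. The label set, colors, and typographic rules below are
# illustrative assumptions, not the schemas evaluated in the paper.

STYLE_TABLE = {
    # emotion: (hex color, typographic treatment)
    "anger":   ("#D32F2F", "bold"),
    "joy":     ("#F9A825", "italic"),
    "sadness": ("#1565C0", "italic"),
    "neutral": ("#FFFFFF", "none"),
}

def style_caption(text, emotion, confidence, schema="both", threshold=0.5):
    """Wrap caption text in simple HTML-like markup according to the schema.

    schema: "typography", "color", or "both"; low-confidence predictions
    fall back to neutral styling.
    """
    if confidence < threshold:
        emotion = "neutral"
    color, face = STYLE_TABLE.get(emotion, STYLE_TABLE["neutral"])

    styled = text
    if schema in ("typography", "both") and face == "bold":
        styled = f"<b>{styled}</b>"
    elif schema in ("typography", "both") and face == "italic":
        styled = f"<i>{styled}</i>"
    if schema in ("color", "both"):
        styled = f'<font color="{color}">{styled}</font>'
    return styled

# Example: an angry utterance rendered with both typography and color cues.
print(style_caption("I can't believe you did that!", "anger", 0.82, schema="both"))
```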
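Similarly, for the VR caption study above, the following sketch illustrates one way the three caption movement behaviors (head-locked, lag, and appear-locked) might update a caption's position each frame. The fixed viewing distance, the smoothing factor, and the vector math are assumptions for illustration only, not the study's Unity implementation.

```python
# Simplified per-frame position update for the three caption movement
# behaviors compared in the VR study. Offsets, the lag smoothing factor,
# and the plain-list vector math are illustrative assumptions.

def update_caption_position(behavior, head_pos, head_forward, prev_pos, spawn_pos,
                            distance=1.5, lag_factor=0.1):
    """Return the caption's new world position for this frame."""
    # Target point directly in front of the viewer's head at a fixed distance.
    target = [h + distance * f for h, f in zip(head_pos, head_forward)]

    if behavior == "head-locked":
        return target                      # rigidly follows every head movement
    if behavior == "lag":
        # Ease toward the head-locked target so the caption trails smoothly.
        return [p + lag_factor * (t - p) for p, t in zip(prev_pos, target)]
    if behavior == "appear-locked":
        return spawn_pos                   # stays where it first appeared in the world
    raise ValueError(f"unknown behavior: {behavior}")

# Example: one frame of the "lag" behavior while the caption is still catching up.
pos = update_caption_position("lag", head_pos=[0, 1.6, 0], head_forward=[0, 0, 1],
                              prev_pos=[0.5, 1.6, 1.4], spawn_pos=[0, 1.6, 1.5])
print(pos)
```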