Caption text conveys salient auditory information to deaf or hard-of-hearing (DHH) viewers. However, the emotional information within the speech is not captured. We developed three emotive captioning schemas that map the output of audio-based emotion detection models to expressive caption text that can convey underlying emotions. The three schemas used typographic changes to the text, color changes, or both. Next, we designed a Unity framework to implement these schemas and used it to generate stimuli videos. In an experimental evaluation with 28 DHH viewers, we compared DHH viewers’ ability to understand emotions and their subjective judgments across the three captioning schemas. We found no significant difference in participants’ ability to understand the emotion based on the captions or their subjective preference ratings. Open-ended feedback revealed factors contributing to individual differences in preferences among the participants and challenges with automatically generated emotive captions that motivate future work. 
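The core idea in the abstract above is a mapping from an emotion detector's output to caption appearance under one of three schemas (typography, color, or both). The following is a minimal, hypothetical Python sketch of such a mapping; the authors' framework was built in Unity, and the emotion labels, colors, and typographic choices here are illustrative assumptions, not the schemas evaluated in the paper.

```python
# Hypothetical sketch: mapping an emotion detector's output to caption styling.
# The paper's framework was built in Unity; this Python version only illustrates
# the schema idea (typography, color, or both), not the authors' implementation.

from dataclasses import dataclass

# Assumed categorical output of an audio-based emotion model.
EMOTIONS = {"neutral", "happy", "sad", "angry"}

@dataclass
class CaptionStyle:
    bold: bool = False
    italic: bool = False
    color: str = "#FFFFFF"   # default white caption text

# Hypothetical per-emotion styling choices for the two single-channel schemas.
TYPOGRAPHY_ONLY = {
    "happy": CaptionStyle(bold=True),
    "sad": CaptionStyle(italic=True),
    "angry": CaptionStyle(bold=True, italic=True),
    "neutral": CaptionStyle(),
}
COLOR_ONLY = {
    "happy": CaptionStyle(color="#FFD700"),
    "sad": CaptionStyle(color="#6495ED"),
    "angry": CaptionStyle(color="#FF4500"),
    "neutral": CaptionStyle(),
}

def style_caption(text: str, emotion: str, schema: str) -> tuple[str, CaptionStyle]:
    """Return the caption text with the style the chosen schema assigns to it."""
    if emotion not in EMOTIONS:
        emotion = "neutral"                      # fall back on unknown labels
    if schema == "typography":
        return text, TYPOGRAPHY_ONLY[emotion]
    if schema == "color":
        return text, COLOR_ONLY[emotion]
    # "both": merge the typographic and color choices for the emotion.
    t, c = TYPOGRAPHY_ONLY[emotion], COLOR_ONLY[emotion]
    return text, CaptionStyle(bold=t.bold, italic=t.italic, color=c.color)

print(style_caption("I can't believe it!", "angry", "both"))
```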
This content will become publicly available on May 2, 2026.

Inclusive Emotion Technologies: Addressing the Needs of d/Deaf and Hard of Hearing Learners in Video-Based Learning
            Accessibility efforts for d/Deaf and hard of hearing (DHH) learners in video-based learning have mainly focused on captions and interpreters, with limited attention to learners' emotional awareness--an important yet challenging skill for effective learning. Current emotion technologies are designed to support learners' emotional awareness and social needs; however, little is known about whether and how DHH learners could benefit from these technologies. Our study explores how DHH learners perceive and use emotion data from two collection approaches, self-reported and automatic emotion recognition (AER), in video-based learning. By comparing the use of these technologies between DHH (N=20) and hearing learners (N=20), we identified key differences in their usage and perceptions: 1) DHH learners enhanced their emotional awareness by rewatching the video to self-report their emotions and called for alternative methods for self-reporting emotion, such as using sign language or expressive emoji designs; and 2) while the AER technology could be useful for detecting emotional patterns in learning experiences, DHH learners expressed more concerns about the accuracy and intrusiveness of the AER data. Our findings provide novel design implications for improving the inclusiveness of emotion technologies to support DHH learners, such as leveraging DHH peer learners' emotions to elicit reflections. 
- PAR ID: 10635774
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the ACM on Human-Computer Interaction
- Volume: 9
- Issue: 2
- ISSN: 2573-0142
- Page Range / eLocation ID: 1 to 27
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Affective captions employ visual typographic modulations to convey a speaker's emotions, improving speech accessibility for Deaf and Hard-of-Hearing (DHH) individuals. However, the most effective visual modulations for expressing emotions remain uncertain. Bridging this gap, we ran three studies with 39 DHH participants, exploring the design space of affective captions, which includes parameters such as text color, boldness, and size. Study 1 assessed preferences for nine of these styles, each conveying either valence or arousal separately. Study 2 combined Study 1's top-performing styles and measured preferences for captions depicting both valence and arousal simultaneously. Participants outlined readability, minimal distraction, intuitiveness, and emotional clarity as key factors behind their choices. In Study 3, these factors and an emotion-recognition task were used to compare how Study 2's winning styles performed against a non-styled baseline. Based on our findings, we present the two best-performing styles as design recommendations for applications employing affective captions.
- This study investigates innovative interaction designs for communication and collaborative learning between learners of mixed hearing and signing abilities, leveraging advancements in mixed reality technologies like Apple Vision Pro and generative AI for animated avatars. Adopting a participatory design approach, we engaged 15 d/Deaf and hard of hearing (DHH) students to brainstorm ideas for an AI avatar with interpreting ability (sign language to English and English to sign language) that would facilitate their face-to-face communication with hearing peers. Participants envisioned the AI avatars addressing some issues with human interpreters, such as lack of availability, and providing an affordable alternative to expensive personalized interpreting services. Our findings indicate a range of preferences for integrating the AI avatars with actual human figures of both DHH and hearing communication partners. Participants highlighted the importance of having control over customizing the AI avatar, including AI-generated signs, voices, facial expressions, and their synchronization for enhanced emotional display in communication. Based on our findings, we propose a suite of design recommendations that balance respecting sign language norms with adherence to hearing social norms. Our study offers insights into improving the authenticity of generative AI in scenarios involving specific and sometimes unfamiliar social norms.
- Learners' awareness of their own affective states (emotions) can improve their meta-cognition, a critical skill of monitoring and controlling one's cognition, motivation, and affect, and adjusting one's learning strategies and behaviors accordingly. To investigate the effect of peers' affects on learners' meta-cognition, we proposed two types of cues that aggregate peers' affects recognized via facial expression recognition: Locative cues (displaying spikes of peers' emotions along a video timeline) and Temporal cues (showing the positivity of peers' emotions at different segments of a video). We conducted a between-subject experiment with 42 college students using think-aloud protocols, interviews, and surveys. Our results showed that the two types of cues improved participants' meta-cognition differently. For example, interacting with the Temporal cues triggered participants to compare their own affective responses with their peers' and reflect more on why and how their emotions differed for the same video content. While participants perceived the benefits of using AI-generated peer cues to improve awareness of their own learning affects, they also sought more explanations from their peers to understand the AI-generated results. Our findings not only provide novel design implications for promoting learners' meta-cognition with privacy-preserving social cues of peers' learning affects, but also suggest an expanded design framework for Explainable AI (XAI). (A minimal sketch of this cue aggregation appears after this list.)
- Despite significant vision loss, humans can still recognize various emotional stimuli through hearing and express diverse emotional responses, which can be sorted into two dimensions, arousal and valence. Yet much research has focused on sighted people, leading to a lack of knowledge about the emotion perception mechanisms of people with visual impairment. This study aims to advance knowledge of the degree to which people with visual impairment perceive various emotions, spanning high/low arousal and positive/negative valence. A total of 30 individuals with visual impairment participated in interviews in which they listened to stories of people who became visually impaired and encountered and overcame various challenges, and they were asked to share their emotions. Participants perceived different kinds and intensities of emotions depending on demographic variables such as living alone, loneliness, onset of visual impairment, visual acuity, race/ethnicity, and employment status. This advanced knowledge of emotion perception in people with visual impairment is anticipated to contribute toward the design of social supports that can adequately accommodate those with visual impairment.
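Referring back to the Locative and Temporal cues described in the third related work above, the sketch below is a minimal, hypothetical illustration of how per-second emotion estimates from several peers might be collapsed into timeline spikes and segment-level positivity. The input format, spike threshold, and segment length are assumptions for illustration, not that study's implementation.

```python
# Hypothetical sketch of aggregating peers' facial-expression-recognition output
# into two cue types: Locative (timeline spikes) and Temporal (segment positivity).
# Input format, threshold, and segment length are illustrative assumptions.

from statistics import mean

# Assumed AER output: per-second valence estimates in [-1, 1] for each peer.
peer_valence = {
    "peer_a": [0.1, 0.2, 0.9, 0.8, -0.3, -0.4, 0.0, 0.1],
    "peer_b": [0.0, 0.1, 0.7, 0.6, -0.5, -0.2, 0.1, 0.2],
}

def _mean_per_second(valence_by_peer):
    """Average the peers' valence estimates at each second of the video."""
    n = min(len(v) for v in valence_by_peer.values())
    return [mean(v[t] for v in valence_by_peer.values()) for t in range(n)]

def locative_cues(valence_by_peer, threshold=0.6):
    """Locative cues: seconds on the timeline where peers' mean |valence| spikes."""
    per_second = _mean_per_second(valence_by_peer)
    return [t for t, v in enumerate(per_second) if abs(v) >= threshold]

def temporal_cues(valence_by_peer, segment_seconds=4):
    """Temporal cues: mean positivity of peers' emotions per video segment."""
    per_second = _mean_per_second(valence_by_peer)
    return [
        mean(per_second[s:s + segment_seconds])
        for s in range(0, len(per_second), segment_seconds)
    ]

print(locative_cues(peer_valence))   # seconds where emotions spike, e.g. [2, 3]
print(temporal_cues(peer_valence))   # positivity per 4-second segment
```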