NSF PAR Search | NSF Public Access Repository

CuCap: Comparative Analysis of Customized Captioning between North American and South Korean d/Deaf and Hard-of-Hearing Users

https://doi.org/10.1145/3663547.3746400

de Lacerda Pataca, Caluã; Ahn, SooYeon; Yoo, Suhyeon; Kim, JooYeong; Truong, Khai N; Hong, Jin-Hyuk; Peiris, Roshan L; Huenerfauth, Matt (October 2025, ACM)

Full Text Available
Tactile Emotions: Multimodal Affective Captioning with Haptics Improves Narrative Engagement for d/Deaf and Hard-of-Hearing Viewers

https://doi.org/10.1145/3706598.3713304

de Lacerda Pataca, Caluã; Hassan, Saad; May, Lloyd; Olson, Michelle M; D'aurio, Toni; Peiris, Roshan L; Huenerfauth, Matt (April 2025, ACM)

This paper explores a multimodal approach for translating emotional cues present in speech, designed with Deaf and Hard-of-Hearing (DHH) individuals in mind. Prior work has focused on visual cues applied to captions, successfully conveying whether a speaker’s words have a negative or positive tone (valence), but with mixed results regarding the intensity (arousal) of these emotions. We propose a novel method using haptic feedback to communicate a speaker’s arousal levels through vibrations on a wrist-worn device. In a formative study with 16 DHH participants, we tested six haptic patterns and found that participants preferred single per-word vibrations at 75 Hz to encode arousal. In a follow-up study with 27 DHH participants, this pattern was paired with visual cues, and narrative engagement with audio-visual content was measured. Results indicate that combining haptics with visuals significantly increased engagement compared to a conventional captioning baseline and a visuals-only affective captioning style.
Full Text Available
Diffusion Models for Sign Language Video Anonymization

Xia, Zhaoyang; Zhou, Yang; Han, Ligong; Neidle, Carol; Metaxas, Dimitris (May 2024, Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources)
Efthimiou, Eleni; Fotinea, Stavroula-Evita; Hanke, Thomas; Hochgesang, Julie A; Mesch, Johanna; Schulder, Marc (Eds.)
Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, this does not preserve privacy. Since critical linguistic information is transmitted through facial expressions, the face cannot be obscured. While signers have expressed interest, for a variety of applications, in sign language video anonymization that would effectively preserve linguistic content, attempts to develop such technology have had limited success and generally require pose estimation that cannot be readily carried out in the wild. To address current limitations, our research introduces DiffSLVA, a novel methodology that uses pre-trained large-scale diffusion models for text-guided sign language video anonymization. We incorporate ControlNet, which leverages low-level image features such as HED (Holistically-Nested Edge Detection) edges, to circumvent the need for pose estimation. Additionally, we develop a specialized module to capture linguistically essential facial expressions. We then combine the above methods to achieve anonymization that preserves the essential linguistic content of the original signer. This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications, which would offer significant benefits to the Deaf and Hard-of-Hearing communities.
Full Text Available
A Multimodal Spatio-Temporal GCN Model with Enhancements for Isolated Sign Recognition

Zhou, Yang; Xia, Zhaoyang; Chen, Yuxiao; Neidle, Carol; Metaxas, Dimitris (May 2024, Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources)
Efthimiou, Eleni; Fotinea, Stavroula-Evita; Hanke, Thomas; Hochgesang, Julie A; Mesch, Johanna; Schulder, Marc (Eds.)
We propose a multimodal network using skeletons and handshapes as input to recognize individual signs and detect their boundaries in American Sign Language (ASL) videos. Our method integrates a spatio-temporal Graph Convolutional Network (GCN) architecture to estimate human skeleton keypoints; it uses a late-fusion approach for both forward and backward processing of video streams. Our core method is designed to extract and analyze features from ASL videos, to enhance the accuracy and efficiency of recognition of individual signs. A Gating module based on per-channel multi-layer convolutions is employed to evaluate significant frames for recognition of isolated signs. Additionally, an auxiliary multimodal branch network, integrated with a transformer, is designed to estimate the linguistic start and end frames of an isolated sign within a video clip. We evaluated the performance of our approach on multiple datasets that include isolated, citation-form signs and signs pre-segmented from continuous signing based on linguistic annotations of start and end points of signs within sentences. We achieved very promising results when using both types of sign videos combined for training, with overall sign recognition accuracy of 80.8% Top-1 and 95.2% Top-5 for citation-form signs, and 80.4% Top-1 and 93.0% Top-5 for signs pre-segmented from continuous signing.
Full Text Available
Caption Royale: Exploring the Design Space of Affective Captions from the Perspective of Deaf and Hard-of-Hearing Individuals

https://doi.org/10.1145/3613904.3642258

de Lacerda Pataca, Caluã; Hassan, Saad; Tinker, Nathan; Peiris, Roshan Lalintha; Huenerfauth, Matt (May 2024, ACM)

Affective captions employ visual typographic modulations to convey a speaker’s emotions, improving speech accessibility for Deaf and Hard-of-Hearing (DHH) individuals. However, the most effective visual modulations for expressing emotions remain uncertain. Bridging this gap, we ran three studies with 39 DHH participants, exploring the design space of affective captions, which include parameters like text color, boldness, size, and so on. Study 1 assessed preferences for nine of these styles, each conveying either valence or arousal separately. Study 2 combined Study 1’s top-performing styles and measured preferences for captions depicting both valence and arousal simultaneously. Participants outlined readability, minimal distraction, intuitiveness, and emotional clarity as key factors behind their choices. In Study 3, these factors and an emotion-recognition task were used to compare how Study 2’s winning styles performed versus a non-styled baseline. Based on our findings, we present the two best-performing styles as design recommendations for applications employing affective captions.
Full Text Available
Designing and Evaluating an Advanced Dance Video Comprehension Tool with In-situ Move Identification Capabilities

https://doi.org/10.1145/3613904.3642710

Hassan, Saad; De Lacerda Pataca, Caluã; Nourian, Laleh; Tigwell, Garreth W; Davis, Briana; Silver Wagman, Will Zhenya (May 2024, ACM)

Analyzing dance moves and routines is a foundational step in learning dance. Videos are often utilized at this step, and advancements in machine learning, particularly in human-movement recognition, could further assist dance learners. We developed and evaluated a Wizard-of-Oz prototype of a video comprehension tool that offers automatic in-situ dance move identification functionality. Our system design was informed by an interview study involving 12 dancers to understand the challenges they face when trying to comprehend complex dance videos and taking notes. Subsequently, we conducted a within-subject study with 8 Cuban salsa dancers to identify the benefits of our system compared to an existing traditional feature-based search system. We found that the quality of notes taken by participants improved when using our tool, and they reported a lower workload. Based on participants’ interactions with our system, we offer recommendations on how an AI-powered span-search feature can enhance dance video comprehension tools.
Full Text Available
DiffSLVA: Harnessing Diffusion Models for Sign Language Video Anonymization

Xia, Zhaoyang; Neidle, Carol; Metaxas, Dimitris N. (November 2023, arXiv)

Since American Sign Language (ASL) has no standard written form, Deaf signers frequently share videos in order to communicate in their native language. However, since both hands and face convey critical linguistic information in signed languages, sign language videos cannot preserve signer privacy. While signers have expressed interest, for a variety of applications, in sign language video anonymization that would effectively preserve linguistic content, attempts to develop such technology have had limited success, given the complexity of hand movements and facial expressions. Existing approaches rely predominantly on precise pose estimations of the signer in video footage and often require sign language video datasets for training. These requirements prevent them from processing videos 'in the wild,' in part because of the limited diversity present in current sign language video datasets. To address these limitations, our research introduces DiffSLVA, a novel methodology that utilizes pre-trained large-scale diffusion models for zero-shot text-guided sign language video anonymization. We incorporate ControlNet, which leverages low-level image features such as HED (Holistically-Nested Edge Detection) edges, to circumvent the need for pose estimation. Additionally, we develop a specialized module dedicated to capturing facial expressions, which are critical for conveying essential linguistic information in signed languages. We then combine the above methods to achieve anonymization that better preserves the essential linguistic content of the original signer. This innovative methodology makes possible, for the first time, sign language video anonymization that could be used for real-world applications, which would offer significant benefits to the Deaf and Hard-of-Hearing communities. We demonstrate the effectiveness of our approach with a series of signer anonymization experiments.
Full Text Available
Challenges for Linguistically-Driven Computer-Based Sign Recognition from Continuous Signing for American Sign Language

Neidle, Carol (November 2023, arXiv)

There have been recent advances in computer-based recognition of isolated, citation-form signs from video. There are many challenges for such a task, not least the naturally occurring inter- and intra-signer synchronic variation in sign production, including sociolinguistic variation in the realization of certain signs. However, there are several significant factors that make recognition of signs from continuous signing an even more difficult problem. This article presents an overview of such challenges, based in part on findings from a large corpus of linguistically annotated video data for American Sign Language (ASL). Some linguistic regularities in the structure of signs that can boost handshape and sign recognition are also discussed.
Full Text Available
Sign Spotter: Design and Initial Evaluation of an Automatic Video-Based American Sign Language Dictionary System

https://doi.org/10.1145/3597638.3614497

Bohacek, Matyas; Hassan, Saad (October 2023, ACM)

Searching unfamiliar American Sign Language (ASL) words in a dictionary is challenging for learners, as it involves recalling signs from memory and providing specific linguistic details. Fortunately, the emergence of sign-recognition technology will soon enable users to search by submitting a video of themselves performing the word. Although previous research has independently addressed algorithmic enhancements and design aspects of ASL dictionaries, there has been limited effort to integrate both. This paper presents the design of an end-to-end sign language dictionary system, incorporating design recommendations from recent human–computer interaction (HCI) research. Additionally, we share preliminary findings from an interview-based user study with four ASL learners.
Full Text Available
Modeling Word Importance in Conversational Transcripts: Toward improved live captioning for Deaf and hard of hearing viewers

https://doi.org/10.1145/3587281.3587290

Amin, Akhter Al; Hassan, Saad; Huenerfauth, Matt; Alm, Cecilia Ovesdotter (April 2023, Proceedings of the 20th International Web for All Conference)

Full Text Available
