Title: Word-Conditioned 3D American Sign Language Motion Generation
Sign words are the building blocks of any sign language. In this work, we present wSignGen, a word-conditioned 3D American Sign Language (ASL) generation model dedicated to synthesizing realistic and grammatically accurate motion sequences for sign words. Our approach leverages a transformer-based diffusion model, trained on a curated dataset of 3D motion meshes from word-level ASL videos. By integrating CLIP, wSignGen offers two advantages: image-based generation, which is particularly useful for children learning sign language but not yet able to read, and the ability to generalize to unseen synonyms. Experiments demonstrate that wSignGen significantly outperforms the baseline model in the task of sign word generation. Moreover, human evaluation experiments show that wSignGen can generate high-quality, grammatically correct ASL signs effectively conveyed through 3D avatars.
Award ID(s):
2223507 2229873
PAR ID:
10569926
Author(s) / Creator(s):
; ;
Publisher / Repository:
Association for Computational Linguistics
Date Published:
Page Range / eLocation ID:
9993 to 9999
Format(s):
Medium: X
Location:
Miami, Florida, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Sign language is a complex visual language, and automatic interpretations of sign language can facilitate communication involving deaf individuals. As one of the essential components of sign language, fingerspelling connects the natural spoken languages to the sign language and expands the scale of sign language vocabulary. In practice, it is challenging to analyze fingerspelling alphabets due to their signing speed and small motion range. The usage of synthetic data has the potential of further improving fingerspelling alphabets analysis at scale. In this paper, we evaluate how different video-based human representations perform in a framework for Alphabet Generation for American Sign Language (ASL). We tested three mainstream video-based human representations: two-stream inflated 3D ConvNet, 3D landmarks of body joints, and rotation matrices of body joints. We also evaluated the effect of different skeleton graphs and selected body joints. The generation process of ASL fingerspelling used a transformer-based Conditional Variational Autoencoder. To train the model, we collected ASL alphabet signing videos from 17 signers with dynamic alphabet signing. The generated alphabets were evaluated using automatic metrics of quality such as FID, and we also considered supervised metrics by recognizing the generated entries using Spatio-Temporal Graph Convolutional Networks. Our experiments show that using the rotation matrices of the upper body joints and the signing hand gives the best results for the generation of ASL alphabet signing. Going forward, our goal is to produce articulated fingerspelling words by combining individual alphabets learned in this work.
  2. Achieving expressive 3D motion reconstruction and automatic generation for isolated sign words can be challenging, due to the lack of real-world 3D sign-word data, the complex nuances of signing motions, and the cross-modal understanding of sign language semantics. To address these challenges, we introduce SignAvatar, a framework capable of both word-level sign language reconstruction and generation. SignAvatar employs a transformer-based conditional variational autoencoder architecture, effectively establishing relationships across different semantic modalities. Additionally, this approach incorporates a curriculum learning strategy to enhance the model's robustness and generalization, resulting in more realistic motions. Furthermore, we contribute the ASL3DWord dataset, composed of 3D joint rotation data for the body, hands, and face, for unique sign words. We demonstrate the effectiveness of SignAvatar through extensive experiments, showcasing its superior reconstruction and automatic generation capabilities. The code and dataset are available on the project page 
  3. Over the years, there has been much research in both wearable and video-based American Sign Language (ASL) recognition systems. However, the restrictive and invasive nature of these sensing modalities remains a significant disadvantage in the context of Deaf-centric smart environments or devices that are responsive to ASL. This paper investigates the efficacy of RF sensors for word-level ASL recognition in support of human-computer interfaces designed for deaf or hard-of-hearing individuals. A principal challenge is the training of deep neural networks given the difficulty in acquiring native ASL signing data. In this paper, adversarial domain adaptation is exploited to bridge the physical/kinematic differences between the copysigning of hearing individuals (repetition of sign motion after viewing a video), and native signing of Deaf individuals who are fluent in sign language. Domain adaptation results are compared with those attained by directly synthesizing ASL signs using generative adversarial networks (GANs). Kinematic improvements to the GAN architecture, such as the insertion of micro-Doppler signature envelopes in a secondary branch of the GAN, are utilized to boost performance. Word-level classification accuracy of 91.3% is achieved for 20 ASL words.
  4. Despite some prior research and commercial systems, if someone sees an unfamiliar American Sign Language (ASL) word and wishes to look up its meaning in a dictionary, this remains a difficult task. There is no standard label a user can type to search for a sign, and formulating a query based on linguistic properties is challenging for students learning ASL. Advances in sign-language recognition technology will soon enable the design of a search system for ASL word look-up in dictionaries, by allowing users to generate a query by submitting a video of themselves performing the word they believe they encountered somewhere. Users would then view a results list of video clips or animations, to seek the desired word. In this research, we are investigating the usability of such a proposed system, a webcam-based ASL dictionary system, using a Wizard-of-Oz prototype, and we have enhanced the design so that it can support sign language word look-up even when the performance of the underlying sign-recognition technology is low. We have also investigated the requirements of students learning ASL in regard to how results should be displayed and how a system could enable them to filter the results of the initial query, to aid in their search for a desired word. We compared users’ satisfaction when using a system with or without post-query filtering capabilities. We discuss our upcoming study to investigate users’ experience with a working prototype based on actual sign-recognition technology that is being designed. Finally, we discuss extensions of this work to the context of users searching datasets of videos of other human movements, e.g. dance moves, or when searching for words in other languages.
  5. Searching unfamiliar American Sign Language (ASL) words in a dictionary is challenging for learners, as it involves recalling signs from memory and providing specific linguistic details. Fortunately, the emergence of sign-recognition technology will soon enable users to search by submitting a video of themselves performing the word. Although previous research has independently addressed algorithmic enhancements and design aspects of ASL dictionaries, there has been limited effort to integrate both. This paper presents the design of an end-to-end sign language dictionary system, incorporating design recommendations from recent human–computer interaction (HCI) research. Additionally, we share preliminary findings from an interview-based user study with four ASL learners. 