Title: Implementing ASLNet V1.0: Progress and Plans
We report on the development of ASLNet, a wordnet for American Sign Language (ASL). ASLNet V1.0 is currently under construction by mapping easy-to-translate ASL lexical nouns to Princeton WordNet synsets. We describe our data model and mapping approach, which can be extended to any sign language. Analysis of the 390 synsets processed to date indicates the success of our procedure yet also highlights the need to supplement our mapping with the “merge” method. We outline our plans for upcoming work to remedy this, which include use of ASL free-association data.
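As a rough illustration of the expand-style mapping described in the abstract (anchoring an ASL sign to an existing Princeton WordNet synset via its English translation), the sketch below uses NLTK's WordNet interface. The library choice, the example glosses, and the manual review step are assumptions for illustration only, not the project's actual tooling or data.

```python
# Minimal sketch: list candidate Princeton WordNet noun synsets for the
# English translation of an easy-to-translate ASL lexical noun, so that an
# annotator can choose the sense the sign should be linked to.
# Requires: pip install nltk; python -m nltk.downloader wordnet
from nltk.corpus import wordnet as wn

def candidate_synsets(english_translation: str):
    """Return noun synsets that could anchor the ASL sign in ASLNet."""
    return wn.synsets(english_translation, pos=wn.NOUN)

# Hypothetical English translations of ASL noun signs (not project data).
for gloss in ["book", "house", "teacher"]:
    for syn in candidate_synsets(gloss):
        print(gloss, "->", syn.name(), ":", syn.definition())
```

Signs whose senses do not line up cleanly with any existing synset are exactly the cases the abstract flags as needing the complementary "merge" method.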
Award ID(s):
1918252
PAR ID:
10290195
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 11th Global Wordnet Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Arabnia, Hamid; Deligiannidis, Leonidas; Tinetti, Fernando; Tran, Quoc-Nam (Ed.)
    Millions of people with hearing disabilities use sign language to communicate, creating a communication gap with those who are not fluent in ASL (American Sign Language). This paper introduces an ASL interpreter system built on smart-glasses-based augmented reality. We begin by introducing and comparing two models that translate spoken language into ASL poses. The first translates spoken text to ASL Gloss, an intermediate representation, before generating ASL poses; the second translates the text directly to ASL poses. Our analysis shows that using ASL Gloss as an intermediate step significantly improves translation speed. We then explore a system for encoding ASL pose videos for display on smart glasses. The chosen translation method achieves a BLEU score of 66.5801 and a translation rate of 1.825 milliseconds per gloss. Our algorithm for mapping gloss text to ASL videos obtained a mean squared error of 0.05, indicating good translational accuracy and low mapping error.
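For context on the BLEU figure quoted above, corpus-level BLEU for the text-to-gloss stage could be computed as sketched below; sacrebleu is one common scorer, and the gloss strings are hypothetical examples, not the paper's data.

```python
# Minimal sketch: score hypothetical text-to-ASL-Gloss output against
# reference glosses with corpus-level BLEU (sacrebleu is one common choice).
import sacrebleu

hypotheses = ["IX-1 GO STORE", "BOOK IX-3 READ FINISH"]         # model output (hypothetical)
references = [["IX-1 GO STORE", "BOOK IX-3 READ FINISH PAST"]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU: {bleu.score:.2f}")
```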
  2. Sign language recognition and translation technologies have the potential to increase access and inclusion for deaf signing communities, but research progress is bottlenecked by a lack of representative data. We introduce a new resource for American Sign Language (ASL) modeling, the Sem-Lex Benchmark. The Benchmark is currently the largest of its kind, consisting of over 84k videos of isolated sign productions from deaf ASL signers who gave informed consent and received compensation. Human experts aligned these videos with other sign language resources including ASL-LEX, SignBank, and ASL Citizen, enabling useful expansions for sign and phonological feature recognition. We present a suite of experiments that make use of the linguistic information in ASL-LEX, evaluating the practicality and fairness of the Sem-Lex Benchmark for isolated sign recognition (ISR). We use an SL-GCN model to show that the phonological features are recognizable with 85% accuracy, and that they are effective as an auxiliary target for ISR. Learning to recognize phonological features alongside gloss results in a 6% improvement for few-shot ISR accuracy and a 2% improvement for ISR accuracy overall. Instructions for downloading the data can be found at https://github.com/leekezar/SemLex.
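A minimal PyTorch sketch of the auxiliary-target idea mentioned above: a shared encoder feeds a gloss classifier plus one classifier per phonological feature, and the losses are summed. The encoder, layer sizes, class counts, and feature inventory are hypothetical placeholders; the paper's recognizer is an SL-GCN over skeletal pose data.

```python
import torch
import torch.nn as nn

class GlossWithPhonologyHeads(nn.Module):
    """Shared encoder with one gloss head plus one head per phonological feature."""
    def __init__(self, in_dim=256, num_glosses=1000, phon_feature_sizes=(6, 8, 4)):
        super().__init__()
        # Placeholder MLP encoder; the paper uses an SL-GCN over pose keypoints.
        self.encoder = nn.Sequential(nn.Linear(in_dim, 512), nn.ReLU())
        self.gloss_head = nn.Linear(512, num_glosses)
        self.phon_heads = nn.ModuleList(nn.Linear(512, k) for k in phon_feature_sizes)

    def forward(self, x):
        h = self.encoder(x)
        return self.gloss_head(h), [head(h) for head in self.phon_heads]

model = GlossWithPhonologyHeads()
ce = nn.CrossEntropyLoss()

x = torch.randn(4, 256)                                  # hypothetical pooled pose features
gloss_y = torch.randint(0, 1000, (4,))                   # hypothetical gloss labels
phon_y = [torch.randint(0, k, (4,)) for k in (6, 8, 4)]  # hypothetical feature labels

gloss_logits, phon_logits = model(x)
loss = ce(gloss_logits, gloss_y) + sum(ce(p, t) for p, t in zip(phon_logits, phon_y))
loss.backward()
```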
  3. Sign words are the building blocks of any sign language. In this work, we present wSignGen, a word-conditioned 3D American Sign Language (ASL) generation model dedicated to synthesizing realistic and grammatically accurate motion sequences for sign words. Our approach leverages a transformer-based diffusion model, trained on a curated dataset of 3D motion meshes from word-level ASL videos. By integrating CLIP, wSignGen offers two advantages: image-based generation, which is particularly useful for children learning sign language but not yet able to read, and the ability to generalize to unseen synonyms. Experiments demonstrate that wSignGen significantly outperforms the baseline model in the task of sign word generation. Moreover, human evaluation experiments show that wSignGen can generate high-quality, grammatically correct ASL signs effectively conveyed through 3D avatars. 
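The role CLIP plays above can be illustrated briefly: because CLIP embeds words and images in a shared space, the same conditioning vector can come from a written word or from a picture of the concept, which is what enables image-based generation and generalization to unseen synonyms. The Hugging Face model name and the way the vector is consumed downstream are assumptions for illustration, not wSignGen's actual implementation.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def word_condition(word: str) -> torch.Tensor:
    """Embed a sign word into CLIP space to condition a motion generator."""
    inputs = processor(text=[word], return_tensors="pt", padding=True)
    return clip.get_text_features(**inputs)  # shape (1, 512)

# An image of the same concept could instead be embedded with
# clip.get_image_features and handed to the generator in exactly the same way,
# which is useful for children who cannot yet read.
cond = word_condition("apple")
print(cond.shape)
```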
  4. Sign language is a complex visual language, and automatic interpretation of sign language can facilitate communication involving deaf individuals. As one of the essential components of sign language, fingerspelling connects natural spoken languages to sign language and expands the sign language vocabulary. In practice, it is challenging to analyze fingerspelling alphabets due to their signing speed and small motion range. Synthetic data has the potential to further improve fingerspelling alphabet analysis at scale. In this paper, we evaluate how different video-based human representations perform in a framework for Alphabet Generation for American Sign Language (ASL). We tested three mainstream video-based human representations: two-stream inflated 3D ConvNet, 3D landmarks of body joints, and rotation matrices of body joints. We also evaluated the effect of different skeleton graphs and selected body joints. The generation process of ASL fingerspelling used a transformer-based Conditional Variational Autoencoder. To train the model, we collected dynamic ASL alphabet signing videos from 17 signers. The generated alphabets were evaluated using automatic quality metrics such as FID, and we also considered supervised metrics by recognizing the generated entries using Spatio-Temporal Graph Convolutional Networks. Our experiments show that using the rotation matrices of the upper body joints and the signing hand gives the best results for the generation of ASL alphabet signing. Going forward, our goal is to produce articulated fingerspelling words by combining individual alphabets learned in this work.
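A minimal sketch of a letter-conditioned Variational Autoencoder objective of the kind described above, assuming each training example is a fingerspelling pose sequence flattened into per-joint rotation features. All dimensions, the joint and frame counts, and the plain linear encoder/decoder are hypothetical simplifications of the paper's transformer-based model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LetterConditionedVAE(nn.Module):
    def __init__(self, pose_dim=9 * 20 * 30, num_letters=26, latent_dim=64):
        # pose_dim: 3x3 rotation matrix (9 values) x 20 joints x 30 frames, flattened.
        super().__init__()
        self.embed = nn.Embedding(num_letters, 32)
        self.enc = nn.Linear(pose_dim + 32, 2 * latent_dim)  # outputs (mu, logvar)
        self.dec = nn.Linear(latent_dim + 32, pose_dim)

    def forward(self, pose, letter):
        c = self.embed(letter)
        mu, logvar = self.enc(torch.cat([pose, c], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        recon = self.dec(torch.cat([z, c], dim=-1))
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return F.mse_loss(recon, pose) + kl

model = LetterConditionedVAE()
pose = torch.randn(8, 9 * 20 * 30)    # hypothetical rotation features for 8 clips
letter = torch.randint(0, 26, (8,))   # letter labels A-Z
loss = model(pose, letter)
loss.backward()
```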
  5. Over the past decade, there have been great advancements in radio frequency sensor technology for human–computer interaction applications, such as gesture recognition and human activity recognition more broadly. While there is a significant amount of research on these topics, in most cases experimental data are acquired in controlled settings by telling participants what motion to articulate. However, especially for communicative motions such as sign language, such directed data sets do not accurately capture natural, in situ articulations. This results in a difference between the distributions of directed American Sign Language (ASL) and natural ASL, which severely degrades natural sign language recognition in real‐world scenarios. To overcome these challenges and acquire more representative data for training deep models, the authors develop an interactive gaming environment, ChessSIGN, which records video and radar data of participants as they play the game without any external direction. The authors investigate various ways of generating synthetic samples from directed ASL data, but show that ultimately such data does not offer much improvement over simply initialising with imagery from ImageNet. In contrast, the authors propose an interactive learning paradigm in which model training improves as more natural ASL samples are acquired and augmented with synthetic samples generated by a physics‐aware generative adversarial network. The authors show that the proposed approach enables recognition of natural ASL in a real‐world setting, achieving 69% accuracy for 29 ASL signs, a 60% improvement over conventional training with directed ASL data.
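A minimal sketch of the interactive-learning loop described above: each round, newly recorded natural ASL samples are added to the training pool, augmented with synthetic samples, and the recognizer is retrained. Every function here is a hypothetical stand-in for the paper's components (ChessSIGN capture, the physics-aware GAN, and the radar/video classifier).

```python
def collect_natural_batch(round_idx):
    """Stand-in for radar/video samples recorded during ChessSIGN gameplay."""
    return [("natural_sample", round_idx)]

def synthesize_from(natural_samples, factor=3):
    """Stand-in for a physics-aware GAN producing augmented copies."""
    return natural_samples * factor

def retrain(model_state, training_set):
    """Stand-in for fine-tuning the sign recognizer on the growing pool."""
    return {"samples_seen": len(training_set)}

model, training_pool = {}, []
for round_idx in range(5):
    natural = collect_natural_batch(round_idx)
    training_pool += natural + synthesize_from(natural)
    model = retrain(model, training_pool)
print(model)
```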