In this paper, we propose a machine learning-based multi-stream framework to recognize American Sign Language (ASL) manual signs and nonmanual gestures (face and head movements) in real time from RGB-D videos. Our approach is based on 3D Convolutional Neural Networks (3D CNNs) and fuses multi-modal features, including hand gestures, facial expressions, and body poses, from multiple channels (RGB, Depth, Motion, and Skeleton joints). To learn the overall temporal dynamics of a video, a proxy video is generated by selecting a subset of frames from each video; these proxy videos are then used to train the proposed 3D CNN model. We collected a new ASL dataset, ASL-100-RGBD, which contains 42 RGB-D videos captured by a Microsoft Kinect V2 camera. Each video contains 100 ASL manual signs, recorded along with the RGB channel, depth maps, skeleton joints, face features, and HD face data. The dataset is fully annotated for each semantic region (i.e., the time span of each sign performed by the signer). Our proposed method achieves 92.88% accuracy for recognizing 100 ASL sign glosses on our newly collected ASL-100-RGBD dataset. The effectiveness of our framework for recognizing hand gestures from RGB-D videos is further demonstrated on a large-scale dataset, ChaLearn IsoGD, achieving state-of-the-art results.
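A minimal sketch (not the paper's implementation) of the proxy-video step described above: a fixed number of frames is sampled uniformly across a variable-length clip so that the 3D CNN observes the full temporal span. The frame count (32) and tensor layout are assumptions.

```python
import numpy as np

def make_proxy_video(frames, num_samples=32):
    """Select num_samples frames spread evenly over the clip.

    frames: array of shape (T, H, W, C) holding the decoded video.
    Returns an array of shape (num_samples, H, W, C).
    """
    total = frames.shape[0]
    # Evenly spaced indices across the full duration; frames are repeated
    # if the clip is shorter than num_samples.
    idx = np.linspace(0, total - 1, num_samples).round().astype(int)
    return frames[idx]

# Example: a 90-frame RGB stream reduced to a 32-frame proxy video.
clip = np.random.rand(90, 112, 112, 3).astype(np.float32)
proxy = make_proxy_video(clip)           # shape (32, 112, 112, 3)
cnn_input = proxy.transpose(3, 0, 1, 2)  # (C, T, H, W) layout used by many 3D CNNs
```

In a multi-stream setup, the same sampling could be applied to each channel (RGB, depth, motion, skeleton) before fusion.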
A Wireframe-Based Approach for Classifying and Acquiring Proficiency in the American Sign Language (Student Abstract)
We describe our methodology for classifying ASL (American Sign Language) gestures. Rather than operate directly on raw images of hand gestures, we extract coordinates and render wireframes from individual images to construct a curated training dataset. This dataset is then used in a classifier that is memory efficient and provides effective performance (94% accuracy). Because we construct wireframes that contain information about several angles in the joints that comprise hands, our methodology is amenable to training those interested in learning ASL by identifying targeted errors in their hand gestures.
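A minimal sketch of the kind of joint-angle information a wireframe encodes, under two assumptions not stated above: a 21-point hand layout (wrist plus four points per finger) and 3D coordinates already produced by some keypoint extractor.

```python
import numpy as np

# Keypoint chains (wrist -> fingertip) for an assumed 21-point hand layout.
FINGER_CHAINS = {
    "thumb":  [0, 1, 2, 3, 4],
    "index":  [0, 5, 6, 7, 8],
    "middle": [0, 9, 10, 11, 12],
    "ring":   [0, 13, 14, 15, 16],
    "pinky":  [0, 17, 18, 19, 20],
}

def joint_angle(a, b, c):
    """Angle in degrees at point b formed by segments b->a and b->c."""
    v1, v2 = a - b, c - b
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-8)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def hand_joint_angles(keypoints):
    """keypoints: (21, 3) array of coordinates -> per-finger joint angles."""
    return {
        finger: [
            joint_angle(keypoints[chain[i - 1]], keypoints[chain[i]], keypoints[chain[i + 1]])
            for i in range(1, len(chain) - 1)
        ]
        for finger, chain in FINGER_CHAINS.items()
    }

# Example with random coordinates standing in for extractor output.
angles = hand_joint_angles(np.random.rand(21, 3))
```

Comparing such per-joint angles against a reference sign is one way a wireframe representation could flag targeted errors for ASL learners.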
- Award ID(s): 2303019
- PAR ID: 10537709
- Publisher / Repository: AAAI Press
- Date Published:
- Journal Name: Proceedings of the AAAI Conference on Artificial Intelligence
- Volume: 38
- Issue: 21
- ISSN: 2159-5399
- Page Range / eLocation ID: 23606 to 23607
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
User authentication is an important security mechanism for preventing unauthorized access to systems or devices. In this paper, we propose a new user authentication method based on surface electromyogram (sEMG) images of hand gestures and deep anomaly detection. Multi-channel sEMG signals acquired while the user performs a hand gesture are converted into sEMG images, which are used as the input of a deep anomaly detection model to classify the user as client or imposter. The performance of different sEMG image generation methods in three authentication test scenarios is investigated using a public hand gesture sEMG dataset. Our experimental results demonstrate the viability of the proposed method for user authentication. (A toy sketch of the sEMG-to-image step appears after this list.)
-
Over the past few years, unmanned aerial vehicles (UAVs) have become increasingly popular for purposes such as surveillance, industrial automation, robotics, vehicle guidance, and traffic monitoring and control. Supporting multiple ways of controlling a UAV is important for fitting these varied uses. The goal of this work was to develop a new technique for controlling a UAV with different hand gestures. To achieve this, a hand keypoint detection algorithm was used to detect 21 keypoints in the hand. These keypoints were then used as the input to an intelligent system based on Convolutional Neural Networks (CNNs) that classified the hand gestures. The UAV's video camera was used to capture the hand gestures. A database containing 2400 hand images, covering 8 different gestures selected to send specific motion commands to the UAV, was created and used to train the CNN. The CNN classified the hand gestures with 93% accuracy. To test the capabilities of our intelligent control system, a small UAV, the DJI Ryze Tello drone, was used. The experimental results demonstrated that the DJI Tello drone could be successfully controlled by hand gestures in real time. (An illustrative keypoint-classifier sketch appears after this list.)
-
Accurately reconstructing 3D hand poses is a pivotal element for numerous Human-Computer Interaction applications. In this work, we propose SonicHand, the first smartphone-based 3D hand pose reconstruction system using purely inaudible acoustic signals. SonicHand incorporates signal processing techniques and a deep learning framework to address a series of challenges. First, it encodes the topological information of the hand skeleton as prior knowledge and utilizes a deep learning model to realistically and smoothly reconstruct the hand poses. Second, the system employs adversarial training to enhance its ability to generalize to a new environment or a new user. Third, we adopt a hand tracking method based on channel impulse response estimation, which enables our system to handle the scenario where the hand performs gestures while moving arbitrarily as a whole. We conduct extensive experiments on a smartphone testbed to demonstrate the effectiveness and robustness of our system along various dimensions. The experiments involve 10 subjects performing up to 12 different hand gestures in three distinct environments. When the phone is held in one of the user's hands, the proposed system can track joints with an average error of 18.64 mm. (A simplified impulse-response sketch appears after this list.)
-
Deaf spaces are unique indoor environments designed to optimize visual communication and Deaf cultural expression. However, much of the technological research geared towards the deaf involves the use of video or wearables for American Sign Language (ASL) translation, with little consideration for the Deaf perspective on privacy and usability of the technology. In contrast to video, RF sensors offer an avenue for ambient ASL recognition while also preserving privacy for Deaf signers. Methods: This paper investigates the RF transmit waveform parameters required for effective measurement of ASL signs and their effect on word-level classification accuracy attained with transfer learning and convolutional autoencoders (CAE). A multi-frequency fusion network is proposed to exploit data from all sensors in an RF sensor network and improve the recognition accuracy of fluent ASL signing. Results: For fluent signers, CAEs yield a 20-sign classification accuracy of 76% at 77 GHz and 73% at 24 GHz, while at X-band (10 GHz) accuracy drops to 67%. For hearing imitation signers, signs are more separable, resulting in a 96% accuracy with CAEs. Further, fluent ASL recognition accuracy is significantly increased with the multi-frequency fusion network, which boosts the 20-sign fluent ASL recognition accuracy to 95%, surpassing conventional feature-level fusion by 12%. Implications: Signing involves finer spatiotemporal dynamics than typical hand gestures, and thus requires interrogation with a transmit waveform that has a rapid succession of pulses and high bandwidth. Millimeter-wave RF frequencies also yield greater accuracy due to the increased Doppler spread of the radar backscatter. Comparative analysis of articulation dynamics also shows that imitation signing is not representative of fluent signing and is not effective for pre-training networks for fluent ASL classification. Deep neural networks employing multi-frequency fusion capture both shared and sensor-specific features and thus offer significant performance gains in comparison to using a single sensor or feature-level fusion. (An illustrative multi-branch fusion sketch appears after this list.)
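For the sEMG-based authentication item above, a minimal sketch (not the paper's pipeline; channel count, window length, and normalization scheme are assumptions) of turning a multi-channel sEMG window into a 2D image that an anomaly-detection model could score:

```python
import numpy as np

def semg_to_image(window):
    """window: (channels, samples) raw sEMG -> (channels, samples) uint8 image."""
    # Rescale each channel independently to [0, 255].
    mins = window.min(axis=1, keepdims=True)
    maxs = window.max(axis=1, keepdims=True)
    norm = (window - mins) / (maxs - mins + 1e-8)
    return (norm * 255).astype(np.uint8)

# Example: an 8-channel, 200-sample gesture window becomes an 8x200 image
# whose anomaly score decides "client" vs. "imposter".
img = semg_to_image(np.random.randn(8, 200))
```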
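For the gesture-controlled UAV item above, an assumed architecture (illustrative, not the authors'): a small 1D CNN mapping 21 detected (x, y) hand keypoints to one of 8 gesture classes used as motion commands.

```python
import torch
import torch.nn as nn

class KeypointGestureNet(nn.Module):
    def __init__(self, num_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(2, 32, kernel_size=3, padding=1),  # input: (batch, 2, 21)
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, keypoints):
        x = self.features(keypoints)           # (batch, 64, 1)
        return self.classifier(x.squeeze(-1))  # (batch, num_classes)

# Example: a batch of 4 detected hands, each with 21 (x, y) keypoints.
logits = KeypointGestureNet()(torch.rand(4, 2, 21))
command = logits.argmax(dim=1)  # index of the predicted motion command
```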
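For the SonicHand item above, a much-simplified sketch of channel impulse response estimation: the recorded audio is matched-filtered with the known inaudible probe, and peaks in the result correspond to echo paths such as the moving hand. Sample rate, probe length, and the toy echo model are assumptions; a real system would add synchronization and per-frame processing.

```python
import numpy as np

def estimate_cir(received, probe):
    """Cross-correlate the received audio with the transmitted probe signal."""
    corr = np.correlate(received, probe, mode="full")
    return np.abs(corr[len(probe) - 1:])  # keep non-negative lags only

# Toy example: the probe delayed by 120 samples stands in for an echo off the hand.
fs = 48_000
probe = np.random.randn(512)
received = np.concatenate([np.zeros(120), probe, np.zeros(400)]) * 0.3
cir = estimate_cir(received, probe)
delay = int(np.argmax(cir))            # about 120 samples
distance_m = delay / fs * 343.0 / 2.0  # round-trip delay converted to one-way meters
```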
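For the RF sensing item above, a minimal sketch in the spirit of a multi-frequency fusion network (layer sizes, feature dimensions, and input shapes are assumptions): each RF band's micro-Doppler spectrogram gets its own convolutional encoder, and the concatenated branch features are classified into 20 sign classes.

```python
import torch
import torch.nn as nn

def branch():
    # Per-band encoder: one spectrogram channel in, a 64-d feature vector out.
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(32, 64), nn.ReLU(),
    )

class MultiFrequencyFusion(nn.Module):
    def __init__(self, num_bands=3, num_classes=20):
        super().__init__()
        self.branches = nn.ModuleList([branch() for _ in range(num_bands)])
        self.head = nn.Linear(64 * num_bands, num_classes)

    def forward(self, spectrograms):
        # spectrograms: list of (batch, 1, H, W) tensors, one per RF band.
        feats = [b(x) for b, x in zip(self.branches, spectrograms)]
        return self.head(torch.cat(feats, dim=1))

# Example: three bands (e.g. 77 GHz, 24 GHz, 10 GHz), batch of 2, 64x64 spectrograms.
bands = [torch.rand(2, 1, 64, 64) for _ in range(3)]
logits = MultiFrequencyFusion()(bands)  # shape (2, 20)
```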