In this paper, we propose a machine learning-based multi-stream framework to recognize American Sign Language (ASL) manual signs and nonmanual gestures (face and head movements) in real time from RGB-D videos. Our approach uses 3D Convolutional Neural Networks (3D CNNs) to fuse multi-modal features, including hand gestures, facial expressions, and body poses, from multiple channels (RGB, Depth, Motion, and Skeleton joints). To capture the overall temporal dynamics of a video, a proxy video is generated by selecting a subset of its frames; these proxy videos are then used to train the proposed 3D CNN model. We collected a new ASL dataset, ASL-100-RGBD, which contains 42 RGB-D videos captured by a Microsoft Kinect V2 camera. Each video consists of 100 ASL manual signs, along with the RGB channel, Depth maps, Skeleton joints, Face features, and HD Face. The dataset is fully annotated for each semantic region (i.e., the time duration of each sign the human signer performs). Our proposed method achieves 92.88% accuracy for recognizing 100 ASL sign glosses on our newly collected ASL-100-RGBD dataset. The effectiveness of our framework for recognizing hand gestures from RGB-D videos is further demonstrated on a large-scale dataset, ChaLearn IsoGD, where it achieves state-of-the-art results.
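The proxy-video step described above can be sketched as follows, assuming uniform temporal sampling (the paper's exact frame-selection strategy may differ); `make_proxy_video` and its parameters are illustrative names, not from the paper:

```python
import numpy as np

def make_proxy_video(frames, num_samples=32):
    """Uniformly sample a fixed-length proxy video from a variable-length clip.

    frames: array of shape (T, H, W, C); num_samples: frames kept for the 3D CNN.
    """
    t = len(frames)
    # Evenly spaced indices over the clip; short clips yield repeated frames.
    idx = np.linspace(0, t - 1, num_samples).round().astype(int)
    return frames[idx]

# Example: a 90-frame RGB clip reduced to a 32-frame proxy video.
clip = np.random.rand(90, 112, 112, 3).astype(np.float32)
proxy = make_proxy_video(clip)
print(proxy.shape)  # (32, 112, 112, 3)
```

Each input channel (RGB, Depth, Motion, Skeleton) could be sampled this way and fed to its own 3D CNN stream before fusion.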
Towards Smartphone-based 3D Hand Pose Reconstruction Using Acoustic Signals
Accurately reconstructing 3D hand poses is a pivotal element of numerous Human-Computer Interaction applications. In this work, we propose SonicHand, the first smartphone-based 3D hand pose reconstruction system using purely inaudible acoustic signals. SonicHand combines signal processing techniques with a deep learning framework to address a series of challenges. First, it encodes the topological information of the hand skeleton as prior knowledge and utilizes a deep learning model to reconstruct hand poses realistically and smoothly. Second, it employs adversarial training to improve generalization, so the system can be deployed in a new environment or for a new user. Third, it adopts a hand tracking method based on channel impulse response (CIR) estimation, which lets the system handle scenarios where the hand performs gestures while moving arbitrarily as a whole. We conduct extensive experiments on a smartphone testbed to demonstrate the effectiveness and robustness of our system along various dimensions. The experiments involve 10 subjects performing up to 12 different hand gestures in three distinctive environments. When the phone is held in one of the user's hands, the proposed system can track joints with an average error of 18.64 mm.
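A minimal sketch of the CIR-estimation idea behind the tracking step: cross-correlate the known inaudible probe with the microphone recording to estimate the channel impulse response, then convert the delay of the strongest tap into a range. The probe waveform, sampling rate, and function names below are assumptions for illustration, not SonicHand's actual pipeline:

```python
import numpy as np

FS = 48_000          # assumed audio sampling rate (Hz)
C = 343.0            # speed of sound (m/s)

def estimate_cir(tx, rx):
    """Estimate the channel impulse response by cross-correlating the
    known transmitted probe with the received microphone signal."""
    corr = np.correlate(rx, tx, mode="full")
    return corr[len(tx) - 1:]            # keep non-negative lags (causal paths)

def dominant_path_range(tx, rx):
    """Convert the strongest CIR tap into a one-way range estimate."""
    cir = estimate_cir(tx, rx)
    lag = int(np.argmax(np.abs(cir)))    # delay of the strongest echo, in samples
    return lag / FS * C / 2              # halve the round trip -> meters

# Toy example: a delayed, attenuated echo of an inaudible 18-20 kHz chirp.
t = np.arange(0, 0.01, 1 / FS)
tx = np.sin(2 * np.pi * (18_000 + 100_000 * t) * t)
rx = np.concatenate([np.zeros(140), 0.3 * tx]) + 0.01 * np.random.randn(140 + len(tx))
print(f"estimated range: {dominant_path_range(tx, rx):.3f} m")  # ~0.5 m
```

Tracking the strongest tap over successive probe frames would give the whole-hand motion that the abstract says the system compensates for.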
- Award ID(s):
- 2154059
- PAR ID:
- 10647829
- Publisher / Repository:
- ACM Transactions on Sensor Networks
- Date Published:
- Journal Name:
- ACM Transactions on Sensor Networks
- Volume:
- 20
- Issue:
- 5
- ISSN:
- 1550-4859
- Page Range / eLocation ID:
- 1 to 32
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- Hand-to-Face transmission has been estimated to be a minority, yet non-negligible, vector of COVID-19 transmission and a major vector for multiple other pathogens. At the same time, because it cannot be effectively addressed with mainstream protection measures, such as wearing masks or tracing contacts, it remains largely untackled. To help address this issue, we have developed Saving Face, an app that alerts users when they are about to touch their faces by analyzing the distortion patterns in the ultrasound signal emitted by their earphones. The system relies only on pre-existing hardware (a smartphone with generic earphones), which allows it to scale rapidly to billions of smartphone users worldwide. This paper describes the design, implementation, and evaluation of the system, as well as the results of a user study testing the solution's accuracy, robustness, and user experience during various day-to-day activities (93.7% sensitivity and 91.5% precision, N=10). While this paper focuses on the system's application to detecting hand-to-face gestures, the technique can also be applied to other types of gestures and gesture-based applications.
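One way the ultrasound-distortion idea could be realized is to play an inaudible tone through the earphones and watch for Doppler energy spreading around the carrier as the hand approaches; the tone frequency, window size, and band limits below are illustrative assumptions, not the app's published parameters:

```python
import numpy as np

FS = 44_100
F_TONE = 20_000      # assumed inaudible tone played through the earphones

def doppler_energy(mic, win=2048):
    """Measure energy leaking into the bands around the carrier tone.

    A hand moving toward the face Doppler-shifts the reflected tone,
    spreading energy into neighboring FFT bins."""
    spec = np.abs(np.fft.rfft(mic[:win] * np.hanning(win)))
    freqs = np.fft.rfftfreq(win, 1 / FS)
    carrier = np.argmin(np.abs(freqs - F_TONE))
    guard, band = 2, 20                          # bins skipped / examined
    side = np.r_[spec[carrier - band:carrier - guard],
                 spec[carrier + guard:carrier + band]]
    return side.sum() / (spec[carrier] + 1e-9)   # relative sideband energy

# Toy check: a ~120 Hz Doppler component (roughly a 1 m/s hand) raises the score.
t = np.arange(2048) / FS
still = np.sin(2 * np.pi * F_TONE * t)                         # static reflection
moving = still + 0.2 * np.sin(2 * np.pi * (F_TONE + 120) * t)  # approaching hand
print(doppler_energy(still) < doppler_energy(moving))  # True
```

A threshold on this score over successive windows could trigger the "about to touch" alert.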
- The popularity of smartphones has grown at an unprecedented rate, which makes smartphone-based imaging especially appealing. In this paper, we develop a novel acoustic imaging system using only an off-the-shelf smartphone. It is an attractive alternative to camera-based imaging under darkness and obstruction. Our system is based on Synthetic Aperture Radar (SAR). To image an object, a user moves a phone along a predefined trajectory to mimic a virtual sensor array. SAR-based imaging poses several new challenges in our context, including strong self and background interference, deviation from the desired trajectory due to hand jitters, and severe speaker/microphone distortion. We address these challenges by developing a 2-stage interference cancellation scheme, a new algorithm to compensate for trajectory errors, and an effective method to minimize the impact of signal distortion. We implement a proof-of-concept system on a Samsung S7. Our results demonstrate the feasibility and effectiveness of acoustic imaging on a mobile device.
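The core SAR reconstruction can be illustrated with a minimal delay-and-sum backprojection over a 2-D grid: each trajectory position contributes the echo sample whose round-trip delay matches a candidate pixel. This sketch ignores the paper's interference cancellation and trajectory compensation, and all names and constants are assumptions:

```python
import numpy as np

C = 343.0            # speed of sound (m/s)
FS = 48_000          # assumed sampling rate (Hz)

def backproject(echoes, positions, grid_x, grid_y):
    """Minimal delay-and-sum SAR backprojection.

    echoes: (num_positions, num_samples) echo magnitude per trajectory position
    positions: (num_positions, 2) phone coordinates along the trajectory (m)
    grid_x, grid_y: 1-D arrays defining the imaging grid (m)
    """
    image = np.zeros((len(grid_y), len(grid_x)))
    for pos, echo in zip(positions, echoes):
        for iy, y in enumerate(grid_y):
            for ix, x in enumerate(grid_x):
                d = np.hypot(x - pos[0], y - pos[1])   # one-way range to pixel
                s = int(round(2 * d / C * FS))         # matching round-trip sample
                if s < echo.shape[0]:
                    image[iy, ix] += echo[s]           # accumulate echo energy
    return image

# Toy scene: one reflector at (0, 0.5) observed from 8 positions along x.
positions = np.column_stack([np.linspace(-0.2, 0.2, 8), np.zeros(8)])
echoes = np.zeros((8, 600))
for i, p in enumerate(positions):
    d = np.hypot(0 - p[0], 0.5 - p[1])
    echoes[i, int(round(2 * d / C * FS))] = 1.0        # ideal impulse echo
img = backproject(echoes, positions,
                  np.linspace(-0.3, 0.3, 31), np.linspace(0.2, 0.8, 31))
print(np.unravel_index(img.argmax(), img.shape))  # brightest pixel near the reflector
```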
- In this paper, we explore quick 3D shape composition during early-phase spatial design ideation. Our approach is to re-purpose a smartphone as a hand-held reference plane for creating, modifying, and manipulating 3D sweep surfaces. We implemented MobiSweep, a prototype application to explore a new design space of constrained spatial interactions that combine direct orientation control with indirect position control via well-established multi-touch gestures. MobiSweep leverages kinesthetically aware interactions for the creation of a sweep surface without explicit position tracking. The design concepts generated by users, in conjunction with their feedback, demonstrate the potential of such interactions in enabling spatial ideation.
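Geometrically, a sweep surface of this kind is just a planar profile replayed through a sequence of reference-plane poses (orientation from the phone, position set indirectly). A minimal sketch under those assumptions; the function and the twist example are illustrative, not MobiSweep's implementation:

```python
import numpy as np

def sweep_surface(profile_2d, poses):
    """Sweep a planar profile along a sequence of reference-plane poses.

    profile_2d: (N, 2) polygon in the plane's local frame
    poses: list of (R, t), with R a 3x3 rotation (e.g. the phone's orientation)
           and t a 3-vector (the indirectly controlled position)
    Returns (len(poses), N, 3) ring vertices of the swept surface.
    """
    local = np.column_stack([profile_2d, np.zeros(len(profile_2d))])  # lift to 3D
    return np.stack([(R @ local.T).T + t for R, t in poses])

# Example: sweep a square along the z-axis with a gentle twist.
square = np.array([[-1, -1], [1, -1], [1, 1], [-1, 1]], dtype=float)
poses = []
for k in range(10):
    a = 0.1 * k                                   # twist angle per step
    R = np.array([[np.cos(a), -np.sin(a), 0],
                  [np.sin(a),  np.cos(a), 0],
                  [0, 0, 1]])
    poses.append((R, np.array([0, 0, 0.2 * k])))
rings = sweep_surface(square, poses)
print(rings.shape)  # (10, 4, 3)
```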
- We present Ring-a-Pose, a single untethered ring that tracks continuous 3D hand poses. Located in the center of the hand, the ring emits an inaudible acoustic signal that each hand pose reflects differently. Ring-a-Pose imposes minimal obtrusions on the hand, unlike multi-ring or glove systems, and it is not affected by the choice of clothing that may cover wrist-worn systems. In a series of three user studies with a total of 36 participants, we evaluate Ring-a-Pose's performance on pose tracking and micro-finger gesture recognition. Without collecting any training data from a user, Ring-a-Pose tracks continuous hand poses with a joint error of 14.1 mm. The joint error decreases to 10.3 mm for fine-tuned user-dependent models. Ring-a-Pose recognizes 7-class micro-gestures with 90.60% and 99.27% accuracy for user-independent and user-dependent models, respectively. Furthermore, the ring exhibits promising performance when worn on any finger. Ring-a-Pose enables the future of smart rings to track and recognize hand poses using relatively low-power acoustic sensing.
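The joint-error figures quoted in this abstract (and in SonicHand's) are typically mean per-joint position errors; a small sketch of that metric, with the array shapes chosen purely for illustration:

```python
import numpy as np

def mean_joint_error(pred, gt):
    """Mean per-joint position error (mm).

    pred, gt: (num_frames, num_joints, 3) hand joint coordinates in mm.
    """
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Toy check: predictions offset by 10 mm along x yield a 10 mm error.
gt = np.zeros((5, 21, 3))                    # 21 joints, a common hand model
pred = gt + np.array([10.0, 0.0, 0.0])
print(mean_joint_error(pred, gt))  # 10.0
```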