- Award ID(s):
- 1910519
- NSF-PAR ID:
- 10356345
- Date Published:
- Journal Name:
- Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
- Volume:
- 5
- Issue:
- 4
- ISSN:
- 2474-9567
- Page Range / eLocation ID:
- 1 to 29
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
This paper presents GoPose, a 3D skeleton-based human pose estimation system that uses WiFi devices at home. Our system leverages the WiFi signals reflected off the human body for 3D pose estimation. In contrast to prior systems that need specialized hardware or dedicated sensors, our system does not require a user to wear or carry any sensors and can reuse the WiFi devices that already exist in a home environment for mass adoption. To realize such a system, we leverage the 2D AoA spectrum of the signals reflected from the human body and the deep learning techniques. In particular, the 2D AoA spectrum is proposed to locate different parts of the human body as well as to enable environment-independent pose estimation. Deep learning is incorporated to model the complex relationship between the 2D AoA spectrums and the 3D skeletons of the human body for pose tracking. Our evaluation results show GoPose achieves around 4.7cm of accuracy under various scenarios including tracking unseen activities and under NLoS scenarios.more » « less
-
The graph convolutional network (GCN) has recently achieved promising performance of 3D human pose estimation (HPE) by modeling the relationship among body parts. However, most prior GCN approaches suffer from two main drawbacks. First, they share a feature transformation for each node within a graph convolution layer. This prevents them from learning different relations between different body joints. Second, the graph is usually defined according to the human skeleton and is suboptimal because human activities often exhibit motion patterns beyond the natural connections of body joints. To address these limitations, we introduce a novel Modulated GCN for 3D HPE. It consists of two main components: weight modulation and affinity modulation. Weight modulation learns different modulation vectors for different nodes so that the feature transformations of different nodes are disentangled while retaining a small model size. Affinity modulation adjusts the graph structure in a GCN so that it can model additional edges beyond the human skeleton. We investigate several affinity modulation methods as well as the impact of regularizations. Rigorous ablation study indicates both types of modulation improve performance with negligible overhead. Compared with state-of-the-art GCNs for 3D HPE, our approach either significantly reduces the estimation errors, e.g., by around 10%, while retaining a small model size or drastically reduces the model size, e.g., from 4.22M to 0.29M (a 14.5× reduction), while achieving comparable performance. Results on two benchmarks show our Modulated GCN outperforms some recent states of the art. Our code is available at https://github.com/ZhimingZo/Modulated-GCN.more » « less
-
null (Ed.)We envision a convenient telepresence system available to users anywhere, anytime. Such a system requires displays and sensors embedded in commonly worn items such as eyeglasses, wristwatches, and shoes. To that end, we present a standalone real-time system for the dynamic 3D capture of a person, relying only on cameras embedded into a head-worn device, and on Inertial Measurement Units (IMUs) worn on the wrists and ankles. Our prototype system egocentrically reconstructs the wearer's motion via learning-based pose estimation, which fuses inputs from visual and inertial sensors that complement each other, overcoming challenges such as inconsistent limb visibility in head-worn views, as well as pose ambiguity from sparse IMUs. The estimated pose is continuously re-targeted to a prescanned surface model, resulting in a high-fidelity 3D reconstruction. We demonstrate our system by reconstructing various human body movements and show that our visual-inertial learning-based method, which runs in real time, outperforms both visual-only and inertial-only approaches. We captured an egocentric visual-inertial 3D human pose dataset publicly available at https://sites.google.com/site/youngwooncha/egovip for training and evaluating similar methods.more » « less
-
In this paper, we propose a machine learning-based multi-stream framework to recognize American Sign Language (ASL) manual signs and nonmanual gestures (face and head movements) in real time from RGB-D videos. Our approach is based on 3D Convolutional Neural Networks (3D CNNs) by fusing the multi-modal features including hand gestures, facial expressions, and body poses from multiple channels (RGB, Depth, Motion, and Skeleton joints). To learn the overall temporal dynamics in a video, a proxy video is generated by selecting a subset of frames for each video which are then used to train the proposed 3D CNN model. We collected a new ASL dataset, ASL-100-RGBD, which contains 42 RGB-D videos captured by a Microsoft Kinect V2 camera. Each video consists of 100 ASL manual signs, along with RGB channel, Depth maps, Skeleton joints, Face features, and HD face. The dataset is fully annotated for each semantic region (i.e. the time duration of each sign that the human signer performs). Our proposed method achieves 92.88% accuracy for recognizing 100 ASL sign glosses in our newly collected ASL-100-RGBD dataset. The effectiveness of our framework for recognizing hand gestures from RGB-D videos is further demonstrated on a large-scale dataset, ChaLearn IsoGD, achieving the state-of-the-art results.
-
Introduction This study is aimed at developing a task-based methodology for the design of robotic exoskeletons. This is in contrast to prevailing research efforts, which attempt to mimic the human limb, where each human joint is given an exoskeleton counter-joint. Rather, we present an alternative systematic design approach for the design of exoskeletons that can follow the complex three-dimensional motions of the human body independent of anatomical measures and landmarks. With this approach, it is not necessary to know the geometry of the targeted limb but rather to have a description of its motion at the point of attachment.
Methods The desired trajectory of the targeted limb has been collected through a motion capture system from a healthy subject. Then, an approximate dimensional synthesis has been employed to specify the size of the mechanism and its location with respect to the limb, while generating the desired trajectory. The procedure for this method, from motion capture to kinematic synthesis to mechanism selection and optimization, is validated with an illustrative example.
Results The proposed method resulted an exoskeleton which follows the desired trajectory of the human limb without any need of aligning its joint to the corresponding human joints.
Conclusion A method to design lower mobility exoskeletons for specific sets of human motion is presented; the approach result an exoskeleton with lesser actuation system while generating complex 3D limb motions, which in turn results a lighter exoskeletons. It also avoids a need to align each robotic joint axis with its human counterpart.