Title: Comparison of Human Skeleton Trackers Paired with A Novel Skeleton Fusion Algorithm
The onset of Industry 4.0 brings a greater demand for Human-Robot Collaboration (HRC) in manufacturing. This has created a critical need to bridge sensing and AI with the mechanical and physical requirements of the task in order to augment the robot's awareness and intelligence. In an HRC work cell, sensor options for detecting human joint locations vary greatly in complexity, usability, and cost. In this paper, the use of depth cameras is explored, since they are a relatively low-cost option that does not require users to wear extra sensing hardware. Herein, the Google MediaPipe (BlazePose) and OpenPose skeleton-tracking software packages are used to estimate the pixel coordinates of each human joint in images from depth cameras. The depth at each pixel is then combined with the joint pixel coordinates to generate the 3D joint locations of the skeleton. In comparing these skeleton trackers, this paper also presents a novel method of fusing the skeletons that the trackers generate from each camera's data, using a quaternion/link-length representation of the skeleton. Results show that the overall mean and standard deviation of the position error between the fused skeleton and the target locations were lower than those of the skeletons obtained directly from each camera's data.
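The per-joint conversion described above (pixel coordinates plus per-pixel depth to 3D joint locations) can be sketched with the standard pinhole back-projection model. This is a generic illustration, not code from the paper; the intrinsics `fx, fy, cx, cy` below are hypothetical placeholder values, not calibration data from the work.

```python
import numpy as np

def pixel_depth_to_3d(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with metric depth into 3D camera-frame
    coordinates using the pinhole model: X = (u - cx) * Z / fx, etc."""
    z = depth_m
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])

# Hypothetical intrinsics for a 640x480 depth camera (not from the paper).
fx, fy, cx, cy = 525.0, 525.0, 319.5, 239.5
joint_3d = pixel_depth_to_3d(320, 240, 1.5, fx, fy, cx, cy)
```

Applying this to every tracked joint in each camera's image yields one candidate 3D skeleton per camera, which the fusion step then combines.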
Award ID(s):
1830383
PAR ID:
10352741
Author(s) / Creator(s):
Date Published:
Journal Name:
ASME Manufacturing Science and Engineering Conference
Page Range / eLocation ID:
10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Human skeleton data provides a compact, low-noise representation of relative joint locations that may be used in human identity and activity recognition. The Hierarchical Co-occurrence Network (HCN) has been used for human activity recognition because of its ability to consider correlation between joints in the network's convolutional operations. HCN shows good identification accuracy but requires a large number of samples to train. Acquisition of this large-scale data can be time-consuming and expensive, motivating synthetic skeleton data generation for data augmentation in HCN. We propose a novel method that integrates an Auxiliary Classifier Generative Adversarial Network (AC-GAN) and HCN hybrid framework for Assessment and Augmented Identity Recognition for Skeletons (AAIRS). The proposed AAIRS method performs generation and evaluation of synthetic 3-dimensional motion capture skeleton videos followed by human identity recognition. Synthetic skeleton data produced by the generator component of the AC-GAN is evaluated using an Inception Score-inspired realism metric computed from the HCN classifier outputs. We study the effect of increasing the percentage of synthetic samples in the training set on HCN performance. Before synthetic data augmentation, we achieve 74.49% HCN performance in 10-fold cross-validation for 9-class human identification. With a synthetic-real mixture of 50%-50%, we achieve 78.22% mean accuracy, significantly
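The Inception Score-inspired realism metric described in the abstract above can be illustrated with the standard Inception Score computation, exp(E_x[KL(p(y|x) || p(y))]), applied to classifier softmax outputs. This is a generic sketch of that family of metrics, not the exact AAIRS formulation; in AAIRS the HCN classifier's outputs would play the role of `probs`.

```python
import numpy as np

def inception_style_score(probs, eps=1e-12):
    """probs: (N, C) array of per-sample class probabilities from a classifier.
    Returns exp of the mean KL divergence between each conditional p(y|x)
    and the marginal p(y). Confident, diverse predictions score high."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0, keepdims=True)  # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# One-hot (confident, diverse) predictions score C; uniform predictions score ~1.
confident = np.eye(3)                 # three samples, one per class
uniform = np.full((3, 3), 1.0 / 3.0)  # three maximally uncertain samples
```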
  2. Lensless imaging is an emerging modality in which image sensors use optical elements placed in front of the sensor to perform multiplexed imaging. Several recent papers reconstruct images from lensless imagers, including methods that use deep learning for state-of-the-art performance. However, many of these methods require explicit knowledge of the optical element, such as its point spread function (PSF), or learn the reconstruction mapping for a single fixed PSF. In this paper, we explore a neural network architecture that performs joint image reconstruction and PSF estimation to robustly recover images captured with multiple PSFs from different cameras. Using adversarial learning, this approach achieves improved reconstruction results that do not require explicit knowledge of the PSF at test time, and it shows an added improvement in the reconstruction model's ability to generalize to variations in the camera's PSF. This allows lensless cameras to be used in a wider range of applications that require multiple cameras, without the need to explicitly train a separate model for each new camera.
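The multiplexed measurement model underlying lensless imaging, which the reconstruction networks above must invert, is commonly written as the sensor reading being a convolution of the scene with the camera's PSF. A minimal sketch of that forward model, with a made-up random PSF standing in for a real optical element:

```python
import numpy as np

def lensless_forward(scene, psf):
    """Simulate a lensless measurement: circular convolution of the scene
    with the point spread function, computed via the FFT."""
    return np.real(np.fft.ifft2(np.fft.fft2(scene) * np.fft.fft2(psf, s=scene.shape)))

rng = np.random.default_rng(0)
scene = rng.random((32, 32))
psf = rng.random((32, 32))
psf /= psf.sum()  # normalize so the PSF preserves total energy
measurement = lensless_forward(scene, psf)
```

Reconstruction is the inverse problem: recovering `scene` from `measurement`, with or without knowledge of `psf`; the paper's contribution is estimating both jointly.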
  3. Closed-loop state estimators that track the movements and behaviors of large-scale populations have significant potential to benefit emergency teams during the critical early stages of disaster response. Such population trackers could enable insight about the population even where few direct measurements are available. In concept, a population tracker might be realized using a Bayesian estimation framework to fuse agent-based models of human movement and behavior with sparse sensing, such as a small set of cameras providing population counts at specific locations. We describe a simple proof of concept for such an estimator by applying a particle filter to synthetic sensor data generated from a small simulated environment. An interesting result is that behavioral models embedded in the particle filter make it possible to distinguish among simulated agents, even when the only available sensor data are aggregate population counts at specific locations.
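The core idea above, updating beliefs about individual agents from aggregate counts, can be sketched with a single particle-filter measurement update. This is a deliberately tiny stand-in for the paper's agent-based simulator: each particle is a hypothesized assignment of all agents to locations, and the only observation is a count at one location.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy world: 10 agents among 4 locations; one sensor reports how many agents
# are at location 0. (A simplified stand-in, not the paper's simulator.)
n_agents, n_locations, n_particles = 10, 4, 500
particles = rng.integers(0, n_locations, size=(n_particles, n_agents))

def update(particles, observed_count, sigma=1.0):
    """One particle-filter measurement update against an aggregate count:
    weight each particle by the likelihood of the observed count, then resample."""
    counts = (particles == 0).sum(axis=1)  # each particle's predicted count
    weights = np.exp(-0.5 * ((counts - observed_count) / sigma) ** 2)
    weights /= weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx]

particles = update(particles, observed_count=7)
```

After the update, the surviving particles concentrate on agent configurations consistent with the observed count, which is how aggregate sensing can still constrain individual-level hypotheses.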
  4. Recent developments in markerless tracking software such as DeepLabCut (DLC) allow estimation of skin landmark positions during behavioral studies. However, studies that require highly accurate skeletal kinematics require estimation of 3D positions of subdermal landmarks such as joint centers of rotation or skeletal features. In many animals, significant slippage between the skin and underlying skeleton makes accurate tracking of skeletal configuration from skin landmarks difficult. While biplanar, high-speed X-ray acquisition cameras offer a way to measure accurate skeletal configuration using tantalum markers and XROMM, this technology is expensive, not widely available, and the manual annotation required is time-consuming. Here, we present an approach that utilizes DLC to estimate subdermal landmarks in a rat from video collected from two standard cameras. By simultaneously recording X-ray and live video of an animal, we train a DLC model to predict the skin locations representing the projected positions of subdermal landmarks obtained from X-ray data. Predicted skin locations from multiple camera views were triangulated to reconstruct depth-accurate positions of subdermal landmarks. We found that DLC was able to estimate skeletal landmarks with good 3D accuracy, suggesting that this might be an approach to provide accurate estimates of skeletal configuration using standard live video. 
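The triangulation step described above, recovering a landmark's 3D position from its 2D detections in two calibrated views, is classically solved with the direct linear transform (DLT). A minimal sketch with hypothetical projection matrices (the intrinsics and camera poses below are made up for illustration):

```python
import numpy as np

def triangulate(P1, P2, pt1, pt2):
    """Direct linear transform: recover the 3D point whose projections through
    3x4 camera matrices P1 and P2 are the pixel coordinates pt1 and pt2."""
    A = np.array([
        pt1[0] * P1[2] - P1[0],
        pt1[1] * P1[2] - P1[1],
        pt2[0] * P2[2] - P2[0],
        pt2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]                 # null vector of A, in homogeneous coordinates
    return X[:3] / X[3]

# Hypothetical cameras: a reference view and a second view shifted 1 unit in x.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

X_true = np.array([0.2, -0.1, 3.0])
X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
```

With noisy per-view detections (as from a DLC model), the same least-squares machinery returns the best-fit 3D point rather than an exact intersection.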
  5. One of the grand challenges in computer vision is to recover 3D poses and shapes of multiple human bodies with absolute scales from a single RGB image. The challenge stems from the inherent depth and scale ambiguity of a single view. The state of the art in 3D human pose and shape estimation mainly focuses on estimating the 3D joint locations relative to the root joint, defined as the pelvis joint. In this paper, a novel approach called Absolute-ROMP is proposed, which builds upon a one-stage multi-person 3D mesh predictor network, ROMP, to estimate multi-person 3D poses and shapes with absolute scales from a single RGB image. To achieve this, we introduce absolute root joint localization in the camera coordinate frame, which enables the estimation of the 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal length. Moreover, a CNN and transformer hybrid network, called TransFocal, is proposed to predict the focal length of the image's camera. This enables Absolute-ROMP to obtain absolute depth information for all joints in the camera coordinate frame, further improving the accuracy of our proposed method. Absolute-ROMP is evaluated on the root joint localization and root-relative 3D pose estimation tasks on publicly available multi-person 3D pose datasets, and TransFocal is evaluated on a dataset created from the Pano360 dataset. Our proposed approach achieves state-of-the-art results on these tasks, outperforming existing methods or achieving competitive performance. Due to its real-time performance, our method is applicable to in-the-wild images and videos.
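Why a predicted focal length unlocks absolute depth, as in the abstract above, follows from the pinhole similar-triangles relation: a segment of metric length L at depth Z spans l = f·L/Z pixels, so Z = f·L/l. A hedged illustration of that relation only; the numbers are made up and this is not Absolute-ROMP's exact formulation:

```python
def absolute_depth_from_focal(f_pixels, segment_len_m, segment_len_px):
    """Pinhole similar triangles: a segment of metric length L at depth Z
    spans l = f * L / Z pixels, hence Z = f * L / l."""
    return f_pixels * segment_len_m / segment_len_px

# Made-up example: a 0.5 m torso spanning 100 px under a 1500 px focal length.
z = absolute_depth_from_focal(1500.0, 0.5, 100.0)  # -> 7.5 m
```

Without f, the same pixel extent is consistent with any depth/scale pair, which is exactly the single-view ambiguity the abstract describes.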