
Title: Activity recognition in manufacturing: The roles of motion capture and sEMG+inertial wearables in detecting fine vs. gross motion
In safety-critical environments, robots need to reliably recognize human activity to be effective and trustworthy partners. Since most human activity recognition (HAR) approaches rely on unimodal sensor data (e.g. motion capture or wearable sensors), it is unclear how the relationship between the sensor modality and motion granularity (e.g. gross or fine) of the activities impacts classification accuracy. To our knowledge, we are the first to investigate the efficacy of using motion capture as compared to wearable sensor data for recognizing human motion in manufacturing settings. We introduce the UCSD-MIT Human Motion dataset, composed of two assembly tasks that entail either gross or fine-grained motion. For both tasks, we compared the accuracy of a Vicon motion capture system to a Myo armband using three widely used HAR algorithms. We found that motion capture yielded higher accuracy than the wearable sensor for gross motion recognition (up to 36.95%), while the wearable sensor yielded higher accuracy for fine-grained motion (up to 28.06%). These results suggest that these sensor modalities are complementary, and that robots may benefit from systems that utilize multiple modalities to simultaneously, but independently, detect gross and fine-grained motion. Our findings will help guide researchers in numerous fields of robotics, including learning from demonstration and grasping, to effectively choose sensor modalities that are most suitable for their applications.
Award ID(s):
1724982 1734482
Publication Date:
Journal Name:
2019 International Conference on Robotics and Automation (ICRA)
Page Range or eLocation-ID:
6533 to 6539
Sponsoring Org:
National Science Foundation
More Like this
  1. Human activity recognition (HAR) is growing in popularity due to its wide-ranging applications in patient rehabilitation and movement disorders. HAR approaches typically start with collecting sensor data for the activities under consideration and then develop algorithms using the dataset. As such, the success of algorithms for HAR depends on the availability and quality of datasets. Most of the existing work on HAR uses data from inertial sensors on wearable devices or smartphones to design HAR algorithms. However, inertial sensors exhibit high noise that makes it difficult to segment the data and classify the activities. Furthermore, existing approaches typically do not make their data available publicly, which makes it difficult or impossible to obtain comparisons of HAR approaches. To address these issues, we present wearable HAR (w-HAR) which contains labeled data of seven activities from 22 users. Our dataset’s unique aspect is the integration of data from inertial and wearable stretch sensors, thus providing two modalities of activity information. The wearable stretch sensor data allows us to create variable-length segment data and ensure that each segment contains a single activity. We also provide a HAR framework to use w-HAR to classify the activities. To this end, we first perform a design space exploration to choose a neural network architecture for activity classification. Then, we use two online learning algorithms to adapt the classifier to users whose data are not included at design time. Experiments on the w-HAR dataset show that our framework achieves 95% accuracy while the online learning algorithms improve the accuracy by as much as 40%.
  2. Human activity recognition (HAR) from wearable sensor data has recently gained widespread adoption in a number of fields. However, recognizing complex human activities and postural and rhythmic body movements (e.g., dance, sports) is challenging due to the lack of domain-specific labeling information and the perpetual variability in human movement kinematics profiles due to age, sex, dexterity and the level of professional training. In this paper, we propose a deep activity recognition model to work with limited labeled data, both for simple and complex human activities. To mitigate the intra- and inter-user spatio-temporal variability of movements, we posit novel data augmentation and domain normalization techniques. We depict a semi-supervised technique that learns noise and transformation invariant feature representation from sparsely labeled data to accommodate intra-personal and inter-user variations of human movement kinematics. We also postulate a transfer learning approach to learn domain invariant feature representations by minimizing the feature distribution distance between the source and target domains. We showcase the improved performance of our proposed framework, AugToAct, using a public HAR dataset. We also design our own data collection, annotation and experimental setup on complex dance activity recognition steps and kinematics movements where we achieved higher performance metrics with limited labeled data compared to simple activity recognition tasks.
  3. We explore the effect of auxiliary labels in improving the classification accuracy of wearable sensor-based human activity recognition (HAR) systems, which are primarily trained with the supervision of the activity labels (e.g. running, walking, jumping). Supplemental meta-data are often available during the data collection process such as body positions of the wearable sensors, subjects' demographic information (e.g. gender, age), and the type of wearable used (e.g. smartphone, smart-watch). This information, while not directly related to the activity classification task, can nonetheless provide auxiliary supervision and has the potential to significantly improve the HAR accuracy by providing extra guidance on how to handle the introduced sample heterogeneity from the change in domains (i.e., positions, persons, or sensors), especially in the presence of limited activity labels. However, integrating such meta-data information in the classification pipeline is non-trivial: (i) the complex interaction between the activity and domain label space is hard to capture with a simple multi-task and/or adversarial learning setup, (ii) meta-data and activity labels might not be simultaneously available for all collected samples. To address these issues, we propose a novel framework Conditional Domain Embeddings (CoDEm). From the available unlabeled raw samples and their domain meta-data, we first learn a set of domain embeddings using a contrastive learning methodology to handle inter-domain variability and inter-domain similarity. To classify the activities, CoDEm then learns the label embeddings in a contrastive fashion, conditioned on domain embeddings with a novel attention mechanism, enforcing the model to learn the complex domain-activity relationships. We extensively evaluate CoDEm in three benchmark datasets against a number of multi-task and adversarial learning baselines and achieve state-of-the-art performance in each avenue.
  4. In this work, we present a novel non-visual HAR system that achieves state-of-the-art performance on realistic SCE tasks via a single wearable sensor. We leverage surface electromyography and inertial data from a low-profile wearable sensor to attain performant robot perception while remaining unobtrusive and user-friendly. By capturing both convolutional and temporal features with a hybrid CNN-LSTM classifier, our system is able to robustly and effectively classify complex, full-body human activities with only this single sensor. We perform a rigorous analysis of our method on two datasets representative of SCE tasks, and compare performance with several prominent HAR algorithms. Results show our system substantially outperforms rival algorithms in identifying complex human tasks from minimal sensing hardware, achieving F1-scores up to 84% over 31 strenuous activity classes. To our knowledge, we are the first to robustly identify complex full-body tasks using a single, unobtrusive sensor feasible for real-world use in SCEs. Using our approach, robots will be able to more reliably understand human activity, enabling them to safely navigate sensitive, crowded spaces.
  5. Functional connectivity between the brain and body kinematics has largely not been investigated due to the requirement of motionlessness in neuroimaging techniques such as functional magnetic resonance imaging (fMRI). However, this connectivity is disrupted in many neurodegenerative disorders, including Parkinson’s Disease (PD), a neurological progressive disorder characterized by movement symptoms including slowness of movement, stiffness, tremors at rest, and walking and standing instability. In this study, brain activity was recorded through functional near-infrared spectroscopy (fNIRS) and electroencephalography (EEG), and body kinematics were captured by a motion capture system (Mocap) based on an inertial measurement unit (IMU) for gross movements (large movements such as limb kinematics), and the WearUp glove for fine movements (small range movements such as finger kinematics). PD and neurotypical (NT) participants were recruited to perform 8 different movement tasks. The recorded data from each modality were analyzed individually, and the processed data were used for classification between the PD and NT groups. The average changes in oxygenated hemoglobin (HbO2) from fNIRS, EEG power spectral density in the Theta, Alpha, and Beta bands, acceleration vector from Mocap, and normalized WearUp flex sensor data were used for classification. 12 different support vector machine (SVM) classifiers were used on different datasets such as only fNIRS data, only EEG data, hybrid fNIRS/EEG data, and all the fused data for two classification scenarios: classifying PD and NT based on individual activities, and all activity data fused together. The PD and NT groups could be distinguished with more than 83% accuracy for each individual activity. For all the fused data, the PD and NT groups are classified with 81.23%, 92.79%, 92.27%, and 93.40% accuracy for the fNIRS only, EEG only, hybrid fNIRS/EEG, and all fused data, respectively. The results indicate that the overall performance of classification in distinguishing PD and NT groups improves when using both brain and body data.