

Title: Action recognition in manufacturing assembly using multimodal sensor fusion
Production innovations are occurring faster than ever, so manufacturing workers must frequently learn new methods and skills. In fast-changing, largely uncertain production systems, manufacturers that can comprehend workers' behavior and assess their operational performance in near real time will outperform their peers. Action recognition can serve this purpose. Although human action recognition has been an active field of study in machine learning, limited work has addressed recognizing worker actions in manufacturing tasks that involve complex, intricate operations. Recognizing those actions from data captured by one sensor, or by a single type of sensor, lacks reliability. This limitation can be overcome by sensor fusion at the data, feature, and decision levels. This paper presents a study that developed a multimodal sensor system and used sensor fusion methods to enhance the reliability of action recognition. One step in assembling a Bukito 3D printer, which consists of a sequence of 7 actions, was used to illustrate and assess the proposed method. Two wearable Myo armband sensors captured both Inertial Measurement Unit (IMU) and electromyography (EMG) signals from assembly workers, while a Microsoft Kinect, a vision-based sensor, simultaneously tracked the workers' predefined skeleton joints. The collected IMU, EMG, and skeleton data were used to train five individual Convolutional Neural Network (CNN) models. Various fusion methods were then implemented to integrate the prediction results of the independent models into a final prediction. Reasons why sensor fusion achieves better performance were identified in this study.
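To make the decision-level fusion step concrete, the sketch below shows one common way to combine per-modality predictions: a (weighted) average of the class-probability vectors produced by the individual CNN models. This is an illustrative Python example, not the paper's implementation; the seven-class setup follows the abstract, but the probability values, weights, and function names are hypothetical.

```python
# A minimal sketch of decision-level fusion, assuming each modality-specific CNN
# has already produced a class-probability vector for the same assembly sample.
# The seven action classes follow the abstract; the numbers below are made up.
import numpy as np

N_ACTIONS = 7  # one assembly step decomposed into 7 actions

def fuse_by_average(prob_list, weights=None):
    """Weighted average of per-model probability distributions (decision-level fusion)."""
    probs = np.asarray(prob_list)                  # shape: (n_models, N_ACTIONS)
    if weights is None:
        weights = np.ones(len(prob_list))
    fused = np.average(probs, axis=0, weights=np.asarray(weights, dtype=float))
    return fused / fused.sum()                     # renormalize to a distribution

# Hypothetical outputs of the IMU, EMG, and skeleton CNN models for one sample:
p_imu      = np.array([0.10, 0.55, 0.05, 0.10, 0.05, 0.10, 0.05])
p_emg      = np.array([0.05, 0.40, 0.20, 0.10, 0.10, 0.10, 0.05])
p_skeleton = np.array([0.05, 0.60, 0.10, 0.05, 0.05, 0.10, 0.05])

fused = fuse_by_average([p_imu, p_emg, p_skeleton])
print("fused distribution:", fused)
print("predicted action index:", int(np.argmax(fused)))
```

A weighted variant of this average lets more reliable modalities contribute more to the fused decision, which is one reason decision-level fusion can outperform any single sensor.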
Award ID(s):
1646162
NSF-PAR ID:
10129790
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
The 25th International Conference on Production Research (ICPR’19).
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Research on robotic lower-limb assistive devices over the past decade has generated autonomous, multiple degree-of-freedom devices to augment human performance during a variety of scenarios. However, the increase in capabilities of these devices is met with an increase in the complexity of the overall control problem and a requirement for an accurate and robust sensing modality for intent recognition. Because its signals precede changes in motion, surface electromyography (EMG) is widely studied as a peripheral sensing modality for capturing features of muscle activity as an input for control of powered assistive devices. To capture features that contribute to muscle contraction and joint motion beyond the activity of superficial muscles, researchers have introduced sonomyography, or real-time dynamic ultrasound imaging of skeletal muscle. However, the ability of these sonomyography features to continuously predict multiple lower-limb joint kinematics during widely varying ambulation tasks, and their potential as an input for powered multiple degree-of-freedom lower-limb assistive devices, are unknown. The objective of this research is to evaluate surface EMG and sonomyography, as well as the fusion of features from both sensing modalities, as inputs to Gaussian process regression models for the continuous estimation of hip, knee, and ankle angle and velocity during level walking, stair ascent/descent, and ramp ascent/descent ambulation. Gaussian process regression is a Bayesian nonlinear regression model that has been introduced as an alternative to musculoskeletal model-based techniques. In this study, time-intensity features of sonomyography from both the anterior and posterior thigh, along with time-domain features of surface EMG from eight lower-limb muscles, were used to train and test subject-dependent and task-invariant Gaussian process regression models for the continuous estimation of hip, knee, and ankle motion. Overall, fusing anterior sonomyography with surface EMG significantly improved estimation of hip, knee, and ankle motion for all ambulation tasks (level-ground, stair, and ramp ambulation) compared with surface EMG alone. Additionally, anterior sonomyography alone significantly reduced errors at the hip and knee for most tasks compared with surface EMG. These findings help inform the implementation and integration of volitional control strategies for robotic assistive technologies.
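As a rough illustration of the feature-fusion regression idea described above (not the authors' code), the following sketch concatenates surface-EMG and sonomyography feature vectors and fits a scikit-learn Gaussian process regressor to a single joint angle. The feature dimensions, kernel choice, and synthetic data are placeholders.

```python
# A minimal sketch, assuming EMG and sonomyography features have already been extracted
# per time window; the random data stands in for real features and knee-angle labels.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
n_samples, n_emg, n_sono = 200, 8, 4          # 8 EMG channels; 4 ultrasound features (assumed)
X_emg  = rng.normal(size=(n_samples, n_emg))
X_sono = rng.normal(size=(n_samples, n_sono))
y_knee = rng.normal(size=n_samples)           # stand-in for continuous knee-angle labels

X_fused = np.hstack([X_emg, X_sono])          # feature-level fusion by concatenation

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=1.0) + WhiteKernel(), normalize_y=True)
gpr.fit(X_fused[:150], y_knee[:150])
mean, std = gpr.predict(X_fused[150:], return_std=True)
print("predicted knee angle (first 5):", mean[:5])
```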

     
  2. Li-Jessen, Nicole Yee-Key (Ed.)
    The Earable device is a behind-the-ear wearable originally developed to measure cognitive function. Since Earable measures electroencephalography (EEG), electromyography (EMG), and electrooculography (EOG), it may also have the potential to objectively quantify facial muscle and eye movement activities relevant to the assessment of neuromuscular disorders. As an initial step toward developing a digital assessment for neuromuscular disorders, a pilot study was conducted to determine whether the Earable device could be used to objectively measure facial muscle and eye movements representative of Performance Outcome Assessments (PerfOs), with tasks designed to model clinical PerfOs, referred to as mock-PerfO activities. The specific aims of this study were: to determine whether the raw Earable EMG, EOG, and EEG signals could be processed to extract features describing these waveforms; to determine Earable feature data quality, test-retest reliability, and statistical properties; to determine whether features derived from Earable could be used to distinguish between various facial muscle and eye movement activities; and to determine which features and feature types are important for mock-PerfO activity classification. A total of N = 10 healthy volunteers participated in the study. Each study participant performed 16 mock-PerfO activities, including talking, chewing, swallowing, eye closure, gazing in different directions, puffing cheeks, chewing an apple, and making various facial expressions. Each activity was repeated four times in the morning and four times at night. A total of 161 summary features were extracted from the EEG, EMG, and EOG bio-sensor data. Feature vectors were used as input to machine learning models to classify the mock-PerfO activities, and model performance was evaluated on a held-out test set. Additionally, a convolutional neural network (CNN) was used to classify low-level representations of the raw bio-sensor data for each task, and model performance was correspondingly evaluated and compared directly to feature classification performance. Model prediction accuracy was used to quantitatively assess the Earable device's classification ability. Study results indicate that Earable can potentially quantify different aspects of facial and eye movements and may be used to differentiate mock-PerfO activities. Specifically, Earable was found to differentiate talking, chewing, and swallowing tasks from other tasks with observed F1 scores >0.9. While EMG features contribute to classification accuracy for all tasks, EOG features are important for classifying gaze tasks. Finally, we found that analysis with summary features outperformed a CNN for activity classification. We believe Earable may be used to measure cranial muscle activity relevant for neuromuscular disorder assessment. Classification performance of mock-PerfO activities with summary features enables a strategy for detecting disease-specific signals relative to controls, as well as the monitoring of intra-subject treatment responses. Further testing is needed to evaluate the Earable device in clinical populations and clinical development settings.
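The summary-feature classification pipeline described above can be sketched roughly as follows; this is an assumed, simplified reconstruction using a random-forest classifier and synthetic data, with only the 161-feature and 16-activity dimensions taken from the abstract.

```python
# A minimal sketch, assuming summary features have already been extracted from the
# Earable EEG/EMG/EOG channels; the random data and classifier choice are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
n_samples, n_features = 320, 161              # 161 summary features, per the abstract
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, 16, size=n_samples)       # 16 mock-PerfO activity labels

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)
print("macro F1 on held-out set:", f1_score(y_test, clf.predict(X_test), average="macro"))
```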
  3. Construction tasks involve various activities composed of one or more body motions. It is essential to understand the dynamically changing behavior and state of construction workers in order to manage them effectively with regard to their safety and productivity. While several research efforts have shown promising results in activity recognition, further research is still needed to identify the best locations for motion sensors on a worker's body, analyzing the recognition results to improve performance and reduce implementation cost. This study proposes a simulation-based evaluation of multiple motion sensors attached to workers performing typical construction tasks. A set of 17 inertial measurement unit (IMU) sensors is utilized to collect motion sensor data from the entire body. Multiple machine learning algorithms are utilized to classify the motions of the workers by simulating several scenarios with different combinations and features of the sensors. Through the simulations, each IMU sensor placed at a different body location is tested to evaluate its recognition accuracy for the worker's different activity types. Then, the effectiveness of sensor locations is measured in terms of activity recognition performance to determine the relative advantage of each location. Based on the results, the required number of sensors can be reduced while maintaining recognition performance. The findings of this study can contribute to the practical implementation of activity recognition using simple motion sensors to enhance the safety and productivity of individual workers.
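The sensor-placement comparison can be illustrated with a small sketch that trains the same classifier on feature columns from different body locations and compares cross-validated accuracy. The location names, feature counts, class count, and data are assumptions for illustration; the study itself used 17 IMUs and several algorithms.

```python
# A minimal sketch of comparing sensor-location subsets, not the study's simulation code.
import numpy as np
from itertools import combinations
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
locations = {"wrist": slice(0, 6), "upper_arm": slice(6, 12), "waist": slice(12, 18)}
X = rng.normal(size=(300, 18))                 # 6 IMU features per assumed location
y = rng.integers(0, 5, size=300)               # 5 construction activity classes (assumed)

for r in (1, 2, 3):
    for subset in combinations(locations, r):
        cols = np.hstack([np.arange(18)[locations[loc]] for loc in subset])
        score = cross_val_score(SVC(), X[:, cols], y, cv=5).mean()
        print(f"{'+'.join(subset):25s} accuracy = {score:.3f}")
```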
  4. This study aims at sensing and understanding the worker’s activity in a human-centered intelligent manufacturing system. We propose a novel multi-modal approach for worker activity recognition by leveraging information from different sensors and in different modalities. Specifically, a smart armband and a visual camera are applied to capture Inertial Measurement Unit (IMU) signals and videos, respectively. For the IMU signals, we design two novel feature transform mechanisms, in both frequency and spatial domains, to assemble the captured IMU signals as images, which allow using convolutional neural networks to learn the most discriminative features. Along with the above two modalities, we propose two other modalities for the video data, i.e., at the video frame and video clip levels. Each of the four modalities returns a probability distribution on activity prediction. Then, these probability distributions are fused to output the worker activity classification result. A worker activity dataset is established, which at present contains 6 common activities in assembly tasks, i.e., grab a tool/part, hammer a nail, use a power-screwdriver, rest arms, turn a screwdriver, and use a wrench. The developed multi-modal approach is evaluated on this dataset and achieves recognition accuracies as high as 97% and 100% in the leave-one-out and half-half experiments, respectively. 
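A minimal sketch of the signal-to-image idea follows: a window of multi-channel IMU data is arranged as a channels-by-time array and, in a frequency-domain variant, converted to per-channel FFT magnitudes so that a 2D CNN can consume it. The exact transforms in the paper differ; the shapes and channel counts here are illustrative.

```python
# A rough sketch of turning an IMU window into spatial- and frequency-domain "images".
import numpy as np

def imu_window_to_images(window):
    """window: (n_channels, n_timesteps) IMU segment -> spatial and frequency images."""
    spatial_img = window                                   # channels x time, used as-is
    freq_img = np.abs(np.fft.rfft(window, axis=1))         # channels x frequency bins
    return spatial_img, freq_img

rng = np.random.default_rng(0)
window = rng.normal(size=(9, 128))         # e.g. 3-axis accel/gyro/magnetometer, 128 samples
spatial_img, freq_img = imu_window_to_images(window)
print(spatial_img.shape, freq_img.shape)   # (9, 128) (9, 65)
```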
  5. Activity recognition is a crucial aspect of smart manufacturing and human-robot collaboration, as robots play a vital role in improving efficiency and safety by accurately recognizing human intentions and proactively assisting with tasks. Current human intention recognition applications consider only the accuracy of recognition and ignore the importance of predicting intent in advance. Given human reaching movements, we want to equip the robot with the ability to predict human intent not only with precise recognition but also at an early stage. In this paper, we first propose a framework that applies Transformer-based and LSTM-based models to learn motion intentions. Second, based on observing the distances of human joints along the motion trajectory, we explore how a hidden Markov model can be used to find intent state transitions, i.e., between intent uncertainty and intent certainty. Finally, two data sets are generated, one containing the full trajectories and the other containing only the data before the state transitions; both are used to evaluate the models and assess the robustness of intention prediction. We conducted experiments in a manufacturing workspace where the experimenter reached multiple scattered targets; this scenario was designed so that intents differ while the corresponding motions are only slightly different. The proposed models were then evaluated with the experimental data, and performance comparisons were made between models and between different intents. Finally, early predictions were validated to perform better than predictions using full-length data.
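To ground the model side of this framework, the sketch below shows a minimal PyTorch LSTM classifier over joint-distance sequences, evaluated on both full trajectories and truncated prefixes (mimicking prediction before the intent state transition). The sequence length, feature size, number of intents, and data are placeholders; the paper's Transformer variant and HMM segmentation are not reproduced here.

```python
# A minimal PyTorch sketch of early intent prediction with an LSTM, under assumed shapes.
import torch
import torch.nn as nn

class IntentLSTM(nn.Module):
    def __init__(self, n_features=12, hidden=64, n_intents=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_intents)

    def forward(self, x):                      # x: (batch, time, n_features)
        _, (h_n, _) = self.lstm(x)
        return self.head(h_n[-1])              # logits from the final hidden state

model = IntentLSTM()
full_seq = torch.randn(8, 50, 12)              # full reaching trajectories (random stand-in)
early_seq = full_seq[:, :20, :]                # prefix up to an assumed state transition
print(model(full_seq).argmax(dim=1), model(early_seq).argmax(dim=1))
```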