skip to main content

Title: Transferable Two-Stream Convolutional Neural Network for Human Action Recognition
Human-Robot Collaboration (HRC), which envisions a workspace in which human and robot can dynamically collaborate, has been identified as a key element in smart manufacturing. Human action recognition plays a key role in the realization of HRC as it helps identify current human action and provides the basis for future action prediction and robot planning. Despite recent development of Deep Learning (DL) that has demonstrated great potential in advancing human action recognition, one of the key issues remains as how to effectively leverage the temporal information of human motion to improve the performance of action recognition. Furthermore, large volume of training data is often difficult to obtain due to manufacturing constraints, which poses challenge for the optimization of DL models. This paper presents an integrated method based on optical flow and convolutional neural network (CNN)-based transfer learning to tackle these two issues. First, optical flow images, which encode the temporal information of human motion, are extracted and serve as the input to a two-stream CNN structure for simultaneous parsing of spatial-temporal information of human motion. Then, transfer learning is investigated to transfer the feature extraction capability of a pretrained CNN to manufacturing scenarios. Evaluation using engine block assembly confirmed the more » effectiveness of the developed method. « less
Authors:
Award ID(s):
1830295
Publication Date:
NSF-PAR ID:
10189094
Journal Name:
48th North American Manufacturing Research Conference
Sponsoring Org:
National Science Foundation
More Like this
  1. Human-Robot Collaboration (HRC), which enables a workspace where human and robot can dynamically and safely collaborate for improved operational efficiency, has been identified as a key element in smart manu­facturing. Human action recognition plays a key role in the realization ofHRC, as it helps identify current human action and provides the basis for future action prediction and robot planning. While Deep Learning (DL) has demonstrated great potential in advancing human action recognition, effectively leveraging the temporal in­formation of human motions to improve the accuracy and robustness of action recognition has remained as a challenge. Furthermore, it is often difficult tomore »obtain a large volume of data for DL network training and opti­mization, due to operational constraints in a realistic manufacturing setting. This paper presents an integrated method to address these two challenges, based on the optical flow and convolutional neural network (CNN)­based transfer learning. Specifically, optical flow images, which encode the temporal information of human motion, are extracted and serve as the input to a two-stream CNN structure for simultaneous parsing of spatial­-temporal information of human motion. Subsequently, transfer learning is investigated to transfer the feature extraction capability of a pre-trained CNN to manufacturing scenarios. Evaluation using engine block assembly confirmed the effectiveness of the developed method.« less
  2. Abstract

    With the development of industrial automation and artificial intelligence, robotic systems are developing into an essential part of factory production, and the human-robot collaboration (HRC) becomes a new trend in the industrial field. In our previous work, ten dynamic gestures have been designed for communication between a human worker and a robot in manufacturing scenarios, and a dynamic gesture recognition model based on Convolutional Neural Networks (CNN) has been developed. Based on the model, this study aims to design and develop a new real-time HRC system based on multi-threading method and the CNN. This system enables the real-time interactionmore »between a human worker and a robotic arm based on dynamic gestures. Firstly, a multi-threading architecture is constructed for high-speed operation and fast response while schedule more than one task at the same time. Next, A real-time dynamic gesture recognition algorithm is developed, where a human worker’s behavior and motion are continuously monitored and captured, and motion history images (MHIs) are generated in real-time. The generation of the MHIs and their identification using the classification model are synchronously accomplished. If a designated dynamic gesture is detected, it is immediately transmitted to the robotic arm to conduct a real-time response. A Graphic User Interface (GUI) for the integration of the proposed HRC system is developed for the visualization of the real-time motion history and classification results of the gesture identification. A series of actual collaboration experiments are carried out between a human worker and a six-degree-of-freedom (6 DOF) Comau industrial robot, and the experimental results show the feasibility and robustness of the proposed system.

    « less
  3. Human action recognition is an important topic in artificial intelligence with a wide range of applications including surveillance systems, search-and-rescue operations, human-computer interaction, etc. However, most of the current action recognition systems utilize videos captured by stationary cameras. Another emerging technology is the use of unmanned ground and aerial vehicles (UAV/UGV) for different tasks such as transportation, traffic control, border patrolling, wild-life monitoring, etc. This technology has become more popular in recent years due to its affordability, high maneuverability, and limited human interventions. However, there does not exist an efficient action recognition algorithm for UAV-based monitoring platforms. This paper considersmore »UAV-based video action recognition by addressing the key issues of aerial imaging systems such as camera motion and vibration, low resolution, and tiny human size. In particular, we propose an automated deep learning-based action recognition system which includes the three stages of video stabilization using the SURF feature selection and Lucas-Kanade method, human action area detection using faster region-based convolutional neural networks (R-CNN), and action recognition. We propose a novel structure that extends and modifies the InceptionResNet-v2 architecture by combining a 3D CNN architecture and a residual network for action recognition. We achieve an average accuracy of 85.83% for the entire-video-level recognition when applying our algorithm to the popular UCF-ARG aerial imaging dataset. This accuracy significantly improves upon the state-of-the-art accuracy by a margin of 17%.« less
  4. Abstract

    Human-robot collaboration (HRC) is a challenging task in modern industry and gesture communication in HRC has attracted much interest. This paper proposes and demonstrates a dynamic gesture recognition system based on Motion History Image (MHI) and Convolutional Neural Networks (CNN). Firstly, ten dynamic gestures are designed for a human worker to communicate with an industrial robot. Secondly, the MHI method is adopted to extract the gesture features from video clips and generate static images of dynamic gestures as inputs to CNN. Finally, a CNN model is constructed for gesture recognition. The experimental results show very promising classification accuracy usingmore »this method.

    « less
  5. Production innovations are occurring faster than ever. Manufacturing workers thus need to frequently learn new methods and skills. In fast changing, largely uncertain production systems, manufacturers with the ability to comprehend workers' behavior and assess their operation performance in near real-time will achieve better performance than peers. Action recognition can serve this purpose. Despite that human action recognition has been an active field of study in machine learning, limited work has been done for recognizing worker actions in performing manufacturing tasks that involve complex, intricate operations. Using data captured by one sensor or a single type of sensor to recognizemore »those actions lacks reliability. The limitation can be surpassed by sensor fusion at data, feature, and decision levels. This paper presents a study that developed a multimodal sensor system and used sensor fusion methods to enhance the reliability of action recognition. One step in assembling a Bukito 3D printer, which composed of a sequence of 7 actions, was used to illustrate and assess the proposed method. Two wearable sensors namely Myo-armband captured both Inertial Measurement Unit (IMU) and electromyography (EMG) signals of assembly workers. Microsoft Kinect, a vision based sensor, simultaneously tracked predefined skeleton joints of them. The collected IMU, EMG, and skeleton data were respectively used to train five individual Convolutional Neural Network (CNN) models. Then, various fusion methods were implemented to integrate the prediction results of independent models to yield the final prediction. Reasons for achieving better performance using sensor fusion were identified from this study.« less