skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Unsupervised Human Activity Recognition Learning for Disassembly Tasks
Large volumes of used electronics are often collected in remanufacturing plants, which requires disassembly before harvesting parts for reuse. Disassembly is mainly conducted manually with low productivity. Recently, human-robot collaboration is considered as a solution. For robots to assist effectively, they should observe work environments and recognize human actions accurately. Rich activity video recording and supervised learning can be used to extract insights; however, supervised learning does not allow robots to self-accomplish the learning process. This study proposes an unsupervised learning framework for achieving video-based human activity recognition. The framework consists of two main elements: a variational autoencoder-based architecture for unlabeled data representation learning, and a hidden Markov model for activity state division. The complete explicit activity classification is validated against ground truth labels; here, we use a case study of disassembling a hard disk drive. The framework shows an average recognition accuracy of 91.52% , higher than competing methods.  more » « less
Award ID(s):
2026276
PAR ID:
10465133
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Transactions on Industrial Informatics
ISSN:
1551-3203
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Turkan, Yelda; Louis, Joseph; Leite, Fernanda; Ergan, Semiha (Ed.)
    Human activity recognition (HAR) using machine learning has shown tremendous promise in detecting construction workers’ activities. HAR has many applications in human-robot interaction research to enable robots’ understanding of human counterparts’ activities. However, many existing HAR approaches lack robustness, generalizability, and adaptability. This paper proposes a transfer learning methodology for activity recognition of construction workers that requires orders of magnitude less data and compute time for comparable or better classification accuracy. The developed algorithm transfers features from a model pre-trained by the original authors and fine-tunes them for the downstream task of activity recognition in construction. The model was pre-trained on Kinetics-400, a large-scale video-based human activity recognition dataset with 400 distinct classes. The model was fine-tuned and tested using videos captured from manual material handling (MMH) activities found on YouTube. Results indicate that the fine-tuned model can recognize distinct MMH tasks in a robust and adaptive manner which is crucial for the widespread deployment of collaborative robots in construction. 
    more » « less
  2. Abstract Human–robot collaboration (HRC) has become an integral element of many manufacturing and service industries. A fundamental requirement for safe HRC is understanding and predicting human trajectories and intentions, especially when humans and robots operate nearby. Although existing research emphasizes predicting human motions or intentions, a key challenge is predicting both human trajectories and intentions simultaneously. This paper addresses this gap by developing a multi-task learning framework consisting of a bi-long short-term memory-based encoder–decoder architecture that obtains the motion data from both human and robot trajectories as inputs and performs two main tasks simultaneously: human trajectory prediction and human intention prediction. The first task predicts human trajectories by reconstructing the motion sequences, while the second task tests two main approaches for intention prediction: supervised learning, specifically a support vector machine, to predict human intention based on the latent representation, and, an unsupervised learning method, the hidden Markov model, that decodes the latent features for human intention prediction. Four encoder designs are evaluated for feature extraction, including interaction-attention, interaction-pooling, interaction-seq2seq, and seq2seq. The framework is validated through a case study of a desktop disassembly task with robots operating at different speeds. The results include evaluating different encoder designs, analyzing the impact of incorporating robot motion into the encoder, and detailed visualizations. The findings show that the proposed framework can accurately predict human trajectories and intentions. 
    more » « less
  3. Abstract Human-robot collaboration (HRC) has become an integral element of many industries, including manufacturing. A fundamental requirement for safe HRC is to understand and predict human intentions and trajectories, especially when humans and robots operate in close proximity. However, predicting both human intention and trajectory components simultaneously remains a research gap. In this paper, we have developed a multi-task learning (MTL) framework designed for HRC, which processes motion data from both human and robot trajectories. The first task predicts human trajectories, focusing on reconstructing the motion sequences. The second task employs supervised learning, specifically a Support Vector Machine (SVM), to predict human intention based on the latent representation. In addition, an unsupervised learning method, Hidden Markov Model (HMM), is utilized for human intention prediction that offers a different approach to decoding the latent features. The proposed framework uses MTL to understand human behavior in complex manufacturing environments. The novelty of the work includes the use of a latent representation to capture temporal dynamics in human motion sequences and a comparative analysis of various encoder architectures. We validate our framework through a case study focused on a HRC disassembly desktop task. The findings confirm the system’s capability to accurately predict both human intentions and trajectories. 
    more » « less
  4. Human-robot collaboration (HRC) has become an integral element of many industries, including manufacturing. A fundamental requirement for safe HRC is to understand and predict human intentions and trajectories, especially when humans and robots operate in close proximity. However, predicting both human intention and trajectory components simultaneously remains a research gap. In this paper, we have developed a multi-task learning (MTL) framework designed for HRC, which processes motion data from both human and robot trajectories. The first task predicts human trajectories, focusing on reconstructing the motion sequences. The second task employs supervised learning, specifically a Support Vector Machine (SVM), to predict human intention based on the latent representation. In addition, an unsupervised learning method, Hidden Markov Model (HMM), is utilized for human intention prediction that offers a different approach to decoding the latent features. The proposed framework uses MTL to understand human behavior in complex manufacturing environments. The novelty of the work includes the use of a latent representation to capture temporal dynamics in human motion sequences and a comparative analysis of various encoder architectures. We validate our framework through a case study focused on a HRC disassembly desktop task. The findings confirm the system's capability to accurately predict both human intentions and trajectories. 
    more » « less
  5. Understanding human behavior and activity facilitates advancement of numerous real-world applications, and is critical for video analysis. Despite the progress of action recognition algorithms in trimmed videos, the majority of real-world videos are lengthy and untrimmed with sparse segments of interest. The task of temporal activity detection in untrimmed videos aims to localize the temporal boundary of actions and classify the action categories. Temporal activity detection task has been investigated in full and limited supervision settings depending on the availability of action annotations. This paper provides an extensive overview of deep learning-based algorithms to tackle temporal action detection in untrimmed videos with different supervision levels including fully-supervised, weakly-supervised, unsupervised, self-supervised, and semi-supervised. In addition, this paper reviews advances in spatio-temporal action detection where actions are localized in both temporal and spatial dimensions. Action detection in online setting is also reviewed where the goal is to detect actions in each frame without considering any future context in a live video stream. Moreover, the commonly used action detection benchmark datasets and evaluation metrics are described, and the performance of the state-of-the-art methods are compared. Finally, real-world applications of temporal action detection in untrimmed videos and a set of future directions are discussed. 
    more » « less