

Title: Robust Activity Recognition for Adaptive Worker-Robot Interaction using Transfer Learning
Human activity recognition (HAR) using machine learning has shown tremendous promise in detecting construction workers’ activities. HAR has many applications in human-robot interaction research, where it enables robots to understand the activities of their human counterparts. However, many existing HAR approaches lack robustness, generalizability, and adaptability. This paper proposes a transfer learning methodology for activity recognition of construction workers that requires orders of magnitude less data and compute time while achieving comparable or better classification accuracy. The developed algorithm transfers features from a model pre-trained by its original authors and fine-tunes them for the downstream task of activity recognition in construction. The model was pre-trained on Kinetics-400, a large-scale video-based human activity recognition dataset with 400 distinct classes. It was then fine-tuned and tested using videos of manual material handling (MMH) activities found on YouTube. Results indicate that the fine-tuned model can recognize distinct MMH tasks robustly and adaptively, which is crucial for the widespread deployment of collaborative robots in construction.
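The fine-tuning step in the abstract follows the general transfer-learning pattern: keep features learned on a large source dataset and train a task-specific classifier on a small target dataset. The sketch below illustrates only that pattern, in its simplest "linear probe" form, where the backbone stays frozen and only a new classification head is trained with softmax cross-entropy. Everything in it is an assumption for illustration: `frozen_backbone` is a hypothetical stand-in for a video network pre-trained on Kinetics-400 (the abstract does not name the architecture), the 2-D inputs stand in for video clips, and the paper's fine-tuning may also update some backbone layers.

```python
import math
import random


def frozen_backbone(x):
    """Hypothetical stand-in for a pre-trained feature extractor.

    Its "weights" are fixed constants, so nothing here changes during fine-tuning.
    """
    return [math.tanh(x[0]), math.tanh(x[1]), math.tanh(x[0] * x[1])]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class LinearHead:
    """New task-specific classification head, trained from scratch on target labels."""

    def __init__(self, n_features, n_classes, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.01, 0.01) for _ in range(n_features)]
                  for _ in range(n_classes)]
        self.b = [0.0] * n_classes

    def logits(self, f):
        return [sum(wi * fi for wi, fi in zip(row, f)) + bi
                for row, bi in zip(self.w, self.b)]

    def predict(self, f):
        z = self.logits(f)
        return z.index(max(z))

    def sgd_step(self, f, y, lr):
        """One SGD step on the softmax cross-entropy loss of a single example."""
        p = softmax(self.logits(f))
        for k in range(len(self.w)):
            g = p[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
            for j in range(len(f)):
                self.w[k][j] -= lr * g * f[j]
            self.b[k] -= lr * g


def fine_tune(data, n_classes, epochs=200, lr=0.5):
    """Train only the new head on features from the frozen backbone."""
    feats = [(frozen_backbone(x), y) for x, y in data]  # computed once
    head = LinearHead(len(feats[0][0]), n_classes)
    for _ in range(epochs):
        for f, y in feats:
            head.sgd_step(f, y, lr)
    return head


if __name__ == "__main__":
    # Toy "target task": four labelled examples, two classes.
    data = [((-2.0, 0.5), 0), ((-1.5, -1.0), 0), ((2.0, 0.3), 1), ((1.5, -0.7), 1)]
    head = fine_tune(data, n_classes=2)
    print([head.predict(frozen_backbone(x)) for x, _ in data])
```

Because the features are computed once and only the small head is updated, training touches a handful of parameters rather than the whole network, which is one reason transfer learning can need far less labelled data and compute than training from scratch.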
Award ID(s):
2047138
PAR ID:
10447842
Author(s) / Creator(s):
; ;
Editor(s):
Turkan, Yelda; Louis, Joseph; Leite, Fernanda; Ergan, Semiha
Date Published:
Journal Name:
2023 ASCE International Conference on Computing in Civil Engineering
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In Activities of Daily Living (ADL) research, which has gained prominence due to the growing aging population, acquiring sufficient ground-truth data for model training is a significant bottleneck. This obstacle motivates a pivot toward unsupervised representation learning methods, which do not require large labeled datasets. Prior research examined the tradeoff between a fully supervised model and an unsupervised pre-trained model and found that the unsupervised version performed better in most cases. However, that investigation did not use sufficiently large Human Activity Recognition (HAR) datasets; both datasets it used had only three dimensions. This poster extends the investigation by employing a large multivariate time series HAR dataset and experimenting with different combinations of critical training parameters, such as batch size and learning rate, to observe the performance tradeoff. Our findings reveal that the pre-trained model performs comparably to fully supervised classification on a larger multivariate time series HAR dataset. This result underscores the potential of unsupervised representation learning for ADL extraction and highlights the importance of model configuration in optimizing performance.
  2. In safety-critical environments, robots need to reliably recognize human activity to be effective and trustworthy partners. Since most human activity recognition (HAR) approaches rely on unimodal sensor data (e.g., motion capture or wearable sensors), it is unclear how the relationship between the sensor modality and the motion granularity (e.g., gross or fine) of the activities affects classification accuracy. To our knowledge, we are the first to investigate the efficacy of motion capture compared to wearable sensor data for recognizing human motion in manufacturing settings. We introduce the UCSD-MIT Human Motion dataset, composed of two assembly tasks that entail either gross or fine-grained motion. For both tasks, we compared the accuracy of a Vicon motion capture system to that of a Myo armband using three widely used HAR algorithms. We found that motion capture yielded higher accuracy than the wearable sensor for gross motion recognition (up to 36.95%), while the wearable sensor yielded higher accuracy for fine-grained motion (up to 28.06%). These results suggest that the two sensor modalities are complementary, and that robots may benefit from systems that use multiple modalities to detect gross and fine-grained motion simultaneously but independently. Our findings will help guide researchers in numerous fields of robotics, including learning from demonstration and grasping, in choosing the sensor modalities most suitable for their applications.
  3. Audio-based human activity recognition (HAR) is popular because many human activities have unique sound signatures that can be detected using machine learning (ML) approaches. These audio-based ML HAR pipelines often use common featurization techniques, such as converting time-domain signals to the frequency domain (using an FFT), extracting various statistical and spectral features, and using them to train ML models. Some of these approaches also claim privacy benefits by preventing the identification of human speech. However, recent deep learning-based automatic speech recognition (ASR) models pose new privacy challenges to these featurization techniques. In this paper, we systematically evaluate various featurization approaches for audio data, assessing their privacy risks through speech intelligibility metrics, Phoneme Error Rate (PER) and Word Error Rate (WER), while considering the utility tradeoff in terms of ML-based activity recognition accuracy. Our findings reveal that these approaches are susceptible to speech content recovery by recent ASR models, especially under re-tuning or retraining conditions. Notably, fine-tuned ASR models achieved an average PER of 39.99% and WER of 44.43% when recognizing speech from these featurizations. To overcome these privacy concerns, we propose Kirigami, a lightweight machine learning-based audio speech filter that removes human speech segments, reducing the efficacy of ASR models (70.48% PER and 101.40% WER) while maintaining HAR accuracy (76.0%). We show that Kirigami can be implemented on common edge microcontrollers with limited computational capabilities and memory, providing a path to deployment on a variety of IoT devices. Finally, we conducted a real-world user study and showed the robustness of Kirigami on a laptop and an ARM Cortex-M4F microcontroller under three different background noises.
  4. Hedden, Abigail S; Mazzaro, Gregory J (Ed.)
    Human activity recognition (HAR) with radar-based technologies has become a popular research area in the past decade. However, the objective of these studies is often to classify human activity for anyone; thus, models are trained using data spanning as broad a swath of people and mobility profiles as possible. In contrast, applications of HAR and gait analysis to remote health monitoring require characterizing the person-specific qualities of an individual’s activities and gait, which depend greatly on age, health, and agility. In fact, the speed or agility with which a person moves can be an important health indicator. In this study, we propose a multi-input multi-task deep learning framework to simultaneously learn a person’s activity and agility. In this initial study, we consider three agility states: slow, nominal, and fast. We show that jointly learning agility and activity improves classification accuracy for both the activity and agility recognition tasks. To the best of our knowledge, this is the first work to consider both agility characterization and personalized activity recognition using RF sensing.
  5. Driven by advances in machine learning and wireless techniques, considerable research effort has been devoted to human activity recognition (HAR). Although various deep learning algorithms can achieve high accuracy in recognizing human activities, existing works lack a theoretical performance upper bound: the best accuracy achievable given only the influencing factors in wireless networks, such as the indoor physical environment and the settings of the wireless sensing devices, regardless of the HAR algorithm used. Without an understanding of this upper bound, misconfiguring the influencing factors can reduce HAR accuracy drastically no matter which deep learning algorithm is used. In this paper, we propose an HAR performance upper bound for CSI-based human activity recognition, defined as the minimum classification error probability, which does not depend on any HAR algorithm and can be treated as a function of the influencing factors in wireless sensing networks. Since this upper bound captures the impact of the influencing factors on HAR accuracy, we further analyze their influence in varying situations, such as through-the-wall HAR and different human activities, using MATLAB simulations.
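Item 3 above describes audio HAR pipelines that convert time-domain signals to the frequency domain and extract statistical and spectral features. A minimal, dependency-free sketch of that featurization step follows. The particular features (mean, RMS, spectral centroid, peak frequency) and the function names are illustrative assumptions, not the feature set evaluated in that paper; a naive DFT stands in for an FFT (same values, slower); and nothing here implements the Kirigami speech filter.

```python
import cmath
import math


def dft_magnitude(signal):
    """Magnitudes of the one-sided discrete Fourier transform (naive O(n^2) DFT).

    A real pipeline would use an FFT; the naive DFT keeps this sketch
    dependency-free and produces the same values.
    """
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def audio_features(signal, sample_rate):
    """Hypothetical feature vector: two statistical and two spectral features."""
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    mags = dft_magnitude(signal)
    freqs = [k * sample_rate / n for k in range(len(mags))]  # bin centre frequencies (Hz)
    total = sum(mags)
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0
    peak_freq = freqs[mags.index(max(mags))]
    return {"mean": mean, "rms": rms, "spectral_centroid": centroid, "peak_freq": peak_freq}
```

For a pure 100 Hz tone sampled at 800 Hz over 80 samples, the energy falls in a single bin, so the peak frequency and spectral centroid are both about 100 Hz and the RMS is about 0.707. Item 3's point is that even such compact features may still leak recoverable speech content.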
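Item 4 above learns activity and agility jointly. One common way to set up such joint learning is a shared trunk feeding two task-specific heads, trained on the sum of the per-task losses. The sketch below assumes that setup; the single-input trunk, the layer shapes, and the `agility_weight` parameter are hypothetical, since the abstract describes a multi-input network and does not state how its losses are combined.

```python
import math


def shared_trunk(x, w_shared):
    """Hypothetical shared layer: ReLU of a linear map, read by both task heads."""
    return [max(0.0, sum(wi * xi for wi, xi in zip(row, x))) for row in w_shared]


def linear(h, w):
    """Linear head without bias: one logit per row of w."""
    return [sum(wi * hi for wi, hi in zip(row, h)) for row in w]


def forward(x, params):
    """Return (activity_logits, agility_logits) computed from one shared representation."""
    h = shared_trunk(x, params["shared"])
    return linear(h, params["activity_head"]), linear(h, params["agility_head"])


def cross_entropy(logits, target):
    """Cross-entropy of one example, using a numerically stable log-sum-exp."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def joint_loss(activity_logits, agility_logits, activity_label, agility_label,
               agility_weight=1.0):
    """Activity loss plus a weighted agility loss (the weighting is an assumption)."""
    return (cross_entropy(activity_logits, activity_label)
            + agility_weight * cross_entropy(agility_logits, agility_label))


AGILITY_STATES = ("slow", "nominal", "fast")  # the three states named in item 4
```

Because both heads read the same `shared_trunk` output, the gradient of `joint_loss` with respect to the shared weights combines signals from both tasks; this is the mechanism through which joint training can benefit each task.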
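Item 5 above defines its performance upper bound as a minimum classification error probability that no HAR algorithm can beat. For intuition, consider the textbook Bayes error for two equally likely classes whose scalar feature is Gaussian with means mu0 and mu1 and a common standard deviation sigma: the optimal rule thresholds at the midpoint, giving P_e = Phi(-|mu1 - mu0| / (2 sigma)), and the best achievable accuracy is 1 - P_e. In the sketch below, the "influencing factors" are abstracted into the mean separation and the noise level. This is a simplifying assumption for illustration, not the CSI-based model derived in that paper.

```python
import math


def bayes_error_two_gaussians(mu0, mu1, sigma):
    """Minimum (Bayes) error probability for two equiprobable 1-D Gaussian classes.

    P_e = Phi(-d / (2 * sigma)) = 0.5 * erfc(d / (2 * sqrt(2) * sigma)), with d = |mu1 - mu0|.
    No classifier can achieve a lower expected error under this model.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    d = abs(mu1 - mu0)
    return 0.5 * math.erfc(d / (2.0 * math.sqrt(2.0) * sigma))


def accuracy_upper_bound(mu0, mu1, sigma):
    """Best achievable accuracy under the same two-Gaussian model."""
    return 1.0 - bayes_error_two_gaussians(mu0, mu1, sigma)
```

With a separation of two standard deviations, the bound is an error of about 0.159 (accuracy about 0.841) whatever classifier is used. A sensing setup that increases the separation or reduces the noise lowers the bound, while one that merges the class distributions pushes it toward 0.5. This mirrors item 5's argument that misconfigured influencing factors cap accuracy regardless of the algorithm.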