Abstract Human-robot collaboration (HRC) has become an integral element of many industries, including manufacturing. A fundamental requirement for safe HRC is to understand and predict human intentions and trajectories, especially when humans and robots operate in close proximity. However, predicting human intention and trajectory simultaneously remains a research gap. In this paper, we develop a multi-task learning (MTL) framework for HRC that processes motion data from both human and robot trajectories. The first task predicts human trajectories by reconstructing the motion sequences. The second task employs supervised learning, specifically a Support Vector Machine (SVM), to predict human intention from the latent representation. In addition, an unsupervised learning method, the Hidden Markov Model (HMM), is used for human intention prediction, offering a different approach to decoding the latent features. The proposed framework uses MTL to understand human behavior in complex manufacturing environments. The novelty of the work includes the use of a latent representation to capture temporal dynamics in human motion sequences and a comparative analysis of various encoder architectures. We validate the framework through a case study of an HRC desktop disassembly task. The findings confirm the system's capability to accurately predict both human intentions and trajectories.
                    This content will become publicly available on May 1, 2026
                            
                            Multi-Task Learning for Intention and Trajectory Prediction in Human–Robot Collaborative Disassembly Tasks
                        
                    
    
Abstract Human–robot collaboration (HRC) has become an integral element of many manufacturing and service industries. A fundamental requirement for safe HRC is understanding and predicting human trajectories and intentions, especially when humans and robots operate in close proximity. Although existing research emphasizes predicting either human motions or intentions, a key challenge is predicting both simultaneously. This paper addresses this gap by developing a multi-task learning framework built on a bidirectional long short-term memory (Bi-LSTM) encoder–decoder architecture that takes motion data from both human and robot trajectories as input and performs two tasks simultaneously: human trajectory prediction and human intention prediction. The first task predicts human trajectories by reconstructing the motion sequences. The second task tests two approaches to intention prediction: a supervised method, specifically a support vector machine, that predicts human intention from the latent representation, and an unsupervised method, the hidden Markov model, that decodes the latent features. Four encoder designs are evaluated for feature extraction: interaction-attention, interaction-pooling, interaction-seq2seq, and seq2seq. The framework is validated through a case study of a desktop disassembly task with robots operating at different speeds. The results include an evaluation of the encoder designs, an analysis of the impact of incorporating robot motion into the encoder, and detailed visualizations. The findings show that the proposed framework can accurately predict human trajectories and intentions.
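The abstract does not say how the hidden Markov model is configured. As a rough illustration of what decoding a sequence of latent features into discrete intention states can look like, the sketch below runs Viterbi decoding over a hypothetical one-dimensional latent feature with two Gaussian-emission states. Every name and parameter value here is an assumption made for illustration. In an unsupervised setting these parameters would normally be estimated from data (for example with expectation-maximization) rather than fixed by hand.

```python
import math


def gaussian_log_pdf(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def viterbi(observations, log_start, log_trans, means, stds):
    """Return the most likely hidden-state sequence for 1-D observations."""
    n_states = len(log_start)
    # score[s] holds the best log-probability of any path ending in state s.
    score = [log_start[s] + gaussian_log_pdf(observations[0], means[s], stds[s])
             for s in range(n_states)]
    backpointers = []
    for x in observations[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + gaussian_log_pdf(x, means[s], stds[s]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical hand-set parameters: two intention states with "sticky" transitions.
LOG_START = [math.log(0.5), math.log(0.5)]
LOG_TRANS = [[math.log(0.9), math.log(0.1)],
             [math.log(0.1), math.log(0.9)]]
MEANS = [0.0, 1.0]
STDS = [0.3, 0.3]

# Hypothetical latent feature values, one per time step.
latent = [0.05, -0.1, 0.1, 0.9, 1.1, 0.95]
states = viterbi(latent, LOG_START, LOG_TRANS, MEANS, STDS)
print(states)
```

The decoded state index per time step plays the role of a predicted intention label. A real latent representation would be multi-dimensional, which would require multivariate emission densities in place of the univariate one assumed here.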
        
    
- Award ID(s): 2422826
- PAR ID: 10616831
- Publisher / Repository: J. Comput. Inf. Sci. Eng
- Date Published:
- Journal Name: Journal of Computing and Information Science in Engineering
- Volume: 25
- Issue: 5
- ISSN: 1530-9827
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Abstract Human intention prediction plays a critical role in human–robot collaboration, as it helps robots improve efficiency and safety by accurately anticipating human intentions and proactively assisting with tasks. While current applications often focus on predicting intent once a human action is completed, recognizing human intent in advance has received less attention. This study aims to equip robots with the capability to forecast human intent before an action is completed, i.e., early intent prediction. To achieve this objective, we first extract features from human motion trajectories by analyzing changes in human joint distances. These features are then used in a Hidden Markov Model (HMM) to determine the state transition times from uncertain intent to certain intent. Second, we propose two models, a Transformer and a Bi-LSTM, for classifying motion intentions. Then, we design a human–robot collaboration experiment in which the operator reaches multiple targets while the robot moves continuously along a predetermined path. The data collected through the experiment were divided into two groups: full-length data and partial data before the state transitions detected by the HMM. Finally, the effectiveness of the proposed framework for predicting intentions is assessed using two different datasets, particularly in scenarios where motion trajectories are similar but the underlying intentions vary. The results indicate that using partial data prior to motion completion yields higher accuracy than using full-length data. Specifically, the Transformer model exhibits a 2% improvement in accuracy, while the Bi-LSTM model demonstrates a 6% increase in accuracy.
- 
            More accurately inferring human intentions/goals can help robots complete collaborative human-robot tasks more safely and efficiently. Bayesian reasoning has become a popular approach for predicting the intention or goal of a partial sequence of actions/controls using a trajectory likelihood model. However, the mismatch between the training objective for these models (maximizing trajectory likelihood) and the application objective (maximizing intention likelihood) can be detrimental. In this paper, we seek to improve the goal prediction of maximum entropy inverse reinforcement learning (MaxEnt IRL) models by training to maximize goal likelihood. We demonstrate the benefits of our method on pointing task goal prediction with multiple possible goals and on predicting goal-based activities in the Cornell Activity Dataset (CAD-120).
- 
            This work studies the problem of predicting human intent to interact with a robot in a public environment. To facilitate research in this problem domain, we first contribute the People Approaching Robots Database (PAR-D), a new collection of datasets for intent prediction in Human-Robot Interaction. The database includes a subset of the ATC Approach Trajectory dataset [28] with augmented ground truth labels. It also includes two new datasets collected with a robot photographer at two locations on a university campus. Then, we contribute a novel human-annotated baseline for predicting intent. Our results suggest that the robot's environment and the amount of time that a person is visible affect human performance in this prediction task. We also provide computational baselines for intent prediction in PAR-D by comparing the performance of several machine learning models, including ones that directly model pedestrian interaction intent and others that predict motion trajectories as an intermediary step. From these models, we find that trajectory prediction seems useful for inferring intent to interact with a robot in a public environment.
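One of the related abstracts above describes Bayesian goal inference from a partial trajectory using a trajectory likelihood model. The sketch below illustrates that general idea only: it scores each candidate goal by how much of a detour the observed partial path represents relative to a straight line to that goal, then normalizes into a posterior. This detour-based likelihood is a simple stand-in chosen for illustration and is not the MaxEnt IRL model from that work. The function names, the `beta` parameter, and the example coordinates are all hypothetical.

```python
import math


def path_length(points):
    """Total Euclidean length of a polyline given as a list of coordinate tuples."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def goal_posterior(partial_path, goals, prior=None, beta=1.0):
    """Posterior over goals, with likelihood proportional to exp(-beta * detour)."""
    start, current = partial_path[0], partial_path[-1]
    travelled = path_length(partial_path)
    if prior is None:
        prior = [1.0 / len(goals)] * len(goals)  # uniform prior (assumption)
    log_scores = []
    for p, goal in zip(prior, goals):
        # Extra distance relative to heading straight from the start to this goal.
        detour = travelled + math.dist(current, goal) - math.dist(start, goal)
        log_scores.append(math.log(p) - beta * detour)
    # Normalize in log space for numerical stability.
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical example: a hand moving mostly along the x-axis toward the first goal.
goals = [(1.0, 0.0), (0.0, 1.0)]
partial = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.05)]
posterior = goal_posterior(partial, goals, beta=5.0)
print(posterior)
```

Larger values of `beta` make the posterior commit more sharply to the goal with the smallest detour, while `beta` near zero leaves it close to the prior.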
 An official website of the United States government
