Search for: All records

Creators/Authors contains: "Zhang, Xinyao"

« Prev Next »

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Predictive Repair Management Using Multi-Head Attention Transformer and Online Learning

Zhang, Xinyao; Cade, Willie; Haapala, Karl R; Natarajan, Arun; Behdad, Sara (August 2025, Proceedings of the ASME 2025 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference IDETC/CIE2025 August 17-20, 2025, Anaheim, California)

Accurate prediction of repair durations is a challenge in product maintenance due to its implications for resource allocation, customer satisfaction, and operational performance. This study aims to develop a deep learning framework to help fleet repair shops accurately categorize repair time given product historical data. The study uses an automobile repair and maintenance dataset and creates an end-to-end predictive framework by employing a multi-head attention network designed for tabular data. The developed framework combines categorical information, transformed through embeddings and attention mechanisms, with numerical historical data to facilitate integration and learning from diverse data features. A weighted loss function is introduced to overcome class imbalance issues in large datasets. Moreover, an online learning strategy is used for continuous incremental model updates to maintain predictive accuracy in evolving operational environments. Our empirical findings demonstrate that the multi-head attention mechanism extracts meaningful interactions between vehicle identifiers and repair types compared to a feed-forward neural network. Also, combining historical maintenance data with an online learning strategy facilitates real-time adjustments to changing patterns and increases the model’s predictive performance on new data. The model is tested on real-world repair data spanning 2013 to 2020 and achieves an accuracy of 78%, with attention weight analyses illustrating feature interactions.
more » « less
Free, publicly-accessible full text available August 20, 2026
Multi-Task Learning for Intention and Trajectory Prediction in Human–Robot Collaborative Disassembly Tasks

https://doi.org/10.1115/1.4067157

Zhang, Xinyao; Tian, Sibo; Liang, Xiao; Zheng, Minghui; Behdad, Sara (May 2025, Journal of Computing and Information Science in Engineering)

Abstract Human–robot collaboration (HRC) has become an integral element of many manufacturing and service industries. A fundamental requirement for safe HRC is understanding and predicting human trajectories and intentions, especially when humans and robots operate nearby. Although existing research emphasizes predicting human motions or intentions, a key challenge is predicting both human trajectories and intentions simultaneously. This paper addresses this gap by developing a multi-task learning framework consisting of a bi-long short-term memory-based encoder–decoder architecture that obtains the motion data from both human and robot trajectories as inputs and performs two main tasks simultaneously: human trajectory prediction and human intention prediction. The first task predicts human trajectories by reconstructing the motion sequences, while the second task tests two main approaches for intention prediction: supervised learning, specifically a support vector machine, to predict human intention based on the latent representation, and, an unsupervised learning method, the hidden Markov model, that decodes the latent features for human intention prediction. Four encoder designs are evaluated for feature extraction, including interaction-attention, interaction-pooling, interaction-seq2seq, and seq2seq. The framework is validated through a case study of a desktop disassembly task with robots operating at different speeds. The results include evaluating different encoder designs, analyzing the impact of incorporating robot motion into the encoder, and detailed visualizations. The findings show that the proposed framework can accurately predict human trajectories and intentions.
more » « less
Free, publicly-accessible full text available May 1, 2026
Multi-Task Learning for Intention and Trajectory Prediction in Human-Robot Collaborative Disassembly Tasks

Zhang, Xinyao; Tian, Sibo; Liang, Xiao; Zheng, Minghui; Behdad, Sara (August 2024, Proceedings of the ASME 2024 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, IDETC/CIE2024, August 25–28, 2024, Washington, DC, USA.)

Human-robot collaboration (HRC) has become an integral element of many industries, including manufacturing. A fundamental requirement for safe HRC is to understand and predict human intentions and trajectories, especially when humans and robots operate in close proximity. However, predicting both human intention and trajectory components simultaneously remains a research gap. In this paper, we have developed a multi-task learning (MTL) framework designed for HRC, which processes motion data from both human and robot trajectories. The first task predicts human trajectories, focusing on reconstructing the motion sequences. The second task employs supervised learning, specifically a Support Vector Machine (SVM), to predict human intention based on the latent representation. In addition, an unsupervised learning method, Hidden Markov Model (HMM), is utilized for human intention prediction that offers a different approach to decoding the latent features. The proposed framework uses MTL to understand human behavior in complex manufacturing environments. The novelty of the work includes the use of a latent representation to capture temporal dynamics in human motion sequences and a comparative analysis of various encoder architectures. We validate our framework through a case study focused on a HRC disassembly desktop task. The findings confirm the system's capability to accurately predict both human intentions and trajectories.
more » « less
Full Text Available
Multi-Task Learning for Intention and Trajectory Prediction in Human-Robot Collaborative Disassembly Tasks

https://doi.org/10.1115/DETC2024-143753

Zhang, Xinyao; Tian, Sibo; Liang, Xiao; Zheng, Minghui; Behdad, Sara (August 2024, American Society of Mechanical Engineers)

Abstract Human-robot collaboration (HRC) has become an integral element of many industries, including manufacturing. A fundamental requirement for safe HRC is to understand and predict human intentions and trajectories, especially when humans and robots operate in close proximity. However, predicting both human intention and trajectory components simultaneously remains a research gap. In this paper, we have developed a multi-task learning (MTL) framework designed for HRC, which processes motion data from both human and robot trajectories. The first task predicts human trajectories, focusing on reconstructing the motion sequences. The second task employs supervised learning, specifically a Support Vector Machine (SVM), to predict human intention based on the latent representation. In addition, an unsupervised learning method, Hidden Markov Model (HMM), is utilized for human intention prediction that offers a different approach to decoding the latent features. The proposed framework uses MTL to understand human behavior in complex manufacturing environments. The novelty of the work includes the use of a latent representation to capture temporal dynamics in human motion sequences and a comparative analysis of various encoder architectures. We validate our framework through a case study focused on a HRC disassembly desktop task. The findings confirm the system’s capability to accurately predict both human intentions and trajectories.
more » « less
Full Text Available
Early Prediction of Human Intention for Human–Robot Collaboration Using Transformer Network

https://doi.org/10.1115/1.4064258

Zhang, Xinyao; Tian, Sibo; Liang, Xiao; Zheng, Minghui; Behdad, Sara (May 2024, Journal of Computing and Information Science in Engineering)

Abstract Human intention prediction plays a critical role in human–robot collaboration, as it helps robots improve efficiency and safety by accurately anticipating human intentions and proactively assisting with tasks. While current applications often focus on predicting intent once human action is completed, recognizing human intent in advance has received less attention. This study aims to equip robots with the capability to forecast human intent before completing an action, i.e., early intent prediction. To achieve this objective, we first extract features from human motion trajectories by analyzing changes in human joint distances. These features are then utilized in a Hidden Markov Model (HMM) to determine the state transition times from uncertain intent to certain intent. Second, we propose two models including a Transformer and a Bi-LSTM for classifying motion intentions. Then, we design a human–robot collaboration experiment in which the operator reaches multiple targets while the robot moves continuously following a predetermined path. The data collected through the experiment were divided into two groups: full-length data and partial data before state transitions detected by the HMM. Finally, the effectiveness of the suggested framework for predicting intentions is assessed using two different datasets, particularly in a scenario when motion trajectories are similar but underlying intentions vary. The results indicate that using partial data prior to the motion completion yields better accuracy compared to using full-length data. Specifically, the transformer model exhibits a 2% improvement in accuracy, while the Bi-LSTM model demonstrates a 6% increase in accuracy.
more » « less
Full Text Available
Early Prediction of Human Intention For Human-Robot Collaboration Using Transformer Network,

Zhang, Xinyao; Tian, Sibo; Liang, Xiao; Zheng, Minghui; Behdad, Sara (August 2023, Proceedings of the ASME 2023 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference, IDETC/CIE2023)

Activity recognition is a crucial aspect in smart manufacturing and human-robot collaboration, as robots play a vital role in improving efficiency and safety by accurately recognizing human intentions and proactively assisting with tasks. Current human intention recognition applications only consider the accuracy of recognition but ignore the importance of predicting it in advance. Given human reaching movements, we want to equip the robot with the ability to predict human intent not only with precise recognition but also at an early stage. In this paper, we propose a framework to apply Transformer-based and LSTM-based models to learn motion intentions. Second, based on the observation of distances of human joints along the motion trajectory, we explore how we can use the hidden Markov model to find intent state transitions, i.e., intent uncertainty and intent certainty. Finally, two data types are generated, one for the full data and the other for the length of data before state transitions; both data are evaluated on models to assess the robustness of intention prediction. We conducted experiments in a manufacturing workspace where the experimenter reaches multiple scattered targets and further this experimental scenario was designed to examine how intents differ, but motions are only slightly different. The proposed models were then evaluated with experimental data, and further performance comparisons were made between models and between different intents. Finally, early predictions were validated to be better than using full-length data.
more » « less
Full Text Available
Unsupervised Human Activity Recognition Learning for Disassembly Tasks

https://doi.org/10.1109/TII.2023.3264284

Zhang, Xinyao; Yi, Daiyao; Behdad, Sara; Saxena, Shreya (April 2023, IEEE Transactions on Industrial Informatics)

Large volumes of used electronics are often collected in remanufacturing plants, which requires disassembly before harvesting parts for reuse. Disassembly is mainly conducted manually with low productivity. Recently, human-robot collaboration is considered as a solution. For robots to assist effectively, they should observe work environments and recognize human actions accurately. Rich activity video recording and supervised learning can be used to extract insights; however, supervised learning does not allow robots to self-accomplish the learning process. This study proposes an unsupervised learning framework for achieving video-based human activity recognition. The framework consists of two main elements: a variational autoencoder-based architecture for unlabeled data representation learning, and a hidden Markov model for activity state division. The complete explicit activity classification is validated against ground truth labels; here, we use a case study of disassembling a hard disk drive. The framework shows an average recognition accuracy of 91.52% , higher than competing methods.
more » « less
Full Text Available
Automatic Screw Detection and Tool Recommendation System for Robotic Disassembly

https://doi.org/10.1115/1.4056074

Zhang, Xinyao; Eltouny, Kareem; Liang, Xiao; Behdad, Sara (March 2023, Journal of Manufacturing Science and Engineering)

Abstract Disassembly is an essential process for the recovery of end-of-life (EOL) electronics in remanufacturing sites. Nevertheless, the process remains labor-intensive due to EOL electronics’ high degree of uncertainty and complexity. The robotic technology can assist in improving disassembly efficiency; however, the characteristics of EOL electronics pose difficulties for robot operation, such as removing small components. For such tasks, detecting small objects is critical for robotic disassembly systems. Screws are widely used as fasteners in ordinary electronic products while having small sizes and varying shapes in a scene. To enable robotic systems to disassemble screws, the location information and the required tools need to be predicted. This paper proposes a computer vision framework for detecting screws and recommending related tools for disassembly. First, a YOLOv4 algorithm is used to detect screw targets in EOL electronic devices and a screw image extraction mechanism is executed based on the position coordinates predicted by YOLOv4. Second, after obtaining the screw images, the EfficientNetv2 algorithm is applied for screw shape classification. In addition to proposing a framework for automatic small-object detection, we explore how to modify the object detection algorithm to improve its performance and discuss the sensitivity of tool recommendations to the detection predictions. A case study of three different types of screws in EOL electronics is used to evaluate the performance of the proposed framework.
more » « less
Full Text Available
A machine learning approach to identifying suicide risk among text-based crisis counseling encounters

https://doi.org/10.3389/fpsyt.2023.1110527

Broadbent, Meghan; Medina Grespan, Mattia; Axford, Katherine; Zhang, Xinyao; Srikumar, Vivek; Kious, Brent; Imel, Zac (March 2023, Frontiers in Psychiatry)

IntroductionWith the increasing utilization of text-based suicide crisis counseling, new means of identifying at risk clients must be explored. Natural language processing (NLP) holds promise for evaluating the content of crisis counseling; here we use a data-driven approach to evaluate NLP methods in identifying client suicide risk. MethodsDe-identified crisis counseling data from a regional text-based crisis encounter and mobile tipline application were used to evaluate two modeling approaches in classifying client suicide risk levels. A manual evaluation of model errors and system behavior was conducted. ResultsThe neural model outperformed a term frequency-inverse document frequency (tf-idf) model in the false-negative rate. While 75% of the neural model’s false negative encounters had some discussion of suicidality, 62.5% saw a resolution of the client’s initial concerns. Similarly, the neural model detected signals of suicidality in 60.6% of false-positive encounters. DiscussionThe neural model demonstrated greater sensitivity in the detection of client suicide risk. A manual assessment of errors and model performance reflected these same findings, detecting higher levels of risk in many of the false-positive encounters and lower levels of risk in many of the false negatives. NLP-based models can detect the suicide risk of text-based crisis encounters from the encounter’s content.
more » « less
Full Text Available
Logic-driven Indirect Supervision: An Application to Crisis Counseling

https://doi.org/10.18653/v1/2023.acl-long.654

Medina Grespan, Mattia; Broadbent, Meghan; Zhang, Xinyao; Axford, Katherine; Kious, Brent; Imel, Zac; Srikumar, Vivek (January 2023, Association for Computational Linguistics)

Ensuring the effectiveness of text-based crisis counseling requires observing ongoing conversations and providing feedback, both labor-intensive tasks. Automatic analysis of conversations—at the full chat and utterance levels—may help support counselors and provide better care. While some session-level training data (e.g., rating of patient risk) is often available from counselors, labeling utterances requires expensive post hoc annotation. But the latter can not only provide insights about conversation dynamics, but can also serve to support quality assurance efforts for counselors. In this paper, we examine if inexpensive—and potentially noisy—session-level annotation can help improve label utterances. To this end, we propose a logic-based indirect supervision approach that exploits declaratively stated structural dependencies between both levels of annotation to improve utterance modeling. We show that adding these rules gives an improvement of 3.5% f-score over a strong multi-task baseline for utterance-level predictions. We demonstrate via ablation studies how indirect supervision via logic rules also improves the consistency and robustness of the system.
more » « less
Full Text Available

« Prev Next »