skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Search for: All records

Award ID contains: 1830295

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Humans often use natural language instructions to control and interact with robots for task execution. This poses a big challenge to robots that need to not only parse and understand human instructions but also realise semantic understanding of an unknown environment and its constituent elements. To address this challenge, this study presents a vision-language model (VLM)-driven approach to scene understanding of an unknown environment to enable robotic object manipulation. Given language instructions, a pretrained vision-language model built on open-sourced Llama2-chat (7B) as the language model backbone is adopted for image description and scene understanding, which translates visual information into text descriptions of the scene. Next, a zero-shot-based approach to fine-grained visual grounding and object detection is developed to extract and localise objects of interest from the scene task. Upon 3D reconstruction and pose estimate establishment of the object, a code-writing large language model (LLM) is adopted to generate high-level control codes and link language instructions with robot actions for downstream tasks. The performance of the developed approach is experimentally validated through table-top object manipulation by a robot. 
    more » « less
    Free, publicly-accessible full text available August 30, 2025
  2. Autonomous robots that understand human instructions can significantly enhance the efficiency in human-robot assembly operations where robotic support is needed to handle unknown objects and/or provide on-demand assistance. This paper introduces a vision AI-based method for human-robot collaborative (HRC) assembly, enabled by a large language model (LLM). Upon 3D object reconstruction and pose establishment through neural object field modelling, a visual servoing-based mobile robotic system performs object manipulation and navigation guidance to a mobile robot. The LLM model provides text-based logic reasoning and high-level control command generation for natural human-robot interactions. The effectiveness of the presented method is experimentally demonstrated. 
    more » « less
  3. Human-Robot Collaboration (HRC) aims to create environments where robots can understand workspace dynamics and actively assist humans in operations, with the human intention recognition being fundamental to efficient and safe task fulfillment. Language-based control and communication is a natural and convenient way to convey human intentions. However, traditional language models require instructions to be articulated following a rigid, predefined syntax, which can be unnatural, inefficient, and prone to errors. This paper investigates the reasoning abilities that emerged from the recent advancement of Large Language Models (LLMs) to overcome these limitations, allowing for human instructions to be used to enhance human-robot communication. For this purpose, a generic GPT 3.5 model has been fine-tuned to interpret and translate varied human instructions into essential attributes, such as task relevancy and tools and/or parts required for the task. These attributes are then fused with perceived on-going robot action to generate a sequence of relevant actions. The developed technique is evaluated in a case study where robots initially misinterpreted human actions and picked up wrong tools and parts for assembly. It is shown that the fine-tuned LLM can effectively identify corrective actions across a diverse range of instructional human inputs, thereby enhancing the robustness of human-robot collaborative assembly for smart manufacturing. 
    more » « less
  4. Hideki Aoyama; Keiich Shirase (Ed.)
    An integral part of information-centric smart manufacturing is the adaptation of industrial robots to complement human workers in a collaborative manner. While advancement in sensing has enabled real-time monitoring of workspace, understanding the semantic information in the workspace, such as parts and tools, remains a challenge for seamless robot integration. The resulting lack of adaptivity to perform in a dynamic workspace have limited robots to tasks with pre-defined actions. In this paper, a machine learning-based robotic object detection and grasping method is developed to improve the adaptivity of robots. Specifically, object detection based on the concept of single-shot detection (SSD) and convolutional neural network (CNN) is investigated to recognize and localize objects in the workspace. Subsequently, the extracted information from object detection, such as the type, position, and orientation of the object, is fed into a multi-layer perceptron (MLP) to generate the desired joint angles of robotic arm for proper object grasping and handover to the human worker. Network training is guided by forward kinematics of the robotic arm in a self-supervised manner to mitigate issues such as singularity in computation. The effectiveness of the developed method is validated on an eDo robotic arm in a human-robot collaborative assembly case study. 
    more » « less
  5. Abstract Today’s manufacturing systems are becoming increasingly complex, dynamic, and connected. The factory operations face challenges of highly nonlinear and stochastic activity due to the countless uncertainties and interdependencies that exist. Recent developments in artificial intelligence (AI), especially Machine Learning (ML) have shown great potential to transform the manufacturing domain through advanced analytics tools for processing the vast amounts of manufacturing data generated, known as Big Data. The focus of this paper is threefold: (1) review the state-of-the-art applications of AI to representative manufacturing problems, (2) provide a systematic view for analyzing data and process dependencies at multiple levels that AI must comprehend, and (3) identify challenges and opportunities to not only further leverage AI for manufacturing, but also influence the future development of AI to better meet the needs of manufacturing. To satisfy these objectives, the paper adopts the hierarchical organization widely practiced in manufacturing plants in examining the interdependencies from the overall system level to the more detailed granular level of incoming material process streams. In doing so, the paper considers a wide range of topics from throughput and quality, supervisory control in human–robotic collaboration, process monitoring, diagnosis, and prognosis, finally to advances in materials engineering to achieve desired material property in process modeling and control. 
    more » « less
  6. Human-Robot Collaboration (HRC), which envisions a workspace in which human and robot can dynamically collaborate, has been identified as a key element in smart manufacturing. Human action recognition plays a key role in the realization of HRC as it helps identify current human action and provides the basis for future action prediction and robot planning. Despite recent development of Deep Learning (DL) that has demonstrated great potential in advancing human action recognition, one of the key issues remains as how to effectively leverage the temporal information of human motion to improve the performance of action recognition. Furthermore, large volume of training data is often difficult to obtain due to manufacturing constraints, which poses challenge for the optimization of DL models. This paper presents an integrated method based on optical flow and convolutional neural network (CNN)-based transfer learning to tackle these two issues. First, optical flow images, which encode the temporal information of human motion, are extracted and serve as the input to a two-stream CNN structure for simultaneous parsing of spatial-temporal information of human motion. Then, transfer learning is investigated to transfer the feature extraction capability of a pretrained CNN to manufacturing scenarios. Evaluation using engine block assembly confirmed the effectiveness of the developed method. 
    more » « less