Abstract Additive manufacturing enables the fabrication of complex designs while minimizing waste, but faces challenges related to defects and process anomalies. This study presents a novel multimodal Retrieval-Augmented Generation-based framework that automates anomaly detection across various Additive Manufacturing processes leveraging retrieved information from literature, including images and descriptive text, rather than training datasets. This framework integrates text and image retrieval from scientific literature and multimodal generation models to perform zero-shot anomaly identification, classification, and explanation generation in a Laser Powder Bed Fusion setting. The proposed framework is evaluated on four L-PBF manufacturing datasets from Oak Ridge National Laboratory, featuring various printer makes, models, and materials. This evaluation demonstrates the framework’s adaptability and generalizability across diverse images without requiring additional training. Comparative analysis using Qwen2-VL-2B and GPT-4o-mini as MLLM within the proposed framework highlights that GPT-4o-mini outperforms Qwen2-VL-2B and proportional random baseline in manufacturing anomalies classification. Additionally, the evaluation of the RAG system confirms that incorporating retrieval mechanisms improves average accuracy by 12% by reducing the risk of hallucination and providing additional information. The proposed framework can be continuously updated by integrating emerging research, allowing seamless adaptation to the evolving landscape of AM technologies. This scalable, automated, and zero-shot-capable framework streamlines AM anomaly analysis, enhancing efficiency and accuracy.
more »
« less
This content will become publicly available on December 4, 2026
Zero-Shot Anomaly Detection in Laser Powder Bed Fusion Using Multimodal RAG and Large Language Models
Additive manufacturing enables the fabrication of complex designs while minimizing waste, but faces challenges related to defects and process anomalies. This study presents a novel multimodal Retrieval-Augmented Generation-based framework that automates anomaly detection across various Additive Manufacturing processes leveraging retrieved information from literature, including images and descriptive text, rather than training datasets. This framework integrates text and image retrieval from scientific literature and multimodal generation models to perform zero-shot anomaly identification, classification, and explanation generation in a Laser Powder Bed Fusion setting. The proposed framework is evaluated on four L-PBF manufacturing datasets from Oak Ridge National Laboratory, featuring various printer makes, models, and materials. This evaluation demonstrates the framework's adaptability and generalizability across diverse images without requiring additional training. Comparative analysis using Qwen2-VL-2B and GPT-4o-mini as MLLM within the proposed framework highlights that GPT-4o-mini outperforms Qwen2-VL-2B and proportional random baseline in manufacturing anomalies classification. Additionally, the evaluation of the RAG system confirms that incorporating retrieval mechanisms improves average accuracy by 12% by reducing the risk of hallucination and providing additional information. The proposed framework can be continuously updated by integrating emerging research, allowing seamless adaptation to the evolving landscape of AM technologies. This scalable, automated, and zero-shot-capable framework streamlines AM anomaly analysis, enhancing efficiency and accuracy.
more »
« less
- Award ID(s):
- 2102406
- PAR ID:
- 10654593
- Publisher / Repository:
- The American Society of Mechanical Engineers
- Date Published:
- Journal Name:
- Journal of Mechanical Design
- ISSN:
- 1050-0472
- Page Range / eLocation ID:
- 1 to 16
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
The burgeoning sophistication of Artificial Intelligence (AI) has catalyzed the rapid proliferation of Large Language Models (LLMs) within software development. These models are increasingly employed to automate the generation of functionally correct code, address complex computational problems, and facilitate the debugging of existing software systems. However, LLM-generated code often faces challenges due to inherent inefficiencies, including redundant logical structures, factually inconsistent content (hallucinations), and programming errors. To address this issue, our research rigorously evaluated the computational efficiency of Python code generated by three prominent LLMs: GPT-4o-Mini, GPT-3.5-Turbo, and GPT-4-Turbo. The evaluation metrics encompass execution time, memory utilization, and peak memory consumption, while maintaining the functional correctness of the generated code. Leveraging the EffiBench benchmark datasets within the Google Vertex AI Workbench environment, across a spectrum of machine configurations, the study implemented a consistent seed parameter to ensure experimental reproducibility. Furthermore, we investigated the impact of two distinct optimization strategies: Chain-of-Thought (CoT) prompting and model fine-tuning. Our findings reveal a significant enhancement in efficiency metrics for GPT-4o-Mini and GPT-3.5-Turbo when employing CoT prompting; however, this trend was not observed for GPT-4-Turbo. Based on its promising performance with CoT prompting, we selected the GPT-4o-Mini model for subsequent fine-tuning, aiming to further enhance both its computational efficiency and accuracy. However, contrary to our expectations, fine-tuning the GPT-4o-Mini model led to a discernible degradation in both its accuracy and computational efficiency. In conclusion, this study provides empirical evidence suggesting that the deployment of high-CPU machine configurations, in synergy with the utilization of the GPT-4o-Mini model and CoT prompting techniques, yields demonstrably more efficient and accurate LLM-generated Python code, particularly within computationally intensive application scenarios.more » « less
-
We investigate knowledge retrieval with multi-modal queries, i.e. queries containing information split across image and text inputs, a challenging task that differs from previous work on cross-modal retrieval. We curate a new dataset called ReMuQ for benchmarking progress on this task. ReMuQ requires a system to retrieve knowledge from a large corpus by integrating contents from both text and image queries. We introduce a retriever model “ReViz” that can directly process input text and images to retrieve relevant knowledge in an end-to-end fashion without being dependent on intermediate modules such as object detectors or caption generators. We introduce a new pretraining task that is effective for learning knowledge retrieval with multimodal queries and also improves performance on downstream tasks. We demonstrate superior performance in retrieval on two datasets (ReMuQ and OK-VQA) under zero-shot settings as well as further improvements when finetuned on these datasets.more » « less
-
Wang, Yan; Yang, Hui (Ed.)Abstract The scarcity of measured data for defect identification often challenges the development and certification of additive manufacturing processes. Knowledge transfer and sharing have become emerging solutions to small-data challenges in quality control to improve machine learning with limited data, but this strategy raises concerns regarding privacy protection. Existing zero-shot learning and federated learning methods are insufficient to represent, select, and mask data to share and control privacy loss quantification. This study integrates differential privacy in cybersecurity with federated learning to investigate sharing strategies of manufacturing defect ontology. The method first proposes using multilevel attributes masked by noise in defect ontology as the sharing data structure to characterize manufacturing defects. Information leaks due to the sharing of ontology branches and data are estimated by epsilon differential privacy (DP). Under federated learning, the proposed method optimizes sharing defect ontology and image data strategies to improve zero-shot defect classification given privacy budget limits. The proposed framework includes (1) developing a sharing strategy based on multilevel attributes in defect ontology with controllable privacy leaks, (2) optimizing joint decisions in differential privacy, zero-shot defect classification, and federated learning, and (3) developing a two-stage algorithm to solve the joint optimization, combining stochastic gradient descent search for classification models and an evolutionary algorithm for exploring data-sharing strategies. A case study on zero-shot learning of additive manufacturing defects demonstrated the effectiveness of the proposed method in data-sharing strategies, such as ontology sharing, defect classification, and cloud information use.more » « less
-
Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either only aim to tackle VL tasks such as image-text retrieval, visual question answering (VQA) and image captioning that test high-level understanding of images, or only target region-level understanding for tasks such as phrase grounding and object detection. We present FIBER (Fusion-In-the-Backbone-based transformER), a new VL model architecture that can seamlessly handle both these types of tasks. Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model by inserting cross-attention into the image and text backbones to better capture multimodal interactions. In addition, unlike previous work that is either only pre-trained on image-text data or on fine-grained data with box-level annotations, we present a two-stage pre-training strategy that uses both these kinds of data efficiently: (i) coarse-grained pre-training based on image-text data; followed by (ii) fine-grained pre-training based on image-text-box data. We conduct comprehensive experiments on a wide range of VL tasks, ranging from VQA, image captioning, and retrieval, to phrase grounding, referring expression comprehension, and object detection. Using deep multimodal fusion coupled with the two-stage pre-training, FIBER provides consistent performance improvements over strong baselines across all tasks, often outperforming methods using magnitudes more data. Code is released at https://github.com/microsoft/FIBER.more » « less
An official website of the United States government
