skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on January 16, 2026

Title: RAGFix: Enhancing LLM Code Repair Using RAG and Stack Overflow Posts
Identifying, localizing, and resolving bugs in software engineering is challenging and costly. Approaches to resolve software bugs range from Large Language Model (LLM) code analysis and repair, and automated code repair technology that aims to alleviate the technical burden of difficult to solve bugs. We propose RAGFix, which enhances LLM’s capabilities for bug localization and code repair using Retrieval Augmented Generation (RAG) based on dynamically collected Stack Overflow posts. These posts are searchable via a Question and Answer Knowledge Graph (KGQA). We evaluate our method on the HumanEvalFix benchmark for Python using relevant closed and open-source models. Our approach facilitates error resolution in Python coding problems by creating a searchable, embedded knowledge graph representation of bug and solution information from Stack Overflow, interlinking bugs, and solutions through semi-supervised graph construction methods. We use cosine similarity on embeddings based on LLM-synthesized summaries and algorithmic features describing the coding problem and potential solution to find relevant results that improve LLM in-context performance. Our results indicate that our system enhances small open-source models’ ability to effectively repair code, particularly where these models have less parametric knowledge about relevant coding problems and can leverage nonparametric knowledge to provide accurate, actionable fixes.  more » « less
Award ID(s):
2349663
PAR ID:
10570231
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
ISSN:
2573-2978
ISBN:
979-8-3503-6248-0
Format(s):
Medium: X
Location:
Washington, DC, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Identifying, localizing, and resolving bugs in software engineering is challenging and costly. Approaches to resolve software bugs range from Large Language Model (LLM) code analysis and repair, and automated code repair technology that aims to alleviate the technical burden of difficult to solve bugs. We propose RAGFix, which enhances LLM’s capabilities for bug localization and code repair using Retrieval Augmented Generation (RAG) based on dynamically collected Stack Overflow posts. These posts are searchable via a Question and Answer Knowledge Graph (KGQA). We evaluate our method on the HumanEvalFix benchmark for Python using relevant closed and open-source models. Our approach facilitates error resolution in Python coding problems by creating a searchable, embedded knowledge graph representation of bug and solution information from Stack Overflow, interlinking bugs, and solutions through semi-supervised graph construction methods. We use cosine similarity on embeddings based on LLM-synthesized summaries and algorithmic features describing the coding problem and potential solution to find relevant results that improve LLM in-context performance. Our results indicate that our system enhances small open-source models’ ability to effectively repair code, particularly where these models have less parametric knowledge about relevant coding problems and can leverage nonparametric knowledge to provide accurate, actionable fixes. 
    more » « less
  2. null (Ed.)
    Stack Overflow is commonly used by software developers to help solve problems they face while working on software tasks such as fixing bugs or building new features. Recent research has explored how the content of Stack Overflow posts affects attraction and how the reputation of users attracts more visitors. However, there is very little evidence on the effect that visual attractors and content quantity have on directing gaze toward parts of a post, and which parts hold the attention of a user longer. Moreover, little is known about how these attractors help developers (students and professionals) answer comprehension questions. This paper presents an eye tracking study on thirty developers constrained to reading only Stack Overflow posts while summarizing four open source methods or classes. Results indicate that on average paragraphs and code snippets were fixated upon most often and longest. When ranking pages by number of appearance of code blocks and paragraphs, we found that while the presence of more code blocks did not affect number of fixations, the presence of increasing numbers of plain text paragraphs significantly drove down the fixations on comments. SO posts that were looked at only by students had longer fixation times on code elements within the first ten fixations. We found that 16 developer summaries contained 5 or more meaningful terms from SO posts they viewed. We discuss how our observations of reading behavior could benefit how users structure their posts. 
    more » « less
  3. Deep learning has gained substantial popularity in recent years. Developers mainly rely on libraries and tools to add deep learning capabilities to their software. What kinds of bugs are frequently found in such software? What are the root causes of such bugs? What impacts do such bugs have? Which stages of deep learning pipeline are more bug prone? Are there any antipatterns? Understanding such characteristics of bugs in deep learning software has the potential to foster the development of better deep learning platforms, debugging mechanisms, development practices, and encourage the development of analysis and verification frameworks. Therefore, we study 2716 high-quality posts from Stack Overflow and 500 bug fix commits from Github about five popular deep learning libraries Caffe, Keras, Tensorflow, Theano, and Torch to understand the types of bugs, root causes of bugs, impacts of bugs, bug-prone stage of deep learning pipeline as well as whether there are some common antipatterns found in this buggy software. The key findings of our study include: data bug and logic bug are the most severe bug types in deep learning software appearing more than 48% of the times, major root causes of these bugs are Incorrect Model Parameter (IPS) and Structural Inefficiency (SI) showing up more than 43% of the times.We have also found that the bugs in the usage of deep learning libraries have some common antipatterns. 
    more » « less
  4. Significant interest in applying Deep Neural Network (DNN) has fueled the need to support engineering of software that uses DNNs. Repairing software that uses DNNs is one such unmistakable SE need where automated tools could be very helpful; however, we do not fully understand challenges to repairing and patterns that are utilized when manually repairing them. What challenges should automated repair tools address? What are the repair patterns whose automation could help developers? Which repair patterns should be assigned a higher priority for automation? This work presents a comprehensive study of bug fix patterns to address these questions. We have studied 415 repairs from Stack Overflow and 555 repairs from GitHub for five popular deep learning libraries Caffe, Keras, Tensorflow, Theano, and Torch to understand challenges in repairs and bug repair patterns. Our key findings reveal that DNN bug fix patterns are distinctive compared to traditional bug fix patterns; the most common bug fix patterns are fixing data dimension and neural network connectivity; DNN bug fixes have the potential to introduce adversarial vulnerabilities; DNN bug fixes frequently introduce new bugs; and DNN bug localization, reuse of trained model, and coping with frequent releases are major challenges faced by developers when fixing bugs. We also contribute a benchmark of 667 DNN (bug, repair) instances. 
    more » « less
  5. To keep up with changes in requirements, frameworks, and coding practices, software organizations might need to migrate code from one language to another. Source-to-source migration, or transpilation, is often a complex, manual process. Transpilation requires expertise both in the source and target language, making it highly laborious and costly. Languages models for code generation and transpilation are becoming increasingly popular. However, despite capturing code-structure well, code generated by language models is often spurious and contains subtle problems. We proposeBatFix, a novel approach that augments language models for transpilation by leveraging program repair and synthesis to fix the code generated by these models.BatFixtakes as input both the original program, the target program generated by the machine translation model, and a set of test cases and outputs a repaired program that passes all test cases. Experimental results show that our approach is agnostic to language models and programming languages.BatFixcan locate bugs spawning multiple lines and synthesize patches for syntax and semantic bugs for programs migrated fromJavatoC++andPythontoC++from multiple language models, including, OpenAI’sCodex. 
    more » « less