Automated Duplicate Bug Report Detection in Large Open Bug Repositories

Laney, Clare E; Barovic, Andrew; Moin, Armin

doi:10.1109/COMPSAC65507.2025.00065


This content will become publicly available on July 8, 2026

Many users and contributors of large open-source projects report software defects or enhancement requests (known as bug reports) to issue-tracking systems. However, they sometimes report issues that have already been reported, for two main reasons. First, they may not have time to search the existing bug reports thoroughly. Second, they may lack the expertise in that specific area to recognize that an existing bug report describes essentially the same matter, perhaps in different wording. In this paper, we propose a novel approach based on machine learning methods that automatically detects duplicate bug reports in an open bug repository using the textual data in the reports. We present six alternative methods: topic modeling, Gaussian Naïve Bayes, deep learning, time-based organization, clustering, and summarization using a generative pre-trained transformer large language model. Additionally, we introduce a novel threshold-based approach for duplicate identification, in contrast to the conventional top-k selection method that has been widely used in the literature. Our approach demonstrates promising results across all the proposed methods, achieving accuracy rates ranging from the high 70s to the low 90s (in percent). We evaluated our methods on a public dataset of issues belonging to an Eclipse open-source project.
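To make the contrast between top-k selection and threshold-based identification concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain bag-of-words cosine similarity as a stand-in for whichever model scores report pairs; the report IDs, example texts, and the 0.8 cutoff are all hypothetical values chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a report and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(new_report, existing_reports):
    """Score every existing report against the new one, best match first."""
    query = Counter(tokenize(new_report))
    scored = [
        (cosine(query, Counter(tokenize(text))), report_id)
        for report_id, text in existing_reports.items()
    ]
    return sorted(scored, reverse=True)


def top_k(ranked, k):
    """Conventional selection: always return the k best-scoring reports."""
    return [report_id for _, report_id in ranked[:k]]


def above_threshold(ranked, threshold):
    """Threshold selection: return only reports scoring at or above the cutoff."""
    return [report_id for score, report_id in ranked if score >= threshold]


# Hypothetical repository contents (illustrative only).
existing = {
    "BUG-1": "Editor crashes when opening large XML file",
    "BUG-2": "Add dark theme to preferences dialog",
    "BUG-3": "Crash on opening big XML files in the editor",
}

duplicate_like = rank_candidates("Editor crash when opening large XML file", existing)
unrelated = rank_candidates("Printing fails on Linux", existing)

print(top_k(duplicate_like, 2), above_threshold(duplicate_like, 0.8))
print(top_k(unrelated, 2), above_threshold(unrelated, 0.8))
```

In this sketch, top-k always proposes k candidates, even for a report that has no real duplicate in the repository, whereas the threshold variant can return an empty list. The cutoff itself would need to be tuned on labeled data.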

Award ID(s): 2349452

PAR ID: 10657369

Author(s) / Creator(s): Laney, Clare E; Barovic, Andrew; Moin, Armin

Publisher / Repository: IEEE

Date Published: 2025-07-08

Page Range / eLocation ID: 450 to 458

Subject(s) / Keyword(s): duplicate bug report detection, bug triage, mining software repositories, natural language processing, machine learning, large language models

Format(s): Medium: X

Location: Toronto, Canada

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
https://doi.org/10.1109/COMPSAC65507.2025.00065
