Bug tracking systems, which help to track the reported software bugs, have been widely used in software development and maintenance. In these systems, recognizing relevant source files among a large number of source files for a given bug report is a time-consuming and labor-intensive task for software developers. To tackle this problem, information retrieval methods have been widely used to capture either the textual similarities or the semantic similarities between bug reports and source files. However, these two types of similarities are usually considered separately and the historical bug fixings are largely ignored by the existing methods. In this paper, we propose a supervised topic modeling method (STMLOCATOR) for automatically locating the relevant source files for a given bug report. In particular, the proposed model is built upon three key observations. First, supervised modeling can effectively make use of the existing fixing histories. Second, certain words in bug reports tend to appear multiple times in their relevant source files. Third, longer source files tend to have more bugs. By integrating the above three observations, the proposed STMLOCATOR utilizes historical fixings in a supervised way and learns both the textual similarities and semantic similarities between bug reports and source files. We further consider a special type of bug reports with stack-traces in bug reports, and propose a variant of STMLOCATOR to tailor for such bug reports. Experimental evaluations on three real data sets demonstrate that the proposed STMLOCATOR can achieve up to 23.6% improvement in terms of prediction accuracy over its best competitors, and scales linearly with the size of the data. Moreover, the proposed variant further improves STMLOCATOR by up to 76.2% on those bug reports with stack-traces.
more »
« less
Automatically Selecting Follow-up Questions for Deficient Bug Reports
The availability of quality information in bug reports
that are created daily by software users is key to rapidly
fixing software faults. Improving incomplete or deficient bug
reports, which are numerous in many popular and actively
developed open source software projects, can make software
maintenance more effective and improve software quality. In
this paper, we propose a system that addresses the problem
of bug report incompleteness by automatically posing follow-up
questions, intended to elicit answers that add value and provide
missing information to a bug report. Our system is based on
selecting follow-up questions from a large corpus of already
posted follow-up questions on GitHub. To estimate the best
follow-up question for a specific deficient bug report we combine
two metrics based on: 1) the compatibility of a follow-up question
to a specific bug report; and 2) the utility the expected answer
to the follow-up question would provide to the deficient bug
report. Evaluation of our system, based on a manually annotated
held-out data set, indicates improved performance over a set of
simple and ablation baselines. A survey of software developers
confirms the held-out set evaluation result that about half of the
selected follow-up questions are considered valid. The survey also
indicates that the valid follow-up questions are useful and can
provide new information to a bug report most of the time, and
are specific to a bug report some of the time.
more »
« less
- Award ID(s):
- 1813253
- NSF-PAR ID:
- 10287699
- Date Published:
- Journal Name:
- IEEE International Working Conference on Mining Software Repositories
- ISSN:
- 2160-1852
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
One of the most important tasks related to managing bug reports is localizing the fault so that a fix can be applied. As such, prior work has aimed to automate this task of bug localization by formulating it as an information retrieval problem, where potentially buggy files are retrieved and ranked according to their textual similarity with a given bug report. However, there is often a notable semantic gap between the information contained in bug reports and identifiers or natural language contained within source code files. For user-facing software, there is currently a key source of information that could aid in bug localization, but has not been thoroughly investigated - information from the GUI. We investigate the hypothesis that, for end user-facing applications, connecting information in a bug report with information from the GUI, and using this to aid in retrieving potentially buggy files, can improve upon existing techniques for bug localization. To examine this phenomenon, we conduct a comprehensive empirical study that augments four baseline techniques for bug localization with GUI interaction information from a reproduction scenario to (i) filter out potentially irrelevant files, (ii) boost potentially relevant files, and (iii) reformulate text-retrieval queries. To carry out our study, we source the current largest dataset of fully-localized and reproducible real bugs for Android apps, with corresponding bug reports, consisting of 80 bug reports from 39 popular open-source apps. Our results illustrate that augmenting traditional techniques with GUI information leads to a marked increase in effectiveness across multiple metrics, including a relative increase in Hits@10 of 13-18%. Additionally, through further analysis, we find that our studied augmentations largely complement existing techniques.more » « less
-
Bug report reproduction is an important, but time-consuming task carried out during mobile app maintenance. To accelerate this task, current research has proposed automated reproduction techniques that rely on a guided dynamic exploration of the app to match bug report steps with UI events in a mobile app. However, these techniques struggle to find the correct match when the bug reports have missing or inaccurately described steps. To address these limitations, we propose a new bug report reproduction technique that uses an app’s UI model to perform a global search across all possible matches between steps and UI actions and identify the most likely match while accounting for the possibility of missing or inaccurate steps. To do this, our approach redefines the bug report reproduction process as a Markov model and finds the best paths through the model using a dynamic programming based technique. We conducted an empirical evaluation on 72 real-world bug reports. Our approach achieved a 94% reproduction rate on the total bug reports and a 93% reproduction rate on bug reports with missing steps, significantly outperforming the state-of-the-art approaches. Our approach was also more effective in finding the matches from the steps to UI events than the state-of-the-art approaches.more » « less
-
Automatically fixing software bugs is a challenging task. While recent work showed that natural language context is useful in guiding bug-fixing models, the approach required prompting developers to provide this context, which was simulated through commit messages written after the bug-fixing code changes were made. We instead propose using bug report discussions, which are available before the task is performed and are also naturally occurring, avoiding the need for any additional information from developers. For this, we augment standard bug-fixing datasets with bug report discussions. Using these newly compiled datasets, we demonstrate that various forms of natural language context derived from such discussions can aid bug-fixing, even leading to improved performance over using commit messages corresponding to the oracle bug-fixing commits.more » « less
-
null (Ed.)Software developers use modern chat platforms to communicate about the status of a project and to coordinate development and release efforts, among other things. Developers also use chat platforms to ask technical questions to other developers. While some questions are project-specific and require an experienced developer familiar with the system to answer, many questions are rather general and may have been already answered by other developers on platforms such as the Q&A site StackOverflow. In this paper, we present GitterAns, a bot that can automatically detect when a developer asks a technical question in a chat and leverages the information present in Q&A forums to provide the developer with possible answers to their question. The results of a preliminary study indicate promising results, with GitterAns achieving an accuracy of 0.78 in identifying technical questions.more » « less