Software developers are increasingly having conversations about
software development via online chat services. Many of those chat
communications contain valuable information, such as code descriptions,
good programming practices, and causes of common
errors/exceptions. However, the nature of chat community content
is transient, as opposed to the archival nature of other developer
communications such as email, bug reports and Q&A forums. As a
result, important information and advice are lost over time.
The focus of this dissertation is Extracting Archival Information
from Software-Related Chats, specifically to (1) automatically identify
conversations that contain archival-quality information, (2)
accurately reduce the granularity of the information reported as
archival information, and (3) conduct a case study to investigate
how archival-quality information extracted from chats compares to
related posts in Q&A forums. Knowledge archived from developer
chats could potentially be used in several applications, such
as creating a new archival mechanism for a given chat
community, augmenting Q&A forums, facilitating the mining of
specific information, and improving software maintenance tools.
Automatic Extraction of Opinion-based Q&A from Online Developer Chats
Virtual conversational assistants designed specifically for software engineers could have a huge impact on
the time it takes for software engineers to get help. Research
efforts are focusing on virtual assistants that support specific
software development tasks such as bug repair and pair programming. In this paper, we study the use of online chat
platforms as a resource towards collecting developer opinions
that could potentially help in building opinion Q&A systems,
as a specialized instance of virtual assistants and chatbots for
software engineers. Opinion Q&A has a stronger presence in
chats than in other developer communications; thus, mining it
can provide a valuable resource for developers to quickly gain
insight about a specific development topic (e.g., What is the best
Java library for parsing JSON?). We address the problem of
opinion Q&A extraction by developing automatic identification of
opinion-asking questions and extraction of participants’ answers
from public online developer chats. We evaluate our automatic
approaches on chats spanning six programming communities
and two platforms. Our results show that a heuristic approach
to identifying opinion-asking questions works well (0.87 precision), and a
deep learning approach customized to the software domain
outperforms heuristics-based, machine-learning-based, and deep
learning approaches for answer extraction in community question answering.
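To make the idea of a heuristic identifier concrete, here is a minimal sketch of a pattern-based check for opinion-asking questions, together with the precision metric used to score it. The regular expressions below are hypothetical illustrations chosen for this sketch; they are not the heuristics used in the paper, and the function names `is_opinion_asking` and `precision` are likewise assumptions.

```python
import re

# Hypothetical lexical patterns that often signal an opinion-asking question.
# These are illustrative assumptions, NOT the heuristics from the paper.
OPINION_PATTERNS = [
    r"\bwhat(?:'s| is| are) the best\b",
    r"\bwhich (?:one|library|framework|tool)s? (?:should|would|do) (?:i|you)\b",
    r"\b(?:any|your) (?:recommendations?|suggestions?|thoughts|opinions?)\b",
    r"\bis it (?:better|worth)\b",
    r"\bwhat do you (?:think|prefer|recommend)\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in OPINION_PATTERNS]


def is_opinion_asking(utterance: str) -> bool:
    """Flag an utterance as opinion-asking if it is a question matching a pattern."""
    text = utterance.strip()
    if not text.endswith("?"):
        return False  # only consider utterances phrased as questions
    return any(p.search(text) for p in _COMPILED)


def precision(predicted: list[bool], gold: list[bool]) -> float:
    """Fraction of predicted positives that are true positives."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    return tp / (tp + fp) if (tp + fp) else 0.0
```

For example, `is_opinion_asking("What is the best Java library for parsing JSON?")` returns `True`, while a how-to question such as "How do I install numpy?" is not flagged. A pattern list like this favors precision over recall, which is one plausible reason a simple heuristic can score well on precision.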
- Award ID(s):
- 1813253
- NSF-PAR ID:
- 10287696
- Date Published:
- Journal Name:
- Proceedings of the International Conference on Software Engineering
- ISSN:
- 1819-3781
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
More than ever, developers are participating in public chat communities to ask and answer software development questions. With over ten million daily active users, Slack is one of the most popular chat platforms, hosting many active channels focused on software development technologies, e.g., python, react. Prior studies have shown that public Slack chat transcripts contain valuable information, which could provide support for improving automatic software maintenance tools or help researchers understand developer struggles or concerns. In this paper, we present a dataset of software-related Q&A chat conversations, curated over two years from three open Slack communities (python, clojure, elm). Our dataset consists of 38,955 conversations and 437,893 utterances, contributed by 12,171 users. We also share the code for a customized machine-learning-based algorithm that automatically extracts (or disentangles) conversations from the downloaded chat transcripts.
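Disentanglement means assigning each utterance in an interleaved chat stream to a conversation. The dataset above ships a customized machine-learning algorithm for this; the sketch below is instead a much simpler rule-based stand-in, shown only to illustrate the task. The `@mention` reply convention, the "a new question starts a new conversation" rule, the `max_gap` threshold of 600 seconds, and all names (`Utterance`, `disentangle`) are assumptions of this sketch.

```python
import re
from dataclasses import dataclass

MENTION_RE = re.compile(r"@(\w+)")


@dataclass
class Utterance:
    ts: float   # timestamp in seconds
    user: str
    text: str


def disentangle(utts: list[Utterance], max_gap: float = 600.0) -> list[int]:
    """Assign a conversation id to each utterance (hypothetical heuristic).

    Assumes `utts` is already sorted by timestamp.
    """
    conv_of_user: dict[str, int] = {}  # user -> conversation they last posted in
    last_ts: dict[int, float] = {}     # conversation -> time of its latest utterance
    labels: list[int] = []
    next_id = 0
    for u in utts:
        conv = None
        # Rule 1: a reply that @-mentions a known user joins that user's conversation.
        mentioned = [m for m in MENTION_RE.findall(u.text) if m in conv_of_user]
        if mentioned:
            cand = conv_of_user[mentioned[0]]
            if u.ts - last_ts[cand] <= max_gap:
                conv = cand
        # Rule 2: a non-question continues the speaker's own recent conversation.
        if conv is None and u.user in conv_of_user and not u.text.rstrip().endswith("?"):
            cand = conv_of_user[u.user]
            if u.ts - last_ts[cand] <= max_gap:
                conv = cand
        # Otherwise, start a new conversation.
        if conv is None:
            conv = next_id
            next_id += 1
        labels.append(conv)
        conv_of_user[u.user] = conv
        last_ts[conv] = u.ts
    return labels
```

Rules like these break down on unaddressed replies and long pauses, which is part of why a learned model is a better fit for real transcripts.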
-
A great part of software development involves conceptualizing or communicating the underlying procedures and logic that need to be expressed in programs. One major difficulty of programming is turning concept into code, especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries, but these have primarily been evaluated purely based on retrieval accuracy or overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow is surprisingly unattested. In this article, we perform the first comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking, “At the current state of technology does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?” To facilitate the study, we first develop a plugin for the PyCharm IDE that implements a hybrid of code generation and code retrieval functionality, and we orchestrate virtual environments to enable collection of many user events (e.g., web browsing, keystrokes, fine-grained code edits). We ask developers with various backgrounds to complete 7 varieties of 14 Python programming tasks ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regard to increased productivity, code quality, or program correctness are inconclusive. Further analysis identifies several pain points that could improve the effectiveness of future machine learning-based code generation/retrieval developer assistants and demonstrates when developers prefer code generation over code retrieval and vice versa.
We release all data and software to pave the road for future empirical studies on this topic, as well as development of better code generation models.
-
Modern software development communities are increasingly social. Popular chat platforms such as Slack host public chat communities that focus on specific development topics such as Python or Ruby-on-Rails. Conversations in these public chats often follow a Q&A format, with someone seeking information and others providing answers in chat form. In this paper, we describe an exploratory study into the potential usefulness and challenges of mining developer Q&A conversations for supporting software maintenance and evolution tools. We designed the study to investigate the availability of information that has been successfully mined from other developer communications, particularly Stack Overflow. We also analyze characteristics of chat conversations that might inhibit accurate automated analysis. Our results indicate the prevalence of useful information, including API mentions and code snippets with descriptions, and several hurdles that need to be overcome to automate mining that information.
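As a rough illustration of what mining code snippets and API mentions from a chat message can involve, the sketch below pulls triple-backtick blocks and single-backtick inline spans out of a Slack-style message, then finds dotted, API-like names inside the inline spans. The dotted-name regex is a simplifying assumption of this sketch (it will miss unqualified calls and can match other dotted tokens), and the function names are hypothetical; this is not the analysis used in the study.

```python
import re

# Slack-style markup: ```...``` for multi-line blocks, `...` for inline code.
FENCE_RE = re.compile(r"```(.*?)```", re.DOTALL)
INLINE_RE = re.compile(r"`([^`\n]+)`")
# Assumed heuristic for API-like tokens such as json.loads or os.path.join.
API_RE = re.compile(r"\b[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+\b")


def extract_code(message: str) -> tuple[list[str], list[str]]:
    """Return (fenced code blocks, inline code spans) found in a message."""
    blocks = [b.strip() for b in FENCE_RE.findall(message)]
    rest = FENCE_RE.sub(" ", message)  # drop fenced blocks before scanning inline spans
    inline = INLINE_RE.findall(rest)
    return blocks, inline


def api_mentions(message: str) -> list[str]:
    """Dotted API-like names that appear inside inline code spans."""
    _, inline = extract_code(message)
    return [m for span in inline for m in API_RE.findall(span)]
```

For a message such as "Try `json.loads(s)` instead", `api_mentions` yields `["json.loads"]`. Removing fenced blocks first keeps their triple backticks from being misread as inline spans.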