skip to main content


Title: Automatic Extraction of Opinion-based Q&A from Online Developer Chats
Virtual conversational assistants designed specifically for software engineers could have a huge impact on the time it takes for software engineers to get help. Research efforts are focusing on virtual assistants that support specific software development tasks such as bug repair and pair programming. In this paper, we study the use of online chat platforms as a resource towards collecting developer opinions that could potentially help in building opinion Q&A systems, as a specialized instance of virtual assistants and chatbots for software engineers. Opinion Q&A has a stronger presence in chats than in other developer communications, thus mining them can provide a valuable resource for developers in quickly getting insight about a specific development topic (e.g., What is the best Java library for parsing JSON?). We address the problem of opinion Q&A extraction by developing automatic identification of opinion-asking questions and extraction of participants’ answers from public online developer chats. We evaluate our automatic approaches on chats spanning six programming communities and two platforms. Our results show that a heuristic approach to opinion-asking questions works well (.87 precision), and a deep learning approach customized to the software domain outperforms heuristics-based, machine-learning-based and deep learning for answer extraction in community question answering.  more » « less
Award ID(s):
1813253
NSF-PAR ID:
10287696
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the International Conference on Software Engineering
ISSN:
1819-3781
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Software developers are increasingly having conversations about software development via online chat services. Many of those chat communications contain valuable information, such as code descriptions, good programming practices, and causes of common errors/exceptions. However, the nature of chat community content is transient, as opposed to the archival nature of other developer communications such as email, bug reports and Q&A forums. As a result, important information and advice are lost over time. The focus of this dissertation is Extracting Archival Information from Software-Related Chats, specifically to (1) automatically identify conversations which contain archival-quality information, (2) accurately reduce the granularity of the information reported as archival information, and (3) conduct a case study to investigate how archival quality information extracted from chats compare to related posts in Q&A forums. Archiving knowledge from developer chats that could be used potentially in several applications such as: creating a new archival mechanism available to a given chat community, augmenting Q&A forums, or facilitating the mining of specific information and improving software maintenance tools. 
    more » « less
  2. More than ever, developers are participating in public chat communities to ask and answer software development questions. With over ten million daily active users, Slack is one of the most popular chat platforms, hosting many active channels focused on software development technologies, e.g., python, react. Prior studies have shown that public Slack chat transcripts contain valuable information, which could provide support for improving automatic software maintenance tools or help researchers understand developer struggles or concerns. In this paper, we present a dataset of software-related Q&A chat conversations, curated for two years from three open Slack communities (python, clojure, elm). Our dataset consists of 38,955 conversations, 437,893 utterances, contributed by 12,171 users. We also share the code for a customized machine-learning based algorithm that automatically extracts (or disentangles) conversations from the downloaded chat transcripts. 
    more » « less
  3. A great part of software development involves conceptualizing or communicating the underlying procedures and logic that needs to be expressed in programs. One major difficulty of programming is turning concept into code , especially when dealing with the APIs of unfamiliar libraries. Recently, there has been a proliferation of machine learning methods for code generation and retrieval from natural language queries , but these have primarily been evaluated purely based on retrieval accuracy or overlap of generated code with developer-written code, and the actual effect of these methods on the developer workflow is surprisingly unattested. In this article, we perform the first comprehensive investigation of the promise and challenges of using such technology inside the PyCharm IDE, asking, “At the current state of technology does it improve developer productivity or accuracy, how does it affect the developer experience, and what are the remaining gaps and challenges?” To facilitate the study, we first develop a plugin for the PyCharm IDE that implements a hybrid of code generation and code retrieval functionality, and we orchestrate virtual environments to enable collection of many user events (e.g., web browsing, keystrokes, fine-grained code edits). We ask developers with various backgrounds to complete 7 varieties of 14 Python programming tasks ranging from basic file manipulation to machine learning or data visualization, with or without the help of the plugin. While qualitative surveys of developer experience are largely positive, quantitative results with regards to increased productivity, code quality, or program correctness are inconclusive. Further analysis identifies several pain points that could improve the effectiveness of future machine learning-based code generation/retrieval developer assistants and demonstrates when developers prefer code generation over code retrieval and vice versa. We release all data and software to pave the road for future empirical studies on this topic, as well as development of better code generation models. 
    more » « less
  4. Modern software development communities are increasingly social. Popular chat platforms such as Slack host public chat communities that focus on specific development topics such as Python or Ruby-on-Rails. Conversations in these public chats often follow a Q&A format, with someone seeking information and others providing answers in chat form. In this paper, we describe an exploratory study into the potential usefulness and challenges of mining developer Q&A conversations for supporting software maintenance and evolution tools. We designed the study to investigate the availability of information that has been successfully mined from other developer communications, particularly Stack Overflow. We also analyze characteristics of chat conversations that might inhibit accurate automated analysis. Our results indicate the prevalence of useful information, including API mentions and code snippets with descriptions, and several hurdles that need to be overcome to automate mining that information. 
    more » « less
  5. Modern software development communities are increasingly social. Popular chat platforms such as Slack host public chat communities that focus on specific development topics such as Python or Ruby-on-Rails. Conversations in these public chats often follow a Q&A format, with someone seeking information and others providing answers in chat form. In this paper, we describe an exploratory study into the potential usefulness and challenges of mining developer Q&A conversations for supporting software maintenance and evolution tools. We designed the study to investigate the availability of information that has been successfully mined from other developer communications, particularly Stack Overflow. We also analyze characteristics of chat conversations that might inhibit accurate automated analysis. Our results indicate the prevalence of useful information, including API mentions and code snippets with descriptions, and several hurdles that need to be overcome to automate mining that information. 
    more » « less