skip to main content


Title: Studying Developer Reading Behavior on Stack Overflow during API Summarization Tasks
Stack Overflow is commonly used by software developers to help solve problems they face while working on software tasks such as fixing bugs or building new features. Recent research has explored how the content of Stack Overflow posts affects attraction and how the reputation of users attracts more visitors. However, there is very little evidence on the effect that visual attractors and content quantity have on directing gaze toward parts of a post, and which parts hold the attention of a user longer. Moreover, little is known about how these attractors help developers (students and professionals) answer comprehension questions. This paper presents an eye tracking study on thirty developers constrained to reading only Stack Overflow posts while summarizing four open source methods or classes. Results indicate that on average paragraphs and code snippets were fixated upon most often and longest. When ranking pages by number of appearance of code blocks and paragraphs, we found that while the presence of more code blocks did not affect number of fixations, the presence of increasing numbers of plain text paragraphs significantly drove down the fixations on comments. SO posts that were looked at only by students had longer fixation times on code elements within the first ten fixations. We found that 16 developer summaries contained 5 or more meaningful terms from SO posts they viewed. We discuss how our observations of reading behavior could benefit how users structure their posts.  more » « less
Award ID(s):
1855756
NSF-PAR ID:
10251912
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER)
Page Range / eLocation ID:
195 to 205
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Nowadays, cyberattack incidents are happening on a daily basis. As a result, the demand for a larger and more challenging workforce is increasing. To handle this demand, academic institutions offer cybersecurity courses and degree programs into their curricula; however, more efforts are needed to address the high demand of the cybersecurity workforce. This work aims to bridge the gap between workforce shortage and the number of qualified graduates to fill the positions. We approach this by introducing cybersecurity concepts at the early stage of undergraduate curricula of computer science and engineering programs. Secure programming is critical as many cybersecurity incidents happen due to software vulnerabilities. However, most UG-level programming courses pay little attention to secure programming practices. As a result, many students graduate with limited knowledge of security vulnerabilities that might plague the developed software. Our goal in this work is to introduce secure programming at introductory level programming courses so that students should be aware of cybersecurity issues and use this security mindset in advanced level courses and projects in their degree programs. To accomplish this goal, we developed intuitive and interactive modules emphasizing secure programming in C++ and Java courses to help students become secure software developers. These modules will be used alongside the coursework to emphasize certain vulnerabilities within the programming environment of a specific language and allow students to learn cybersecurity topics, enforcing a solid foundation and understanding. We developed cybersecurity educational modules for C++ and Java as they are amongst the popular languages and used in introductory programming courses. While designing these modules, we kept in mind that the topics must be relevant to real-world issues in the software industry. We used a variety of resources and benchmarks to ensure the authenticity of our chosen topics, including Common Weakness Enumeration (CWE) and Common Vulnerability and Exposures (CVE). While choosing module topics to develop, we had some restrictions. For example, the topics must be introductory and easy to understand. These modules are geared towards freshman or sophomore-level UG students who have just started programming. The developed security modules have four components: power-point slides, lab description, code template for the lab, and complete solution. The complete solution for each module will be provided to the instructors to check students’ work if they adopt the modules in their courses. The modules developed for a C++ programming course include labs on input validation, integer overflow, random number generation, function call with incorrect argument type, and dangling pointers. In Java, we developed lab modules for input validation, integer overflow, null object reference, random number generator, and data encapsulation. 
    more » « less
  2. Busjahn et al. [4] on the factors influencing dwell time during source code reading, where source code element type and frequency of gaze visits are studied as factors. Unlike the previous study, this study focuses on analyzing eye movement data in large open source Java projects. Five experts and thirteen novices participated in the study where the main task is to summarize methods. The results examine semantic line-level information that developers view during summarization. We find no correlation between the line length and the total duration of time spent looking on the line even though it exists between a token’s length and the total fixation time on the token reported in prior work. The first fixations inside a method are more likely to be on a method’s signature, a variable declaration, or an assignment compared to the other fixations inside a method. In addition, it is found that smaller methods tend to have shorter overall fixation duration for the entire method, but have significantly longer duration per line in the method. The analysis provides insights into how source code’s unique characteristics can help in building more robust methods for analyzing eye movements in source code and overall in building theories to support program comprehension on realistic tasks. 
    more » « less
  3. The programming language Julia is designed to solve the 'two language problem', where developers who write scientific software can achieve desired performance, without sacrificing productivity. Since its inception in 2012, developers who have been using other programming languages have transitioned to Julia. A systematic investigation of the questions that developers ask about Julia can help in understanding the challenges that developers face while using Julia. Such understanding can be helpful (i) for toolsmiths who can construct tools so that developers can maximize their experience of using Julia, and (ii) for Julia language maintainers with empirical evidence on areas to improve the language as well as the Julia ecosystem. We conduct an empirical study with 3,093 Stack Overflow posts where we identify 13 categories of questions related to Julia-based software development. We observe developers to ask about a diverse set of topics, such as GC, Julia's garbage collector, JuMP, a domain-specific language constructed using Julia, and symbols, a metaprogramming utility in Julia. Based on our emerging results, we recommend enhancing support for developers with Julia-based tools and techniques for cross language transfer, type-related assistance, and package resolution. 
    more » « less
  4. Abstract: Health data is considered to be sensitive and personal; both governments and software platforms have enacted specific measures to protect it. Consumer apps that collect health data are becoming more popular, but raise new privacy concerns as they collect unnecessary data, share it with third parties, and track users. However, developers of these apps are not necessarily knowingly endangering users’ privacy; some may simply face challenges working with health features. To scope these challenges, we qualitatively analyzed 269 privacy-related posts on Stack Overflow by developers of health apps for Android- and iOS-based systems. We found that health-specific access control structures (e.g., enhanced requirements for permissions and authentication) underlie several privacy-related challenges developers face. The specific nature of problems often differed between the platforms, for example additional verification steps for Android developers, or confusing feedback about incorrectly formulated permission scopes for iOS. Developers also face problems introduced by third-party libraries. Official documentation plays a key part in understanding privacy requirements, but in some cases, may itself cause confusion. We discuss implications of our findings and propose ways to improve developers’ experience of working with health-related features -- and consequently to improve the privacy of their apps’ end users. 
    more » « less
  5. The paper presents results from a pilot questionnaire-based study on ten Stack Overflow (SO) questions. Eleven developers were tasked with determining if the SO question sentiment was positive, negative or neutral. The results from the questionnaire indicate that developers mostly rated the sentiment of SO questions as neutral, stating that they received little or no emotional feedback from the questions. Tools that were designed to analyze Software Engineering related texts (SentiStrength-SE, SentiCR, and Senti4SD) were on average more closely aligned with developer ratings for a majority of the questions than general purpose tools for detecting SO question sentiment. We discuss cases where tools and developer sentiment differ along with implications of the results. Overall, the sentiment tool output on the question title and body is more aligned with the developer rating than just the title alone. Since SO is a very common medium of technical exchange, we also report that adding code snippets, short titles, and multiple tags were top three features developers prefer in SO questions in order for it to be answered quickly. 
    more » « less