Code summarization is the task of creating short, natural language descriptions of source code. It is an important part of code comprehension and a powerful method of documentation. Previous work has made progress in identifying where programmers focus in code as they write their own summaries (i.e., Writing). However, there is currently a gap in studying programmers’ attention as they read code with pre-written summaries (i.e., Reading). As a result, it is currently unknown how these two forms of code comprehension compare: Reading and Writing. Also, there is a limited understanding of programmer attention with respect to program semantics. We address these shortcomings with a human eye-tracking study (n= 27) comparing Reading and Writing. We examined programmers’ attention with respect to fine-grained program semantics, including their attention sequences (i.e., scan paths). We find distinctions in programmer attention across the comprehension tasks, similarities in reading patterns between them, and differences mediated by demographic factors. This can help guide code comprehension in both computer science education and automated code summarization. Furthermore, we mapped programmers’ gaze data onto the Abstract Syntax Tree to explore another representation of human attention. We find that visual behavior on this structure is not always consistent with that on source code.
more »
« less
Write, Read, or Fix? Exploring Alternative Methods for Secure Development Studies
When studying how software developers perform security tasks, researchers often ask participants to write code. These studies can be challenging because programming can be time-consuming and frustrating. This paper explores whether alternatives to code-writing can yield scientifically valid results while reducing participant stress. We conducted a remote study in which Python programmers completed two encryption tasks using an assigned library by either writing code from scratch, reading existing code and identifying issues, or fixing issues in existing code. We found that the read and fix conditions were less effective than the write condition in revealing security problems with APIs and their documentation, but still provided useful insights. Meanwhile, the read and especially fix conditions generally resulted in more positive participant experiences. Based on these findings, we make preliminary recommendations for how and when researchers might best use all three study design methods; we also recommend future work to further explore the uses and trade-offs of these approaches.
more »
« less
- Award ID(s):
- 1943215
- PAR ID:
- 10631435
- Publisher / Repository:
- USENIX Symposium on Usable Privacy and Security (SOUPS)
- Date Published:
- ISBN:
- 978-1-939133-42-7
- Format(s):
- Medium: X
- Location:
- Philadelphia, PA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Experts often use particular control flow structures to make their code easier to read and modify, such as using the logical operator AND to conjoin conditions rather than nesting separate if statements. Within Boolean expressions, experts take advantage of short-circuit evaluation by ordering their conditions to avoid errors (such as checking that an index is within the bounds of an array before examining the value at that index). How well do students understand these structures? We investigate students' use and understanding of conjoined versus separate conditions within a larger assessment of 125 undergraduate students at the end of their second- and third-semester CS courses (in algorithms & data structures and introductory software engineering). The assessment asked students to: write code where an edge case error could be avoided with short-circuit evaluation, revise their code with nudges towards expert structure, and answer comprehension questions involving code tracing. When writing, students frequently forgot to check for a key edge case. When that case was included, the check was often separated in its own if-statement rather than conjoined with the other conditions. This could indicate a stylistic choice or a belief that the check had to be separated for functionality. Notably, students who included all necessary conditions rarely exhibited the error of ordering them incorrectly. However, with code comprehension, students demonstrated significant misunderstandings about the effects of condition ordering. Students were more accurate on comprehension tasks with nested ifs than conjoined conditions, and this effect was most pronounced when the ordering of the conditions would lead to errors. When conditions were conjoined in a single expression, only 35% of students recognized that checking a value at an index before checking that the index was in bounds would lead to an error. However, 54% of students recognized the problem when the conditions were separated into individual if-statements. This demonstrates a subtlety in code execution that intermediate students may not have mastered and emphasizes the challenges in assessing students' understanding solely via the way they write code.more » « less
-
Since intermediate CS students can use a variety of control structures, why do their choices often not match experts' Students may not realize what choices expert prefer, find non-expert choices easier to read, or simply forget to write with expert structure. To disentangle these explanations, we surveyed 328 2nd and 3rd semester undergraduates, with tasks including writing short functions, selecting which structure was most readable or best styled, and comprehension questions. Questions focused on seven control structure topics that were important to instructors (e.g., factoring out repeated code between an if-block and its else). Students frequently wrote with non-expert structure, and, for five topics, at least 1/3 of students (48% - 71%) thought a non-expert structure was more readable than the expert one. However, students often made one choice when writing code, but preferred a different choice when reading it. Additionally, for more complex topics, students often failed to notice (or understand) differences in execution caused by changes in structure. Together, these results suggest that instruction and practice for choosing control structures should be context-specific, and that assessment focused only on code writing may miss underlying misunderstandings.more » « less
-
Research efforts tried to expose students to security topics early in the undergraduate CS curriculum. However, such efforts are rarely adopted in practice and remain less effective when it comes to writing secure code. In our prior work, we identified key issues with the how students code and grouped them into six themes: (a) Knowledge of C, (b) Understanding compiler and OS messages, (c) Utilization of resources, (d) Knowledge of memory, (e) Awareness of unsafe functions, and (f) Understanding of security topics. In this work, we aim to understand students' knowledge about each theme and how that knowledge affects their secure coding practices. Thus, we propose a modified SOLO taxonomy for the latter five themes. We apply the taxonomy to the coding interview data of 21 students from two US R1 universities. Our results suggest that most students have limited knowledge of each theme. We also show that scoring low in these themes correlates with why students fail to write secure code and identify possible vulnerabilities.more » « less
-
Often, security topics are only taught in advanced computer science (CS) courses. However, most US R1 universities do not require students to take these courses to complete an undergraduate CS degree. As a result, students can graduate without learning about computer security and secure programming practices. To gauge students’ knowledge and skills of secure programming, we conducted a coding interview with 21 students from two R1 universities in the United States. All the students in our study had at least taken Computer Systems or an equivalent course. We then analyzed the students’ approach to safe programming practices, such as avoiding unsafe functions like gets and strcpy, and basic security knowledge, such as writing code that assumes user inputs can be malicious. Our results suggest that students lack the key fundamental skills to write secure programs. For example, students rarely pay attention to details, such as compiler warnings, and often do not read programming language documentation with care. Moreover, some students’ understanding of memory layout is cursory, which is crucial for writing secure programs. We also found that some students are struggling with even the basics of C programming, even though it is the main language taught in Computer Systems courses.more » « less
An official website of the United States government

