skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: From Novice to Expert: Analysis of Token Level Effects in a Longitudinal Eye Tracking Study
Program comprehension is a vital skill in software development. This work investigates program comprehension by examining the eye movement of novice programmers as they gain programming experience over the duration of a Java course. Their eye movement behavior is compared to the eye movement of expert programmers. Eye movement studies of natural text show that word frequency and length influence eye movement duration and act as indicators of reading skill. The study uses an existing longitudinal eye tracking dataset with 20 novice and experienced readers of source code. The work investigates the acquisition of the effects of token frequency and token length in source code reading as an indication of program reading skill. The results show evidence of the frequency and length effects in reading source code and the acquisition of these effects by novices. These results are then leveraged in a machine learning model demonstrating how eye movement can be used to estimate programming proficiency and classify novices from experts with 72% accuracy.  more » « less
Award ID(s):
1855756
PAR ID:
10267218
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
2021 IEEE/ACM 29th International Conference on Program Comprehension (ICPC)
Volume:
1
Page Range / eLocation ID:
172 to 183
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    Studies of eye movements during source code reading have supported the idea that reading source code differs fundamentally from reading natural text. The paper analyzed an existing data set of natural language and source code eye movement data using the E-Z reader model of eye movement control. The results show that the E-Z reader model can be used with natural text and with source code where it provides good predictions of eye movement duration. This result is confirmed by comparing model predictions to eye-movement data from this experiment and calculating the correlation score for each metric. Finally, it was found that gaze duration is influenced by token frequency in code and in natural text. The frequency effect is less pronounced on first fixation duration and single fixation duration. An eye movement control model for source code reading may open the door for tools in education and the industry to enhance program comprehension. 
    more » « less
  2. Busjahn et al. [4] on the factors influencing dwell time during source code reading, where source code element type and frequency of gaze visits are studied as factors. Unlike the previous study, this study focuses on analyzing eye movement data in large open source Java projects. Five experts and thirteen novices participated in the study where the main task is to summarize methods. The results examine semantic line-level information that developers view during summarization. We find no correlation between the line length and the total duration of time spent looking on the line even though it exists between a token’s length and the total fixation time on the token reported in prior work. The first fixations inside a method are more likely to be on a method’s signature, a variable declaration, or an assignment compared to the other fixations inside a method. In addition, it is found that smaller methods tend to have shorter overall fixation duration for the entire method, but have significantly longer duration per line in the method. The analysis provides insights into how source code’s unique characteristics can help in building more robust methods for analyzing eye movements in source code and overall in building theories to support program comprehension on realistic tasks. 
    more » « less
  3. Code summarization is the task of creating short, natural language descriptions of source code. It is an important part of code comprehension and a powerful method of documentation. Previous work has made progress in identifying where programmers focus in code as they write their own summaries (i.e., Writing). However, there is currently a gap in studying programmers’ attention as they read code with pre-written summaries (i.e., Reading). As a result, it is currently unknown how these two forms of code comprehension compare: Reading and Writing. Also, there is a limited understanding of programmer attention with respect to program semantics. We address these shortcomings with a human eye-tracking study (n= 27) comparing Reading and Writing. We examined programmers’ attention with respect to fine-grained program semantics, including their attention sequences (i.e., scan paths). We find distinctions in programmer attention across the comprehension tasks, similarities in reading patterns between them, and differences mediated by demographic factors. This can help guide code comprehension in both computer science education and automated code summarization. Furthermore, we mapped programmers’ gaze data onto the Abstract Syntax Tree to explore another representation of human attention. We find that visual behavior on this structure is not always consistent with that on source code. 
    more » « less
  4. cognition model (i.e., bottom-up or top-down) applied during program comprehension tasks. The cognition models examine how programmers understand source code by describing the temporary information structures in the programmer’s short term memory. The two types of models that we are interested in are top-down and bottom-up. The top-down model is normally applied as-needed (i.e., the domain of the system is familiar). The bottom-up model is typically applied when a developer is not familiar with the domain or the source code. An eye-tracking study of 18 developers reading and summarizing Java methods is used as our dataset for analyzing the mental cognition model. The developers provide a written summary for methods assigned to them. In total, 63 methods are used from five different systems. The results indicate that on average, experts and novices read the methods more closely (using the bottom-up mental model) than bouncing around (using top-down). However, on average novices spend longer gaze time performing bottom-up (66s.) compared to experts (43s.) 
    more » « less
  5. Program comprehension is an important, but hard to measure cognitive process. This makes it difficult to provide suitable programming languages, tools, or coding conventions to support developers in their everyday work. Here, we explore whether functional magnetic resonance imaging (fMRI) is feasible for soundly measuring program comprehension. To this end, we observed 17 participants inside an fMRI scanner while they were comprehending source code. The results show a clear, distinct activation of five brain regions, which are related to working memory, attention, and language processing, which all fit well to our understanding of program comprehension. Furthermore, we found reduced activity in the default mode network, indicating the cognitive effort necessary for program comprehension. We also observed that familiarity with Java as underlying programming language reduced cognitive effort during program comprehension. To gain confidence in the results and the method, we replicated the study with 11 new participants and largely confirmed our findings. Our results encourage us and, hopefully, others to use fMRI to observe programmers and, in the long run, answer questions, such as: How should we train programmers? Can we train someone to become an excellent programmer? How effective are new languages and tools for program comprehension? 
    more » « less