Title: Can the E-Z Reader Model Predict Eye Movements Over Code? Towards a Model of Eye Movements Over Source Code
Studies of eye movements during source code reading have supported the idea that reading source code differs fundamentally from reading natural text. This paper analyzes an existing data set of natural language and source code eye movement data using the E-Z Reader model of eye movement control. The results show that the E-Z Reader model can be applied to both natural text and source code, where it provides good predictions of eye movement duration. This result is confirmed by comparing model predictions to the eye movement data from the experiment and calculating the correlation score for each metric. Finally, gaze duration is found to be influenced by token frequency in both code and natural text. The frequency effect is less pronounced for first fixation duration and single fixation duration. An eye movement control model for source code reading may open the door to tools in education and industry that enhance program comprehension.
Award ID(s):
1855756
NSF-PAR ID:
10251910
Journal Name:
ETRA '20 Short Papers: ACM Symposium on Eye Tracking Research and Applications
Page Range / eLocation ID:
1 to 4
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Program comprehension is a vital skill in software development. This work investigates program comprehension by examining the eye movements of novice programmers as they gain programming experience over the duration of a Java course. Their eye movement behavior is compared to that of expert programmers. Eye movement studies of natural text show that word frequency and length influence eye movement duration and act as indicators of reading skill. The study uses an existing longitudinal eye tracking dataset with 20 novice and experienced readers of source code. The work investigates the acquisition of the effects of token frequency and token length in source code reading as an indication of program reading skill. The results show evidence of the frequency and length effects in reading source code and of the acquisition of these effects by novices. These results are then leveraged in a machine learning model demonstrating how eye movements can be used to estimate programming proficiency and distinguish novices from experts with 72% accuracy.
  2. Busjahn et al. [4] on the factors influencing dwell time during source code reading, where source code element type and frequency of gaze visits are studied as factors. Unlike the previous study, this study focuses on analyzing eye movement data in large open source Java projects. Five experts and thirteen novices participated in the study, where the main task was to summarize methods. The results examine the semantic line-level information that developers view during summarization. We find no correlation between line length and the total time spent looking at the line, even though prior work reports such a correlation between a token's length and the total fixation time on the token. The first fixations inside a method are more likely than later fixations to land on the method's signature, a variable declaration, or an assignment. In addition, smaller methods tend to have a shorter overall fixation duration for the entire method but a significantly longer duration per line. The analysis provides insights into how source code's unique characteristics can help in building more robust methods for analyzing eye movements over source code and, more broadly, in building theories to support program comprehension on realistic tasks.
  3. cognition model (i.e., bottom-up or top-down) applied during program comprehension tasks. The cognition models examine how programmers understand source code by describing the temporary information structures in the programmer's short-term memory. The two types of models that we are interested in are top-down and bottom-up. The top-down model is normally applied as needed (i.e., when the domain of the system is familiar). The bottom-up model is typically applied when a developer is not familiar with the domain or the source code. An eye-tracking study of 18 developers reading and summarizing Java methods is used as our dataset for analyzing the mental cognition model. The developers provide a written summary for the methods assigned to them. In total, 63 methods are used from five different systems. The results indicate that, on average, both experts and novices read the methods closely (using the bottom-up mental model) more than they bounce around (using top-down). However, on average, novices spend longer gaze time performing bottom-up reading (66 s) than experts (43 s).
  4. We know that reading involves coordination between textual characteristics and visual attention, but research linking eye movements during reading and comprehension assessed after reading is surprisingly limited, especially for reading long connected texts. We tested two competing possibilities: (a) the weak association hypothesis: Links between eye movements and comprehension are weak and short‐lived, versus (b) the strong association hypothesis: The two are robustly linked, even after a delay. Using a predictive modeling approach, we trained regression models to predict comprehension scores from global eye movement features, using participant‐level cross‐validation to ensure that the models generalize across participants. We used data from three studies in which readers (Ns = 104, 130, 147) answered multiple‐choice comprehension questions ~30 min after reading a 6,500‐word text, or after reading up to eight 1,000‐word texts. The models generated accurate predictions of participants' text comprehension scores (correlations between observed and predicted comprehension: 0.384, 0.362, 0.372, ps < .001), in line with the strong association hypothesis. We found that making more, but shorter fixations, consistently predicted comprehension across all studies. Furthermore, models trained on one study's data could successfully predict comprehension on the others, suggesting generalizability across studies. Collectively, these findings suggest that there is a robust link between eye movements and subsequent comprehension of a long connected text, thereby connecting theories of low‐level eye movements with those of higher order text processing during reading.
  5. Ribeiro, Haroldo V. (Ed.)
    Reading is a complex cognitive process that involves primary oculomotor function and high-level activities like attention focus and language processing. When we read, our eyes move by primary physiological functions while responding to language-processing demands. In fact, the eyes perform discontinuous twofold movements, namely, successive long jumps (saccades) interposed by small steps (fixations) in which the gaze “scans” confined locations. It is only through the fixations that information is effectively captured for brain processing. Since individuals can express similar as well as entirely different opinions about a given text, it is expected that the form, content and style of a text could induce different eye-movement patterns among people. A question that naturally arises is whether these individuals’ behaviours are correlated, so that eye-tracking while reading can be used as a proxy for subjective properties of a text. Here we perform a set of eye-tracking experiments with a group of individuals reading different types of texts, including children's stories, texts generated from random words, and excerpts from literary works. In parallel, an extensive Internet survey was conducted to categorize these texts in terms of their complexity and coherence, drawing on a large number of individuals of different ages, genders and levels of education. The computational analysis of the fixation maps obtained from the subjects' gaze trajectories for a given text reveals that the average “magnetization” of the fixation configurations correlates strongly with the text's complexity as observed in the survey. Moreover, we perform a thermodynamic analysis using the Maximum-Entropy Model and find that coherent texts were closer to their corresponding “critical points” than non-coherent ones, as computed from the Pairwise Maximum-Entropy method, suggesting that different texts may induce distinct cohesive reading activities.