skip to main content


Title: A Look into Programmers' Heads
Program comprehension is an important, but hard to measure cognitive process. This makes it difficult to provide suitable programming languages, tools, or coding conventions to support developers in their everyday work. Here, we explore whether functional magnetic resonance imaging (fMRI) is feasible for soundly measuring program comprehension. To this end, we observed 17 participants inside an fMRI scanner while they were comprehending source code. The results show a clear, distinct activation of five brain regions, which are related to working memory, attention, and language processing, which all fit well to our understanding of program comprehension. Furthermore, we found reduced activity in the default mode network, indicating the cognitive effort necessary for program comprehension. We also observed that familiarity with Java as underlying programming language reduced cognitive effort during program comprehension. To gain confidence in the results and the method, we replicated the study with 11 new participants and largely confirmed our findings. Our results encourage us and, hopefully, others to use fMRI to observe programmers and, in the long run, answer questions, such as: How should we train programmers? Can we train someone to become an excellent programmer? How effective are new languages and tools for program comprehension?  more » « less
Award ID(s):
1755762
NSF-PAR ID:
10085745
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Date Published:
Journal Name:
IEEE Transactions on Software Engineering
ISSN:
0098-5589
Page Range / eLocation ID:
1 to 1
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background: Researchers have recently started to validate decades-old program-comprehension models using functional magnetic resonance imaging (fMRI). While fMRI helps us to understand neural correlates of cognitive processes during program comprehension, its comparatively low temporal resolution (i.e., seconds) cannot capture fast cognitive subprocesses (i.e., milliseconds). Aims: To increase the explanatory power of fMRI measurement of programmers, we are exploring in this methodological paper the feasibility of adding simultaneous eye tracking to fMRI measurement. By developing a method to observe programmers with two complementary measures, we aim at obtaining a more comprehensive understanding of program comprehension. Method: We conducted a controlled fMRI experiment of 22 student participants with simultaneous eye tracking. Results: We have been able to successfully capture fMRI and eye-tracking data, although with limitations regarding partial data loss and spatial imprecision. The biggest issue that we experienced is the partial loss of data: for only 10 participants, we could collect a complete set of high-precision eye-tracking data. Since some participants of fMRI studies show excessive head motion, the proportion of full and high-quality data on fMRI and eye tracking is rather low. Still, the remaining data allowed us to confirm our prior hypothesis of semantic recall during program comprehension, which was not possible with fMRI alone. Conclusions: Simultaneous measurement of program comprehension with fMRI and eye tracking is promising, but with limitations. By adding simultaneous eye tracking to our fMRI study framework, we can conduct more fine-grained fMRI analyses, which in turn helps us to understand programmer behavior better. 
    more » « less
  2. Recent advances in Large Language Models (LLM) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, a LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching online. However, participants did face difficulties in understanding, editing, and debugging code snippets generated by Copilot, which significantly hindered their task-solving effectiveness. Finally, we highlighted several promising directions for improving the design of Copilot based on our observations and participants’ feedback. 
    more » « less
  3. Expertise in programming traditionally assumes a binary novice-expert divide. Learning resources typically target programmers who are learning programming for the first time, or expert programmers for that language. An underrepresented, yet important group of programmers are those that are experienced in one programming language, but desire to author code in a different language. For this scenario, we postulate that an effective form of feedback is presented as a transfer from concepts in the first language to the second. Current programming environments do not support this form of feedback. In this study, we apply the theory of learning transfer to teach a language that programmers are less familiar with-such as R-in terms of a programming language they already know-such as Python. We investigate learning transfer using a new tool called Transfer Tutor that presents explanations for R code in terms of the equivalent Python code. Our study found that participants leveraged learning transfer as a cognitive strategy, even when unprompted. Participants found Transfer Tutor to be useful across a number of affordances like stepping through and highlighting facts that may have been missed or misunderstood. However, participants were reluctant to accept facts without code execution or sometimes had difficulty reading explanations that are verbose or complex. These results provide guidance for future designs and research directions that can support learning transfer when learning new programming languages. 
    more » « less
  4. Gradually typed languages allow programmers to mix statically and dynamically typed code, enabling them to incrementally reap the benefits of static typing as they add type annotations to their code. However, this type migration process is typically a manual effort with limited tool support. This paper examines the problem of automated type migration: given a dynamic program, infer additional or improved type annotations. Existing type migration algorithms prioritize different goals, such as maximizing type precision, maintaining compatibility with unmigrated code, and preserving the semantics of the original program. We argue that the type migration problem involves fundamental compromises: optimizing for a single goal often comes at the expense of others. Ideally, a type migration tool would flexibly accommodate a range of user priorities. We present TypeWhich, a new approach to automated type migration for the gradually-typed lambda calculus with some extensions. Unlike prior work, which relies on custom solvers, TypeWhich produces constraints for an off-the-shelf MaxSMT solver. This allows us to easily express objectives, such as minimizing the number of necessary syntactic coercions, and constraining the type of the migration to be compatible with unmigrated code. We present the first comprehensive evaluation of GTLC type migration algorithms, and compare TypeWhich to four other tools from the literature. Our evaluation uses prior benchmarks, and a new set of "challenge problems." Moreover, we design a new evaluation methodology that highlights the subtleties of gradual type migration. In addition, we apply TypeWhich to a suite of benchmarks for Grift, a programming language based on the GTLC. TypeWhich is able to reconstruct all human-written annotations on all but one program. 
    more » « less
  5. Researchers have been employing psycho-physiological measures to better understand program comprehension, for example simultaneous fMRI and eye tracking to validate top-down comprehension models. In this paper, we argue that there is additional value in eye-tracking data beyond eye gaze: Pupil dilation and blink rates may offer insights into programmers' cognitive load. However, the fMRI environment may influence pupil dilation and blink rates, which would diminish their informative value. We conducted a preliminary analysis of pupil dilation and blink rates of an fMRI experiment with 22 student participants. We conclude from our preliminary analysis that the correction for our fMRI environment is challenging, but possible, such that we can use pupil dilation and blink rates to more reliably observe program comprehension. 
    more » « less