

Title: DeepCode: An Annotated Set of Instructional Code Examples to Foster Deep Code Comprehension and Learning
We present DeepCode, a novel instructional resource that supports deep code comprehension and learning in intro-to-programming courses (CS1 and CS2). DeepCode is a set of instructional code examples, which we call a codeset, that our team annotated with comments (e.g., explaining the logical steps of the underlying problem being solved) and with related instructional questions. These questions can serve as hints that help learners think about and articulate explanations of the code. Although DeepCode was designed primarily to support our larger effort to develop an intelligent tutoring system (ITS) that fosters the monitoring, assessment, and development of code comprehension skills in students learning to program, the codeset can also be used for other purposes, such as assessment and problem solving, and in other learning activities, such as studying worked-out code examples with explanations and code visualizations. We present the principles, theories, and frameworks underlying our design process and the annotation guidelines, and we summarize the resulting codeset: 98 annotated Java code examples comprising 7,157 lines of code (including comments), 260 logical steps, 260 logical step details, 408 statement-level comments, and 590 scaffolding questions.
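The abstract names four annotation layers: logical steps, logical step details, statement-level comments, and scaffolding questions. The sketch below is a hypothetical illustration of how those layers might sit on a small Java example. The program, the comment wording, and the questions are invented for illustration only; the abstract does not specify DeepCode's actual annotation format or markup.

```java
/**
 * Hypothetical annotated example (NOT taken from DeepCode).
 * Problem: find the largest value in a non-empty integer array.
 */
public class FindMax {

    // Logical step 1: assume the first element is the largest value seen so far.
    // Logical step detail: starting from values[0] lets the scan begin at index 1.
    // Scaffolding question: What would go wrong if max started at 0 instead?
    static int findMax(int[] values) {
        int max = values[0]; // Statement-level comment: initialize the candidate maximum.

        // Logical step 2: scan the remaining elements, keeping the larger value.
        // Scaffolding question: Why does the loop start at index 1?
        for (int i = 1; i < values.length; i++) {
            if (values[i] > max) { // Statement-level comment: a new largest value was found.
                max = values[i];
            }
        }

        // Logical step 3: after the scan, max holds the largest element.
        // Scaffolding question: How many comparisons does the loop make for n elements?
        return max;
    }

    public static void main(String[] args) {
        int[] data = {3, -7, 12, 5};
        System.out.println("Largest value: " + findMax(data));
    }
}
```

The first scaffolding question targets a classic misconception: initializing `max` to 0 gives a wrong answer when every element is negative.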
Award ID(s):
1822752
NSF-PAR ID:
10367910
Editor(s):
Crossley, Scott; Popescu, Elvira
Journal Name:
Proceedings of the 18th International Conference on Intelligent Tutoring Systems
Page Range / eLocation ID:
36-50
Sponsoring Org:
National Science Foundation
More Like this
  1. The ability to automatically assess learners' activities is key to user modeling and personalization in adaptive educational systems. The work presented in this paper opens an opportunity to expand the scope of automated assessment from traditional programming problems to code comprehension tasks, in which students are asked to explain the critical steps of a program. Automatically assessing these self-explanations offers a unique opportunity to understand the current state of student knowledge, recognize possible misconceptions, and provide feedback. Annotated datasets are needed to train Artificial Intelligence/Machine Learning approaches for the automated assessment of student explanations. To address this need, we present a novel corpus called SelfCode, which consists of 1,770 sentence pairs of student and expert self-explanations of Java code examples, along with semantic similarity judgments provided by experts. We also present a baseline automated assessment model that relies on textual features. The corpus is available in a GitHub repository (https://github.com/jeevanchaps/SelfCode).
  2. We present in this paper the results of a randomized controlled trial that compared the effectiveness of two instructional strategies that scaffold learners' code comprehension processes: eliciting Free Self-Explanation and a Socratic Method. Code comprehension, i.e., understanding source code, is a critical skill for both learners and professionals. Improving learners' code comprehension skills should improve learning, which in turn should help with retention in intro-to-programming courses; these courses are notorious for very high attrition rates due to the complexity of programming topics. To this end, the reported experiment explores the effectiveness of various strategies for eliciting self-explanation as a way to improve comprehension and learning during complex code comprehension activities in intro-to-programming courses. The experiment showed pre-/post-test learning gains of 30% (M = 0.30, SD = 0.47) for the Free Self-Explanation condition and 59% (M = 0.59, SD = 0.39) for the Socratic Method. Furthermore, we investigated the behavior of the two strategies as a function of students' prior knowledge, measured by learners' pretest scores. For the Free Self-Explanation condition, there was no significant difference in mean learning gains between low- and high-knowledge students; the magnitude of the difference (mean difference = 0.02, 95% CI: -0.34 to 0.39) was very small (eta squared = 0.006). Likewise, the Socratic Method showed no significant difference in mean learning gains between low- and high-performing students, although the magnitude of the difference (mean difference = -0.24, 95% CI: -0.534 to 0.03) was large (eta squared = 0.10). These findings suggest that eliciting self-explanations is an effective strategy and that guided self-explanation, as in the Socratic Method condition, is more effective at inducing learning gains.
  3. Since intermediate CS students can use a variety of control structures, why do their choices often not match experts'? Students may not realize which choices experts prefer, may find non-expert choices easier to read, or may simply forget to write with expert structure. To disentangle these explanations, we surveyed 328 second- and third-semester undergraduates, with tasks including writing short functions, selecting which structure was most readable or best styled, and answering comprehension questions. Questions focused on seven control structure topics that were important to instructors (e.g., factoring out repeated code between an if-block and its else). Students frequently wrote with non-expert structure, and for five topics, at least one third of students (48% to 71%) thought a non-expert structure was more readable than the expert one. However, students often made one choice when writing code but preferred a different choice when reading it. Additionally, for more complex topics, students often failed to notice (or understand) differences in execution caused by changes in structure. Together, these results suggest that instruction and practice in choosing control structures should be context-specific, and that assessment focused only on code writing may miss underlying misunderstandings.
  4. This paper reports the findings of an empirical study on the effects and nature of self-explanation during source code comprehension learning activities in the context of learning the Java programming language. Our study shows that self-explanation helps learning and that there is a strong positive correlation between the volume of self-explanation students produce and how much they learn. Furthermore, the effectiveness of self-explanation as an instructional strategy does not differ based on students' prior knowledge. We found that participants explain target code examples using a combination of natural language, code references, and mathematical expressions. This is not surprising given the nature of the target item, a computer program, but it indicates that automatically evaluating such self-explanations may require novel techniques compared to evaluating self-explanations of narrative or scientific texts.
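The control-structure study lists "factoring out repeated code between an if-block and its else" as one instructor-preferred topic. The following is a minimal Java sketch of that single topic; the method names and the grading scenario are hypothetical and are not drawn from the study's survey items. Both versions behave identically here because the shared statement sits at the end of both branches and does not affect the condition.

```java
import java.util.ArrayList;
import java.util.List;

public class FactorOutExample {

    // Non-expert structure: the same statement is repeated at the end of both branches.
    static String gradeRepeated(int score, List<String> log) {
        String label;
        if (score >= 60) {
            label = "pass";
            log.add("graded " + score);
        } else {
            label = "fail";
            log.add("graded " + score);
        }
        return label;
    }

    // Expert structure: the shared statement is factored out below the if/else,
    // so only the code that actually differs stays inside the branches.
    static String gradeFactored(int score, List<String> log) {
        String label;
        if (score >= 60) {
            label = "pass";
        } else {
            label = "fail";
        }
        log.add("graded " + score);
        return label;
    }

    public static void main(String[] args) {
        List<String> repeatedLog = new ArrayList<>();
        List<String> factoredLog = new ArrayList<>();
        for (int score : new int[] {45, 60, 90}) {
            // Both versions return the same label and append the same log entry.
            System.out.println(score + ": " + gradeRepeated(score, repeatedLog)
                    + " / " + gradeFactored(score, factoredLog));
        }
        System.out.println("Logs equal: " + repeatedLog.equals(factoredLog));
    }
}
```

The factoring is only safe when the repeated code occupies the same position in both branches and does not interact with the condition; moving a shared statement that runs before a branch-specific side effect can change execution, which is the kind of difference the study reports students often failed to notice.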
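The SelfCode entry mentions a baseline assessment model that "relies on textual features" without describing it. As one common textual-feature baseline, and explicitly not the authors' model, a student explanation can be scored against an expert explanation by the cosine similarity of their word-count vectors. The tokenization rule and the class and method names below are assumptions made for this sketch.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class BagOfWordsSimilarity {

    // Lowercase the text, split on runs of non-alphanumeric characters, and count tokens.
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty()) { // a leading separator yields an empty first token
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Cosine similarity of two term-count vectors: dot(a, b) / (|a| * |b|).
    static double cosine(String a, String b) {
        Map<String, Integer> va = termCounts(a);
        Map<String, Integer> vb = termCounts(b);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : va.entrySet()) {
            dot += (double) e.getValue() * vb.getOrDefault(e.getKey(), 0);
        }
        double normA = 0.0;
        double normB = 0.0;
        for (int c : va.values()) normA += (double) c * c;
        for (int c : vb.values()) normB += (double) c * c;
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an empty explanation shares no words with the other one
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        String expert = "The loop adds each element of the array to sum.";
        String student = "It loops over the array and adds every element to the sum.";
        System.out.printf("similarity = %.3f%n", cosine(expert, student));
    }
}
```

A purely lexical score like this treats "loop" and "loops" as unrelated words and ignores code references and mathematical expressions, which is consistent with the fourth entry's observation that code self-explanations may need techniques beyond those used for narrative or scientific texts.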