
Title: Comprehension First: Evaluating a Novel Pedagogy and Tutoring System for Program Tracing in CS1
What knowledge does learning programming require? Prior work has focused on theorizing program writing and problem solving skills. We examine program comprehension and propose a formal theory of program tracing knowledge based on control flow paths through an interpreter program's source code. Because novices cannot understand the interpreter's programming language notation, we transform it into causal relationships from code tokens to instructions to machine state changes. To teach this knowledge, we propose a comprehension-first pedagogy based on causal inference, by showing, explaining, and assessing each path by stepping through concrete examples within many example programs. To assess this pedagogy, we built PLTutor, a tutorial system with a fixed curriculum of example programs. We evaluate learning gains among self-selected CS1 students using a block randomized lab study comparing PLTutor with Codecademy, a writing tutorial. In our small study, we find some evidence of improved learning gains on the SCS1, with PLTutor's average learning gains 60% higher than Codecademy's (gain of 3.89 vs. 2.42 out of 27 questions). These gains strongly predicted midterms (R²=.64) only for PLTutor participants, whose grades showed less variation and no failures.
Award ID(s):
Author(s) / Creator(s):
Date Published:
Journal Name:
ACM International Computing Education Research Conference
Page Range / eLocation ID:
2 to 11
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This Work-in-Progress paper investigates how students participating in a chemical engineering (ChE) Research Experience for Undergraduates (REU) program conceptualize and make plans for research projects. The National Science Foundation has invested substantial financial resources in REU programs, which offer undergraduate students the opportunity to work with faculty in their labs and to conduct hands-on experiments. Prior research has shown that REU programs have an impact on students' perceptions of their research skills, often measured through the Undergraduate Research Student Self-Assessment (URSSA) survey. However, few evaluation and research studies have gone beyond perception data to include direct measures of students' gains from program participation. This work-in-progress describes efforts to evaluate the impact of an REU on students' conceptualization and planning of research studies using a pre-post semi-structured interview process. The construct being investigated for this study is planning, which has been espoused as a critical step in the self-regulated learning (SRL) process (Winne & Perry, 2000; Zimmerman, 2008). Students who effectively self-regulate demonstrate higher levels of achievement and comprehension (Dignath & Büttner, 2008), and (arguably) work efficiency. Planning is also a critical step in large projects, such as research (Dvir & Lechler, 2004). Those who effectively plan their projects make consistent progress and are more likely to achieve project success (Dvir, Raz, & Shenhar, 2003). Prior REU research has been important in demonstrating some positive impacts of REU programs, but it is time to dig deeper into the potential benefits of REU participation. Many REU students are included in weekly lab meetings, and thus potentially take part in the planning process for research projects. Thus, the research question explored here is: How do REU participants conceptualize and make plans for research projects?
The study was conducted in the ChE REU program at a large, mid-Atlantic research-oriented university during the summer of 2018. Sixteen students in the program participated in the study, which entailed completing a planning task followed by a semi-structured interview at the start and the end of the REU program. During each session, participants read a case statement that asked them to outline a plan in writing for a research project from beginning to end. Using semi-structured interview procedures, participants then verbally described their written outlines. The verbalizations were recorded and transcribed. Two members of the research team are currently analyzing the responses using an open coding process to gain familiarity with the transcripts. The data will be recoded based on the initial open coding and in line with a self-regulatory and project-management framework. Results: Coding is underway; preliminary results will be ready by the draft submission deadline. The methods employed in this study might prove fruitful in understanding the direct impact on students' knowledge, rather than relying on their perceptions of gains. Future research could investigate differences in students' research plans based on prior research experience, research intensity of students' home institutions, and how their plans may be impacted by training.
  2. Training deep neural networks can generate non-descriptive error messages or produce unusual output without any explicit errors at all. While experts rely on tacit knowledge to apply debugging strategies, non-experts lack the experience required to interpret model output and correct Deep Learning (DL) programs. In this work, we identify the DL debugging heuristics and strategies used by experts, categorize the types of errors novices run into when writing ML code, and map them onto opportunities where tools could help. We use these findings to guide the design of Umlaut. Umlaut checks DL program structure and model behavior against these heuristics, provides human-readable error messages to users, and annotates erroneous model output to facilitate error correction. Umlaut links code, model output, and tutorial-driven error messages in a single interface. We evaluated Umlaut in a study with 15 participants to determine its effectiveness in helping developers find and fix errors in their DL programs. Participants using Umlaut found and fixed significantly more bugs, and implemented fixes for more of them, than participants in a baseline condition.
  3. In recent years, the pace of innovation in machine learning (ML) has accelerated, and researchers in SysML have created algorithms and systems that parallelize ML training over multiple devices or computational nodes. As ML models become more structurally complex, many systems have struggled to provide all-round performance on a variety of models. In particular, ML scale-up is usually underestimated in terms of the amount of knowledge and time required to map a model to an appropriate distribution strategy. Applying parallel training systems to complex models adds nontrivial development overhead on top of model prototyping, and often results in lower-than-expected performance. This tutorial identifies research and practical pain points in parallel ML training, and discusses the latest developments in algorithms and systems for addressing these challenges in both usability and performance. In particular, this tutorial presents a new perspective that unifies seemingly different distributed ML training strategies and, based on it, introduces new techniques and system architectures to simplify and automate ML parallelization. This tutorial is built upon the authors' years of research and industry experience, a comprehensive literature survey, and several recent tutorials and papers published by the authors and peer researchers. The tutorial consists of four parts. The first part will present a landscape of distributed ML training techniques and systems, and highlight the major difficulties faced by real users when writing distributed ML code with big models or big data. The second part dives deep into the mainstream training strategies, guided by real use cases.
By developing a new and unified formulation to represent the seemingly different data- and model-parallel strategies, we describe a set of techniques and algorithms to achieve ML auto-parallelization, and compiler system architectures for auto-generating and exercising parallelization strategies based on models and clusters. The third part of this tutorial exposes a hidden layer of practical pain points in distributed ML training, hyper-parameter tuning and resource allocation, and introduces techniques to improve these aspects. The fourth part is designed as a hands-on coding session, in which we will walk the audience through writing distributed training programs in Python, using the various distributed ML tools and interfaces provided by the Ray ecosystem.
  4. Comprehending programs is key to learning programming. Previous studies highlight novices' naive approaches to comprehending the structural, functional, and behavioral aspects of programs. And yet, with the majority of them examining on-screen programming environments, we barely know about program comprehension within physical computing, a common K-12 programming context. In this study, we qualitatively analyzed think-aloud interview videos of 22 high school students individually comprehending a given text-based Arduino program while interacting with its corresponding functional physical artifact, to answer two questions: 1) How do novices comprehend the given text-based Arduino program? And, 2) What role does the physical artifact play in program comprehension? We found that novices mostly approached the program bottom-up, initially comprehending structural and later functional aspects, along different granularities. The artifact provided two distinct modes of engagement, active and interactive, that supported the program's structural and functional comprehension. However, behavioral comprehension, i.e., understanding the program execution leading to the observed outcome, was inaccessible to many. Our findings extend the program comprehension literature in two ways: (a) they provide one of the very few accounts of high school students' code comprehension in a physical computing context, and (b) they highlight the mediating role of physical artifacts in program comprehension. Further, they point to directions for future pedagogical and tool designs within physical computing to better support students' distributed program comprehension.
  5. Often, security topics are only taught in advanced computer science (CS) courses. However, most US R1 universities do not require students to take these courses to complete an undergraduate CS degree. As a result, students can graduate without learning about computer security and secure programming practices. To gauge students' knowledge and skills of secure programming, we conducted a coding interview with 21 students from two R1 universities in the United States. All the students in our study had at least taken Computer Systems or an equivalent course. We then analyzed the students' approach to safe programming practices, such as avoiding unsafe functions like gets and strcpy, and basic security knowledge, such as writing code that assumes user inputs can be malicious. Our results suggest that students lack the key fundamental skills to write secure programs. For example, students rarely pay attention to details, such as compiler warnings, and often do not read programming language documentation with care. Moreover, some students have only a cursory understanding of memory layout, which is crucial for writing secure programs. We also found that some students are struggling with even the basics of C programming, even though it is the main language taught in Computer Systems courses.