

Search for: All records where Creators/Authors contains: "Zilles, Craig"


  1. Background and context. “Explain in Plain English” (EiPE) questions ask students to explain the high-level purpose of code, requiring them to understand the macrostructure of the program’s intent. Much is known about the techniques experts use to comprehend code, but less is known about how we should teach novices to develop this capability. Objective. Identify techniques that can be taught to students to assist them in developing their ability to comprehend code, and contribute to the body of knowledge of how novices develop their code comprehension skills. Method. Motivated by previous research on how experts comprehend code, we developed interventions that could be taught to novices: prompting students to identify beacons, prompting them to identify the roles of variables, tracing, and abstract tracing. We conducted think-aloud interviews of introductory programming students solving EiPE questions, varying which interventions each student was taught. Some participants were interviewed multiple times throughout the semester to observe any changes in behavior over time. Findings. Identifying beacons and naming variable roles were rarely helpful, as they did not encourage students to integrate their understanding of those pieces with the rest of the code. However, prompting students to explain each variable’s purpose helped them focus on useful subsets of the code, which helped manage cognitive load. Tracing was helpful when students incorrectly recognized common programming patterns or made mistakes comprehending syntax (text-surface). Prompting students to pick inputs that potentially contradicted their current understanding of the code proved a simple and effective way for them to select inputs to trace. Abstract tracing helped students see high-level, functional relationships between variables. In addition, we observed students spontaneously sketching algorithmic visualizations that similarly helped them see relationships between variables. Implications. Because students can get stuck at many points in the process of code comprehension, there seems to be no silver-bullet technique that helps in every circumstance. Instead, effective instruction for code comprehension will likely involve teaching a collection of techniques. In addition to these techniques, meta-knowledge about when to apply each technique will need to be learned, but that is left for future research. At present, we recommend teaching a bottom-up, concrete-to-abstract approach.
    Free, publicly-accessible full text available August 7, 2024
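To make the contrast between these techniques concrete, here is a hypothetical EiPE-style snippet (invented for illustration, not taken from the paper) showing concrete tracing versus abstract tracing:

```python
# Hypothetical EiPE-style question (illustrative only): "Explain in plain
# English what mystery does."
def mystery(values):
    result = values[0]
    for v in values[1:]:
        if v > result:
            result = v
    return result

# Concrete tracing follows exact values on one input:
# mystery([3, 7, 2]): result = 3 -> 7 (since 7 > 3) -> stays 7 (2 is not greater)
print(mystery([3, 7, 2]))  # 7

# Abstract tracing instead tracks the relationship between variables:
# "result always holds the largest value seen so far", which yields the
# high-level explanation "it returns the maximum of the list".
```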
  2. Errors in AI grading and feedback are by their nature non-deterministic and difficult to completely avoid. Since inaccurate feedback potentially harms learning, there is a need for designs and workflows that mitigate these harms. To better understand the mechanisms by which erroneous AI feedback impacts students’ learning, we conducted surveys and interviews that recorded students’ interactions with a short-answer AI autograder for “Explain in Plain English” code reading problems. Using causal modeling, we inferred the learning impacts of wrong answers marked as right (false positives, FPs) and right answers marked as wrong (false negatives, FNs). We further explored explanations for the learning impacts, including errors influencing participants’ engagement with feedback and assessments of their answers’ correctness, and participants’ prior performance in the class. FPs harmed learning in large part due to participants’ failures to detect the errors. This was due to participants not paying attention to the feedback after being marked as right, and an apparent bias against admitting one’s answer was wrong once marked right. On the other hand, FNs harmed learning only for survey participants, suggesting that interviewees’ greater behavioral and cognitive engagement protected them from learning harms. Based on these findings, we propose ways to help learners detect FPs and encourage deeper reflection on FNs to mitigate learning harms of AI errors.
    Free, publicly-accessible full text available August 7, 2024
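The two error categories can be made concrete with a minimal sketch (the names and data below are invented, not the study's instrument) that tallies false positives and false negatives from autograder verdicts:

```python
# Invented illustration: compare expert judgments of short-answer responses
# with the AI autograder's verdicts to count the two error types above.
graded = [
    # (expert_says_correct, autograder_marked_correct)
    (True, True),    # true positive
    (False, True),   # false positive: wrong answer marked as right
    (True, False),   # false negative: right answer marked as wrong
    (False, False),  # true negative
]

fp = sum(1 for expert, marked in graded if marked and not expert)
fn = sum(1 for expert, marked in graded if expert and not marked)
print(f"false positives: {fp}, false negatives: {fn}")  # 1 and 1
```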
  3. This full research paper explores how second-chance testing can be used as a strategy for mitigating students’ test anxiety in STEM courses, thereby improving students’ performance and experience. Second-chance testing is a testing strategy where students are given an opportunity to take an assessment twice. We conducted a mixed-methods study to explore second-chance testing as a potential solution to test anxiety. First, we interviewed a diverse group of STEM students (N = 23) who had taken courses with second-chance testing to ask about the stress and anxiety associated with testing. We then administered a survey on test anxiety to STEM students in seven courses that offered second-chance tests at Midwestern University (N = 448). We found that second-chance testing led to a 30% reduction in students’ reported test anxiety. Students also reported reduced stress throughout the semester, even outside of testing windows, due to the availability of second-chance testing. Our study included an assortment of STEM courses where second-chance testing was deployed, which indicates that second-chance testing is a viable strategy for reducing anxiety in a variety of contexts. We also explored whether the resultant reduction in test anxiety led to student complacency, encouraged procrastination, or caused other suboptimal student behavior because of the extra chance provided. We found that the majority of students reported that they worked hard on their initial test attempts even when second-chance testing was available.
    Free, publicly-accessible full text available June 26, 2024
  4. In this full research paper, we examine various grading policies for second-chance testing. Second-chance testing refers to giving students the opportunity to take a second version of a test for some form of grade replacement. Second-chance testing as a pedagogical strategy bears some similarities to mastery learning, but it is less expensive to implement. Previous work has shown that second-chance testing is associated with improved performance, but there is still a lack of clarity regarding the optimal grading policies for this testing strategy. We interviewed seven instructors who use second-chance testing in their courses to collect data on why they chose specific policies. We then conducted structured interviews with students (N = 11) to capture more nuance about students’ decision-making processes under the different grading policies. Afterwards, we conducted a quasi-experimental study to compare two second-chance testing grading policies and determine how they influenced students across multiple dimensions. We varied the grading policies used in two similar sophomore-level engineering courses. We collected assessment data and administered a survey that queried students (N = 513) about their behavior and reactions to both grading policies. Surprisingly, we found that students’ preferences between these two policies were almost perfectly split. We conclude that there are likely many policies that perform well by being simple and encouraging serious attempts on both tests.
    Free, publicly-accessible full text available June 26, 2024
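The abstract does not spell out the two policies compared; as a hedged illustration, two grading policies commonly used with second-chance testing might look like this:

```python
# Hypothetical policies for combining first and second test attempts
# (illustrative only; not necessarily the two policies studied).

def full_replacement(first, second):
    # Pure grade replacement: keep the better of the two attempts.
    return max(first, second)

def capped_retake(first, second, cap=90):
    # The retake can replace the first score but is capped, which is one
    # way to encourage serious effort on the initial attempt.
    return max(first, min(second, cap))

print(full_replacement(72, 85))  # 85
print(capped_retake(72, 95))     # 90
```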
  5. Explain in Plain English (EiPE) questions evaluate whether students can understand and explain the high-level purpose of code. We conducted a qualitative think-aloud study of introductory programming students solving EiPE questions. In this paper, we focus on how students use tracing (mental execution) to understand code in order to explain it. We found that, in some cases, tracing can be an effective strategy for novices to understand and explain code. Furthermore, we observed three problems that prevented tracing from being helpful: 1) not employing tracing when it could be helpful (some struggling students explained correctly after the interviewer suggested tracing the code), 2) tracing incorrectly due to misunderstandings of the programming language, and 3) tracing with a set of inputs that did not sufficiently expose the code’s behavior (when the interviewer suggested inputs, students explained correctly). These results suggest that we should teach students to use tracing as a method for understanding code and teach them how to select appropriate inputs to trace.
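Problem 3, inputs that fail to expose the code's behavior, can be illustrated with a hypothetical snippet (invented here, not taken from the study):

```python
# A student asked to explain this function might trace it on [1, 2, 3],
# conclude "it sums the list", and stop.
def mystery(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total

print(mystery([1, 2, 3]))   # 6 -- consistent with a plain sum
# An input chosen to contradict that understanding exposes the filter:
print(mystery([1, -2, 3]))  # 4 -- only the positive values are summed
```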
  6. In technical writing, certain statements must be written very carefully in order to clearly and precisely communicate an idea. Students are often asked to write these statements in response to an open-ended prompt, making them difficult to auto-grade with traditional methods. We present what we believe to be a novel approach for auto-grading these statements by restricting students’ submissions to a pre-defined context-free grammar (configured by the instructor). In addition, our tool provides instantaneous feedback that helps students improve their writing, and it scaffolds the process of constructing a statement by reducing the number of choices students have to make compared to free-form writing. We evaluated our tool by deploying it on an assignment in an undergraduate algorithms course. The assignment contained five questions that used the tool, preceded by a pre-test and followed by a post-test. We observed a statistically significant improvement from the pre-test to the post-test, with the mean score increasing from 7.2/12 to 9.2/12.
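As a minimal sketch of the idea, assuming a toy grammar format (the tool's actual configuration is not shown here), an instructor-defined context-free grammar can be expanded to enumerate every statement a student is allowed to build:

```python
import itertools

# Toy instructor-defined grammar (invented for illustration). Keys are
# non-terminals, values are lists of productions, and plain strings that
# are not keys act as terminals.
GRAMMAR = {
    "STATEMENT": [["The loop invariant holds", "WHEN"]],
    "WHEN": [["before every iteration"],
             ["after the", "ORDINAL", "iteration"]],
    "ORDINAL": [["first"], ["k-th"]],
}

def expand(symbol):
    """Yield every complete statement derivable from `symbol`."""
    if symbol not in GRAMMAR:  # terminal: a literal text fragment
        yield symbol
        return
    for production in GRAMMAR[symbol]:
        options = [list(expand(s)) for s in production]
        for combo in itertools.product(*options):
            yield " ".join(combo)

for sentence in expand("STATEMENT"):
    print(sentence)
# The loop invariant holds before every iteration
# The loop invariant holds after the first iteration
# The loop invariant holds after the k-th iteration
```

Restricting answers to such a grammar is what makes auto-grading tractable: the submission space is finite, and every submission is parseable by construction.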
  7. We conducted an across-semester quasi-experimental study that compared students' outcomes under frequent and infrequent testing regimens in an introductory computer science course. Students in the frequent testing (4 quizzes and 4 exams) semester outperformed the infrequent testing (1 midterm and 1 final exam) semester by 9.1 to 13.5 percentage points on code writing questions. We complement these performance results with additional data from surveys, interviews, and analysis of interactive textbook usage. In the surveys, students reported a preference for the smaller number of exams, but rated the exams in the frequent testing semester to be both less difficult and less stressful, in spite of the exams containing identical content. In the interviews, students predominantly indicated (1) that the frequent testing regimen encourages better study habits (e.g., more attention to work, less cramming) and leads to better learning, (2) that frequent testing reduces test anxiety, and (3) that the frequent testing regimen was more fair, but these opinions were not universally held. The students' impression that the frequent testing regimen would lead to better study habits is borne out in our analysis of students' activities in the course's interactive textbook. In the frequent testing semester, students spent more time on textbook readings and appeared to answer textbook questions more earnestly (i.e., less “gaming the system” by using hints and brute force).
  8. Dorn, Brian; Vahrenhold, Jan (Eds.)
    Background and Context Lopez and Lister first presented evidence for a skill hierarchy of code reading, tracing, and writing for introductory programming students. Further support for this hierarchy could help computer science educators sequence course content to best build student programming skill. Objective This study aims to replicate a slightly simplified hierarchy of skills in CS1 using a larger body of students (600+ vs. 38) in a non-major introductory Python course with computer-based exams. We also explore the validity of other possible hierarchies. Method We collected student score data on 4 kinds of exam questions. Structural equation modeling was used to derive the hierarchy for each exam. Findings We find multiple best-fitting structural models. The original hierarchy does not appear among the “best” candidates, but similar models do. We also determined that our methods provide us with correlations between skills and do not answer a more fundamental question: what is the ideal teaching order for these skills? Implications This modeling work is valuable for understanding the possible correlations between fundamental code-related skills. However, analyzing student performance on these skills at a moment in time is not sufficient to determine teaching order. We present possible study designs for exploring this more actionable research question. 
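The paper fits structural equation models; as a much simpler stand-in, the kind of skill-score correlations such data yields can be computed with pandas (the scores below are invented):

```python
import pandas as pd

# Invented per-student scores on three of the exam skill categories.
scores = pd.DataFrame({
    "reading": [8, 6, 9, 4, 7],
    "tracing": [7, 5, 9, 3, 6],
    "writing": [6, 4, 8, 2, 6],
})

# Pairwise correlations between skills -- the quantity the authors note
# their modeling ultimately rests on.
print(scores.corr().round(2))
```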
  9. This full research paper explores students’ attitudes toward second-chance testing and how second-chance testing influences students’ behavior. Second-chance testing refers to giving students the opportunity to take a second instance of each exam for some form of grade replacement. Previous work has demonstrated that second-chance testing can lead to improved student outcomes in courses, but how to best structure second-chance testing to maximize its benefits remains an open question. We complement previous work by interviewing a diverse group of 23 students who have taken courses that use second-chance testing. From the interviews, we sought to gain insight into students’ views and use of second-chance testing. We found that second-chance testing was almost universally viewed positively by the students and was frequently cited as helping to reduce test takers’ anxiety and boost their confidence. Overall, we find that the majority of students prepare for second-chance exams in desirable ways, but we also note ways in which second-chance testing can potentially lead to undesirable behaviors, including procrastination, over-reliance on memorization, and attempts to game the system. We identified emergent themes pertaining to various facets of second-chance test-taking, including: 1) concerns about the time commitment required for second-chance exams; 2) a belief that second-chance exams promoted fairness; and 3) how second-chance testing incentivized learning. This paper provides instructors and other stakeholders with detailed insights into students’ behavior regarding second-chance testing, enabling them to develop better policies and avoid unintended consequences.
  10. Using multiple versions of exams is a common exam security technique to prevent cheating in a variety of contexts. While psychometric techniques are routinely used by large high-stakes testing companies to ensure equivalence between exam versions, such approaches are generally cost- and effort-prohibitive for individual classrooms. As such, exam versions present a practical tension between exam security (which is enhanced by versioning) and fairness (which is undermined by difficulty variation between versions). In this work, we surveyed students on their perceptions of this trade-off between exam security and fairness on a versioned programming exam and found that significant populations value each aspect over the other. Furthermore, we found that students' expression of concerns about unfairness was not correlated with whether they had received harder versions of the course's most recent exam, but was correlated with lower overall course performance.