
Search for: All records

Creators/Authors contains: "Zilles, Craig"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not yet be available without charge during the embargo period.


  1. Dorn, Brian; Vahrenhold, Jan (Eds.)
    Background and Context: Lopez and Lister first presented evidence for a skill hierarchy of code reading, tracing, and writing for introductory programming students. Further support for this hierarchy could help computer science educators sequence course content to best build student programming skill. Objective: This study aims to replicate a slightly simplified hierarchy of skills in CS1 using a larger body of students (600+ vs. 38) in a non-major introductory Python course with computer-based exams. We also explore the validity of other possible hierarchies. Method: We collected student score data on 4 kinds of exam questions. Structural equation modeling was used to derive the hierarchy for each exam. Findings: We find multiple best-fitting structural models. The original hierarchy does not appear among the “best” candidates, but similar models do. We also determined that our methods provide us with correlations between skills and do not answer a more fundamental question: what is the ideal teaching order for these skills? Implications: This modeling work is valuable for understanding the possible correlations between fundamental code-related skills. However, analyzing student performance on these skills at a moment in time is not sufficient to determine teaching order. We present possible study designs for exploring this more actionable research question. (A schematic sketch of fitting one candidate hierarchy with structural equation modeling appears after this list.)
    Free, publicly-accessible full text available June 1, 2023
  2. This full research paper explores students’ attitudes toward second-chance testing and how second-chance testing influences students’ behavior. Second-chance testing refers to giving students the opportunity to take a second instance of each exam for some sort of grade replacement. Previous work has demonstrated that second-chance testing can lead to improved student outcomes in courses, but how best to structure second-chance testing to maximize its benefits remains an open question. We complement previous work by interviewing a diverse group of 23 students who have taken courses that use second-chance testing. From the interviews, we sought to gain insight into students’ views and use of second-chance testing. We found that second-chance testing was almost universally viewed positively by the students and was frequently cited as helping to reduce test takers’ anxiety and boost their confidence. Overall, we find that the majority of students prepare for second-chance exams in desirable ways, but we also note ways in which second-chance testing can potentially lead to undesirable behaviors, including procrastination, over-reliance on memorization, and attempts to game the system. We identified emergent themes pertaining to various facets of second-chance test-taking, including: 1) concerns about the time commitment required for second-chance exams; 2) a belief that second-chance exams promoted fairness; and 3) how second-chance testing incentivized learning. This paper will provide instructors and other stakeholders with detailed insights into students’ behavior regarding second-chance testing, enabling instructors to develop better policies and avoid unintended consequences.
  3. Using multiple versions of exams is a common exam security technique to prevent cheating in a variety of contexts. While psychometric techniques are routinely used by large high-stakes testing companies to ensure equivalence between exam versions, such approaches are generally cost- and effort-prohibitive for individual classrooms. As such, exam versions present a practical tension between exam security (which is enhanced by versioning) and fairness (which is undermined by difficulty variation between versions). In this work, we surveyed students on their perceptions of this trade-off between exam security and fairness on a versioned programming exam and found that significant populations value each aspect over the other. Furthermore, we found that students' expression of concerns about unfairness was not correlated with whether they had received harder versions of the course's most recent exam, but was correlated with lower overall course performance.
  4. In this paper, we study a computerized exam system that allows students to attempt the same question multiple times. This system permits students either to receive feedback on a submitted answer immediately or to defer the feedback and grade questions in bulk. An analysis of student behavior in three courses across two semesters found similar student behaviors across courses and student groups. We found that only a small minority of students used the deferred feedback option. A clustering analysis that considered both when students chose to receive feedback and whether they immediately retried incorrect problems or moved on to other unfinished problems identified four main student strategies (a schematic clustering sketch appears after this list). These strategies were correlated with statistically significant differences in exam scores, but it was not clear whether some strategies improved outcomes or whether stronger students tended to prefer certain strategies.
  5. Proctoring educational assessments (e.g., quizzes and exams) has a cost, be it in faculty (and/or course staff) time or in money to pay for proctoring services. Previous estimates of the utility of proctoring (generally obtained by estimating the score advantage of taking an exam without proctoring) vary widely and have mostly used across-subjects experimental designs, sometimes with low statistical power. We investigated the score advantage of unproctored exams versus proctored exams using a within-subjects design for N = 510 students in an on-campus introductory programming course with 5 proctored exams and 4 unproctored exams (a schematic sketch of this paired comparison appears after this list). We found that students scored 3.32 percentage points higher on questions on unproctored exams than on proctored exams (p < 0.001). More interestingly, however, we discovered that this score advantage on unproctored exams grew steadily as the semester progressed, from around 0 percentage points at the start of the semester to around 7 percentage points by the end. As the most obvious explanation for this advantage is cheating, we refer to this behavior as the student population "learning to cheat". The data suggest that both more individuals are cheating and the average benefit of cheating is increasing over the course of the semester. Furthermore, we observed that studying for unproctored exams decreased over the course of the semester while studying for proctored exams stayed constant. Lastly, we estimated the score advantage by question type and found that our long-form programming questions had the highest score advantage on unproctored exams, but there are multiple possible explanations for this finding.
  6. We explore how course policies affect students' studying and learning when a second-chance exam is offered. High-stakes, one-off exams remain a de facto standard for assessing student knowledge in STEM, despite compelling evidence that other assessment paradigms, such as mastery learning, can improve student learning. Unfortunately, mastery learning can be costly to implement. We explore the use of optional second-chance testing to sustainably reap the benefits of mastery-based learning at scale. Prior work has shown that course policies affect students' studying and learning but has not compared these effects within the same course context. We conducted a quasi-experimental study in a single course to compare the effect of two grading policies for second-chance exams and the effect of increasing the range of dates over which students could take asynchronous exams. The first grading policy, called 90-cap, allowed students to optionally take a second-chance exam that would fully replace their score on a first-chance exam, except that the second-chance exam would be capped at 90% credit. The second grading policy, called 90-10, combined students' first- and second-chance exam scores as a weighted average (90% max score + 10% min score); a schematic sketch of both policies appears after this list. The 90-10 policy significantly increased the likelihood that marginally competent students would take the second-chance exam. Further, our data suggest that students learned more under the 90-10 policy, providing improved student learning outcomes at no cost to the instructor. Most students took exams on the last day an exam was available, regardless of how many days the exam was available.
  7. We describe the deployment of an imperfect NLP-based automatic short answer grading system on an exam in a large-enrollment introductory college course. We characterize this deployment as both high stakes (the questions were on a midterm exam worth 10% of students' final grade) and high transparency (each question was graded interactively during the computer-based exam, and a correct solution was shown that students could compare with their own answer). We study two techniques designed to mitigate the potential student dissatisfaction that results when the imperfect AI grader incorrectly denies students credit. We find (1) that providing multiple attempts can eliminate first-attempt false negatives at the cost of additional false positives (a sketch of this trade-off appears after this list), and (2) that students not granted credit by the algorithm cannot reliably determine whether their answer was mis-scored.
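
Following up on item 1: a minimal sketch of fitting one candidate skill hierarchy with structural equation modeling, using the semopy package on synthetic data. This is not the authors' code; the choice of package, the skill names, the synthetic scores, and the single candidate hierarchy shown (reading -> tracing -> writing) are assumptions made purely for illustration.

```python
# Illustrative only: fit one candidate hierarchy (reading -> tracing -> writing)
# to synthetic per-student skill sub-scores and report fit statistics that
# could be used to compare candidate models.
import numpy as np
import pandas as pd
from semopy import Model, calc_stats

rng = np.random.default_rng(0)
n = 600
reading = rng.normal(70, 10, n)
tracing = 0.6 * reading + rng.normal(0, 8, n)
writing = 0.5 * tracing + rng.normal(0, 8, n)
scores = pd.DataFrame({"reading": reading, "tracing": tracing, "writing": writing})

# Path model: tracing depends on reading, writing depends on tracing.
candidate = """
tracing ~ reading
writing ~ tracing
"""

model = Model(candidate)
model.fit(scores)
print(model.inspect())    # estimated path coefficients
print(calc_stats(model))  # fit indices used to compare candidate hierarchies
```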
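
Following up on item 4: a minimal sketch of clustering students by exam-taking behavior with k-means. The two behavioral features (fraction of answers graded immediately, fraction of incorrect answers retried immediately) and the synthetic data are assumptions for illustration; this is not the paper's analysis pipeline.

```python
# Illustrative only: cluster synthetic per-student behavior summaries into
# four groups, echoing the four strategies identified in the paper.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_students = 300
features = np.column_stack([
    rng.beta(5, 2, n_students),  # fraction of answers with immediate feedback
    rng.beta(2, 2, n_students),  # fraction of incorrect answers retried immediately
])

X = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
print(np.bincount(labels))  # cluster sizes: one candidate "strategy" per cluster
```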
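
Following up on item 5: a minimal sketch of the within-subjects comparison on synthetic data. Each student's proctored and unproctored scores are paired and the mean difference is tested; the simulated score distributions and the paired t-test are assumptions standing in for the authors' actual analysis.

```python
# Illustrative only: pair each student's proctored and unproctored scores
# and test whether the mean score advantage differs from zero.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 510
proctored = rng.normal(75, 12, n)                # mean score on proctored exams
unproctored = proctored + rng.normal(3.3, 6, n)  # synthetic ~3.3-point advantage

advantage = unproctored - proctored
t, p = stats.ttest_rel(unproctored, proctored)
print(f"mean advantage = {advantage.mean():.2f} points, t = {t:.2f}, p = {p:.3g}")
```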
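
Following up on item 6: a minimal sketch of how the two grading policies might combine first- and second-chance exam scores. The 90-10 formula (90% of the higher score plus 10% of the lower) is stated in the abstract; treating the 90-cap replacement as keeping the better of the two outcomes is an assumption, and the function names are invented for illustration.

```python
# Illustrative only: combine first- and second-chance scores (0-100 scale)
# under the two policies described in item 6.

def grade_90_cap(first: float, second: float | None) -> float:
    """Second-chance score, capped at 90, replaces the first-chance score.
    Assumes students end up with the better of the two outcomes."""
    if second is None:  # student skipped the optional second-chance exam
        return first
    return max(first, min(second, 90.0))

def grade_90_10(first: float, second: float | None) -> float:
    """Weighted average: 90% of the higher score plus 10% of the lower score."""
    if second is None:
        return first
    return 0.9 * max(first, second) + 0.1 * min(first, second)

# Example: a student scoring 60 then 95 gets 90.0 under 90-cap and 91.5 under 90-10.
print(grade_90_cap(60, 95), grade_90_10(60, 95))
```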
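
Following up on item 7: a minimal sketch of the false-negative/false-positive trade-off from allowing multiple attempts, assuming (purely for illustration) that each attempt is graded independently with fixed per-attempt error rates. The error rates and the function are hypothetical, not measurements from the deployed system.

```python
# Illustrative only: with more attempts, a correct answer is less likely to be
# rejected on every attempt (fewer false negatives), while an incorrect answer
# is more likely to be accepted at least once (more false positives).

def outcome_rates(fn: float, fp: float, attempts: int) -> tuple[float, float]:
    """fn: P(correct answer wrongly rejected per attempt);
    fp: P(incorrect answer wrongly accepted per attempt)."""
    false_negative = fn ** attempts            # rejected on every attempt
    false_positive = 1 - (1 - fp) ** attempts  # accepted on at least one attempt
    return false_negative, false_positive

for k in (1, 2, 3):
    print(k, outcome_rates(fn=0.10, fp=0.05, attempts=k))
```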