Search for: All records

Creators/Authors contains: "Fowler, Max"

  1. Errors in AI grading and feedback are by nature non-deterministic and difficult to avoid completely. Since inaccurate feedback potentially harms learning, there is a need for designs and workflows that mitigate these harms. To better understand the mechanisms by which erroneous AI feedback impacts students' learning, we conducted surveys and interviews that recorded students' interactions with a short-answer AI autograder for "Explain in Plain English" code reading problems. Using causal modeling, we inferred the learning impacts of wrong answers marked as right (false positives, FPs) and right answers marked as wrong (false negatives, FNs). We further explored explanations for these learning impacts, including how errors influenced participants' engagement with feedback and their assessments of their answers' correctness, as well as participants' prior performance in the class. FPs harmed learning largely because participants failed to detect the errors: participants did not pay attention to the feedback after being marked as right, and showed an apparent bias against admitting an answer was wrong once it had been marked right. FNs, on the other hand, harmed learning only for survey participants, suggesting that interviewees' greater behavioral and cognitive engagement protected them from learning harms. Based on these findings, we propose ways to help learners detect FPs and to encourage deeper reflection on FNs, mitigating the learning harms of AI errors.
    Free, publicly-accessible full text available August 7, 2024
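    A minimal sketch of the causal-modeling step described above, assuming hypothetical per-student data (counts of FP and FN feedback events, a prior-performance score, and a learning-gain measure); the file name, column names, and model form are illustrative assumptions, not the authors' actual model:

    ```python
    # Estimate the learning impact of FP/FN feedback while adjusting for
    # prior performance, a likely confounder. Under the usual
    # no-unmeasured-confounding assumption, the fp_count and fn_count
    # coefficients can be read as estimated learning impacts.
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical per-student data: fp_count, fn_count, prior_score, learning_gain
    df = pd.read_csv("student_feedback_events.csv")  # assumed file

    model = smf.ols("learning_gain ~ fp_count + fn_count + prior_score", data=df).fit()
    print(model.summary())
    ```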
  2. We conducted an across-semester quasi-experimental study that compared students' outcomes under frequent and infrequent testing regimens in an introductory computer science course. Students in the frequent-testing semester (4 quizzes and 4 exams) outperformed those in the infrequent-testing semester (1 midterm and 1 final exam) by 9.1 to 13.5 percentage points on code-writing questions. We complement these performance results with additional data from surveys, interviews, and analysis of textbook behavior. In the surveys, students reported a preference for the smaller number of exams, but rated the exams in the frequent-testing semester as both less difficult and less stressful, despite the exams containing identical content. In the interviews, students predominantly indicated (1) that the frequent-testing regimen encourages better study habits (e.g., more attention to work, less cramming) and leads to better learning, (2) that frequent testing reduces test anxiety, and (3) that the frequent-testing regimen was more fair, though these opinions were not universally held. The students' impression that frequent testing would lead to better study habits is borne out in our analysis of students' activities in the course's interactive textbook: in the frequent-testing semester, students spent more time on textbook readings and appeared to answer textbook questions more earnestly (i.e., less "gaming the system" by using hints and brute force).
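    As a rough illustration of the across-semester comparison, the sketch below contrasts per-student code-writing scores (in percentage points) from two independent cohorts using a Welch t-test; the file names and data layout are assumptions:

    ```python
    # Compare mean code-writing scores across two independent semesters.
    import numpy as np
    from scipy import stats

    # Assumed files: one score per student, in percentage points.
    frequent = np.loadtxt("frequent_semester_scores.csv")
    infrequent = np.loadtxt("infrequent_semester_scores.csv")

    # Welch's t-test (unequal variances), since the cohorts are independent.
    diff = frequent.mean() - infrequent.mean()
    t, p = stats.ttest_ind(frequent, infrequent, equal_var=False)
    print(f"frequent-testing advantage = {diff:.1f} percentage points (t = {t:.2f}, p = {p:.4f})")
    ```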
  3. Dorn, Brian; Vahrenhold, Jan (Eds.)
    Background and Context: Lopez and Lister first presented evidence for a skill hierarchy of code reading, tracing, and writing for introductory programming students. Further support for this hierarchy could help computer science educators sequence course content to best build student programming skill.
    Objective: This study aims to replicate a slightly simplified version of the hierarchy in CS1 using a larger body of students (600+ vs. 38) in a non-major introductory Python course with computer-based exams. We also explore the validity of other possible hierarchies.
    Method: We collected student score data on 4 kinds of exam questions. Structural equation modeling was used to derive the hierarchy for each exam.
    Findings: We find multiple best-fitting structural models. The original hierarchy does not appear among the “best” candidates, but similar models do. We also determined that our methods provide correlations between skills and do not answer a more fundamental question: what is the ideal teaching order for these skills?
    Implications: This modeling work is valuable for understanding the possible correlations between fundamental code-related skills. However, analyzing student performance on these skills at a single moment in time is not sufficient to determine teaching order. We present possible study designs for exploring this more actionable research question.
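    A minimal sketch of the structural-equation-modeling step, assuming the semopy package and a hypothetical exam-score table with one column per skill; the reading → tracing → writing specification is just one candidate hierarchy, not the paper's best-fitting model:

    ```python
    # Fit one candidate path model (reading -> tracing -> writing) and
    # report path coefficients and fit indices. semopy accepts
    # lavaan-style model descriptions.
    import pandas as pd
    import semopy

    hierarchy = """
    tracing ~ reading
    writing ~ tracing
    """

    data = pd.read_csv("exam_scores.csv")  # assumed columns: reading, tracing, writing

    model = semopy.Model(hierarchy)
    model.fit(data)
    print(model.inspect())           # path coefficients and standard errors
    print(semopy.calc_stats(model))  # fit indices (e.g., CFI, RMSEA) for model comparison
    ```

    Fitting several such descriptions and comparing their fit indices is how multiple "best-fitting" candidates can emerge, as the findings above report.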
  4. Proctoring educational assessments (e.g., quizzes and exams) has a cost, be it in faculty (and/or course staff) time or in money to pay for proctoring services. Previous estimates of the utility of proctoring (generally obtained by estimating the score advantage of taking an exam unproctored) vary widely, have mostly come from across-subjects experimental designs, and have sometimes had low statistical power. We investigated the score advantage of unproctored exams over proctored exams using a within-subjects design for N = 510 students in an on-campus introductory programming course with 5 proctored exams and 4 unproctored exams. We found that students scored 3.32 percentage points higher on questions on unproctored exams than on proctored exams (p < 0.001). More interestingly, we discovered that this score advantage on unproctored exams grew steadily as the semester progressed, from around 0 percentage points at the start of the semester to around 7 percentage points by the end. As the most obvious explanation for this advantage is cheating, we refer to this behavior as the student population "learning to cheat". The data suggest both that more individuals were cheating and that the average benefit of cheating increased over the course of the semester. Furthermore, we observed that studying for unproctored exams decreased over the course of the semester while studying for proctored exams stayed constant. Lastly, we estimated the score advantage by question type and found that our long-form programming questions had the highest score advantage on unproctored exams, though there are multiple possible explanations for this finding.
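    The within-subjects estimate can be sketched as a paired comparison, each student contributing a mean score under both proctoring conditions; file and column names are hypothetical:

    ```python
    # Paired (within-subjects) comparison of unproctored vs. proctored scores.
    import pandas as pd
    from scipy import stats

    # Assumed layout: one row per student with mean scores (percentage points)
    # on proctored and on unproctored exams.
    df = pd.read_csv("exam_scores_by_student.csv")

    advantage = df["unproctored_mean"] - df["proctored_mean"]
    t, p = stats.ttest_rel(df["unproctored_mean"], df["proctored_mean"])
    print(f"mean unproctored advantage = {advantage.mean():.2f} percentage points (t = {t:.2f}, p = {p:.4f})")
    ```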