Title: Auto-Grader Feedback Utilization and Its Impacts: An Observational Study Across Five Community Colleges
Automated grading systems, or auto-graders, have become ubiquitous in programming education, and the feedback they generate has become increasingly automated as well. However, there is insufficient evidence on the effectiveness of auto-grader feedback in improving student learning outcomes, particularly evidence that distinguishes students who utilized the feedback from students who did not. In this study, we fill this critical gap. Specifically, we analyze students' interactions with auto-graders in an introductory Python programming course offered at five community colleges in the United States. Our results show that students who check the feedback more frequently tend to earn higher scores on their programming assignments overall. They also show that a submission made after a student checks the feedback tends to receive a higher score than a submission made after a student ignores it. These results provide evidence of auto-grader feedback's effectiveness, encourage its increased utilization, and call for future work to continue evaluating such feedback in this age of automation.
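As an illustration of the kind of analysis the abstract describes, the sketch below compares submission scores by whether the preceding feedback report was viewed. It is a minimal sketch: the event-log schema, column names, and data are all hypothetical, not the study's actual dataset.

```python
import pandas as pd

# Hypothetical event log: one row per submission, with a flag indicating
# whether the student viewed the auto-grader feedback before submitting
# again. All column names and values are illustrative only.
events = pd.DataFrame({
    "student_id":      [1, 1, 1, 2, 2, 3],
    "assignment_id":   ["a1", "a1", "a1", "a1", "a1", "a1"],
    "viewed_feedback": [True, True, False, False, True, True],
    "score":           [40, 70, 75, 30, 65, 90],
})

# Comparison in the spirit of the study: mean score of submissions that
# follow a feedback check vs. those that follow an ignored report.
by_viewing = events.groupby("viewed_feedback")["score"].agg(["mean", "count"])
print(by_viewing)

# Per-student feedback-checking rate vs. average score, to probe whether
# frequent checkers tend to score higher overall.
per_student = events.groupby("student_id").agg(
    check_rate=("viewed_feedback", "mean"),
    avg_score=("score", "mean"),
)
print(per_student.corr())
```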
Award ID(s):
2111305
PAR ID:
10620958
Author(s) / Creator(s):
Publisher / Repository:
SCITEPRESS - Science and Technology Publications
Date Published:
ISBN:
978-989-758-746-7
Page Range / eLocation ID:
356 to 363
Format(s):
Medium: X
Location:
Porto, Portugal
Sponsoring Org:
National Science Foundation
More Like This
  1. Paaßen, Benjamin; Demmans Epp, Carrie (Ed.)
    In feedback generation for logical errors in programming assignments, large language model (LLM)-based methods have shown great promise. These methods ask the LLM to generate feedback given the problem statement and a student's (buggy) submission. There are several issues with these types of methods. First, the generated feedback messages are often too direct in revealing the error in the submission and thus diminish valuable opportunities for the student to learn. Second, they do not consider the student's learning context, i.e., their previous submissions, current knowledge, etc. Third, they are not layered, since existing methods use a single, shared prompt for all student submissions. In this paper, we explore using LLMs to generate a "feedback-ladder", i.e., multiple levels of feedback for the same problem-submission pair. We evaluate the quality of the generated feedback-ladder via a user study with students, educators, and researchers. In the study, we observed diminishing effectiveness for higher-level feedback and for higher-scoring submissions overall. In practice, our method enables teachers to select an appropriate level of feedback to show a student based on their personal learning context, or to apply the levels progressively, offering more detailed feedback when a higher-level message fails to correct the student's error.
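The abstract above does not include the paper's prompts; the following is a minimal sketch of how a multi-level feedback-ladder might be generated. The level wording, prompt format, and `call_llm` (a stand-in for any chat-completion client) are all hypothetical.

```python
# Instructions for each rung of the ladder, least to most revealing.
# These level descriptions are illustrative, not the paper's actual prompts.
LEVELS = [
    "Give only a gentle hint about where to look, without naming the bug.",
    "Describe the symptom the bug causes on a failing input.",
    "Explain the conceptual mistake, but do not show corrected code.",
    "Point to the exact line(s) containing the error.",
    "Show the corrected code with an explanation.",
]

def build_ladder(problem_statement: str, buggy_submission: str, call_llm):
    """Return one feedback message per level for a problem-submission pair."""
    ladder = []
    for level, instruction in enumerate(LEVELS, start=1):
        prompt = (
            f"Problem statement:\n{problem_statement}\n\n"
            f"Student submission (contains a logical error):\n{buggy_submission}\n\n"
            f"Write level-{level} feedback. {instruction}"
        )
        ladder.append(call_llm(prompt))
    return ladder
```

A teacher-facing tool could then show only the rung matching the student's context, escalating one level at a time if the error persists.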
  2. Existing techniques for automating the testing of sequential programming assignments are fundamentally at odds with concurrent programming, as they are oblivious to the algorithm used to implement the assignments. We have developed a framework that addresses this limitation for those object-based concurrent assignments whose user interface (a) is implemented using the observer pattern and (b) makes apparent whether concurrency requirements are met. It has two components. The first component reduces the number of steps a human grader needs to take to interact with and score the user interfaces of the submitted programs. The second component completely automates assessment by observing the events sent by the student-implemented observable objects. Both components are used to score the final submission and log interactions. The second component is also used to provide feedback during assignment implementation. Our experience shows that the framework is used extensively by students, leads to more partial credit, reduces grading time, and yields statistics about incremental student progress.
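As a rough illustration of the observer-pattern idea the abstract describes, the sketch below shows a student-implemented observable object and a grader-side observer that records the event trace for automated assessment. Class and method names are illustrative, not the framework's actual API.

```python
import threading
import time

class GradingObserver:
    """Grader-side observer: records events for automated assessment."""
    def __init__(self):
        self.trace = []

    def notify(self, event):
        self.trace.append(event)

class ObservableBuffer:
    """Student-implemented bounded buffer that notifies its observers,
    making blocking behavior (a concurrency requirement) observable."""
    def __init__(self, capacity, observers):
        self.items, self.capacity = [], capacity
        self.observers = observers
        self.lock = threading.Condition()

    def put(self, item):
        with self.lock:
            while len(self.items) >= self.capacity:
                self.lock.wait()  # blocking here shows correct concurrency
            self.items.append(item)
            for obs in self.observers:
                obs.notify(("put", item, time.monotonic()))
            self.lock.notify_all()

    def get(self):
        with self.lock:
            while not self.items:
                self.lock.wait()
            item = self.items.pop(0)
            for obs in self.observers:
                obs.notify(("get", item, time.monotonic()))
            self.lock.notify_all()
            return item
```

An automated grader can then run producer and consumer threads against the buffer and assert properties of `observer.trace`, e.g., that no `get` event precedes the corresponding `put`.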
  3.
    The feedback provided by current testing-education tools about the deficiencies in a student's test suite either mimics industry code-coverage tools or lists specific instructor test cases that are missing from the student's test suite. While useful in some sense, these types of feedback are akin to revealing the solution to the problem, which can inadvertently encourage students to pursue a trial-and-error approach to testing rather than a more systematic approach that encourages learning. In addition to not teaching students why their test suite is inadequate, this type of feedback may motivate students to become dependent on the feedback rather than thinking for themselves. To address this deficiency, there is an opportunity to investigate alternative feedback mechanisms that include positive reinforcement of testing concepts. We argue that using an inquiry-based learning approach is better than simply providing the answers. To facilitate this type of learning, we present Testing Tutor, a web-based assignment-submission platform that supports different levels of testing pedagogy via a customizable feedback engine. We evaluated the impact of the different types of feedback through an empirical study in two sophomore-level courses. We used Testing Tutor to provide students with different types of feedback, either traditional detailed code-coverage feedback or inquiry-based conceptual feedback, and compared the effects. The results show that students who received conceptual feedback had higher code coverage (by different measures), fewer redundant test cases, and higher programming grades than students who received traditional code-coverage feedback.
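To make the customizable feedback-engine idea concrete, here is a minimal sketch of rendering the same coverage analysis either as traditional coverage feedback or as conceptual, inquiry-based prompts. The concept names and hint texts are hypothetical, not Testing Tutor's actual rules.

```python
# Map coverage deficiencies to reflective questions instead of answers.
# Both the concept labels and the hint wording are illustrative only.
CONCEPT_HINTS = {
    "boundary": "Have you exercised the boundaries of the input domain?",
    "exception": "What should happen when the input is invalid?",
    "loop": "Do your tests cover zero, one, and many loop iterations?",
}

def feedback(uncovered_concepts, mode="conceptual"):
    """Render the same deficiency analysis at two pedagogical levels."""
    if mode == "coverage":
        # Traditional feedback: name exactly what is missing.
        return [f"Uncovered: {c}" for c in uncovered_concepts]
    # Conceptual feedback: prompt reflection without revealing the answer.
    return [CONCEPT_HINTS[c] for c in uncovered_concepts]

print(feedback(["boundary", "loop"]))
print(feedback(["boundary", "loop"], mode="coverage"))
```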
  4. Akram, Bita; Shi, Yang; Brusilovsky, Peter; Hsiao, Sharon I-Han; Leinonen, Juho (Ed.)
    Promptly addressing students' help requests on their programming assignments has become increasingly challenging in computer science education. Since the pandemic, most instructors have used online office hours to answer questions, and prior studies have shown increased student participation with this format. This popularity has led to significantly longer wait times in the office-hours queue, and the strategy for selecting the next student to help can affect wait time. For example, prioritizing students who have not yet been seen on the day of the deadline extends the wait time for students who frequently rejoin the queue. To better understand this problem, we explored students' behavior while they wait in the queue. We investigated how long students are willing to wait by modeling the distribution of cancellation times, and found that most students cancel their help request after waiting 49 minutes. We then examined students' coding actions during the waiting period and found that only 21% of students made commits while waiting. Surprisingly, students who waited for hours did not commit their work for automated feedback. Our findings suggest that time in the queue should be considered, in addition to factors such as last interaction, when selecting the next student to help during office hours, in order to minimize canceled interactions.
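The abstract does not state which distribution the authors fit to cancellation times; as one plausible illustration, the sketch below fits a log-normal model to synthetic wait times and reads off the median and the survival probability at the reported 49-minute mark.

```python
import numpy as np
from scipy import stats

# Synthetic cancellation times in minutes, for illustration only; the
# log-normal choice is an assumption, not the paper's stated model.
rng = np.random.default_rng(0)
cancel_times = rng.lognormal(mean=3.5, sigma=0.6, size=500)

# Fit a log-normal with the location pinned at zero (times are positive).
shape, loc, scale = stats.lognorm.fit(cancel_times, floc=0)
dist = stats.lognorm(shape, loc, scale)

# Median cancellation time: the point by which half of the waiting
# students have given up on their help request.
print(f"median cancellation time: {dist.median():.1f} minutes")
print(f"P(still waiting at 49 min): {dist.sf(49):.2f}")
```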
  5. Dziri, Nouha; Ren, Sean; Diao, Shizhe (Ed.)
    The ability to revise essays in response to feedback is important for students' writing success. An automated writing evaluation (AWE) system that supports students in revising their essays is thus essential. We present eRevise+RF, an enhanced AWE system for assessing student essay revisions (i.e., changes made to an essay to improve its quality in response to essay feedback) and providing revision feedback. We deployed the system with 6 teachers and 406 students across 3 schools in Pennsylvania and Louisiana. The results confirmed its effectiveness in (1) assessing student essays in terms of evidence usage, (2) extracting evidence and reasoning revisions across essays, and (3) determining revision success in responding to feedback. The evaluation also suggested that eRevise+RF is a helpful system for young students to improve their argumentative writing skills through revision and formative feedback.
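As a small illustration of one step described above, the sketch below extracts sentence-level revisions between a draft and a revised essay by aligning sentences with difflib. It is illustrative only; eRevise+RF's actual pipeline also classifies evidence and reasoning revisions and assesses their success.

```python
import difflib

def split_sentences(text):
    # Crude sentence splitter, sufficient for this illustration.
    normalized = text.replace("!", ".").replace("?", ".")
    return [s.strip() for s in normalized.split(".") if s.strip()]

def extract_revisions(draft, revision):
    """Return inserted, deleted, and replaced sentences between two drafts."""
    a, b = split_sentences(draft), split_sentences(revision)
    matcher = difflib.SequenceMatcher(a=a, b=b)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append({"op": tag, "before": a[i1:i2], "after": b[j1:j2]})
    return changes

draft = "The author argues for recess. Students need breaks."
revised = ("The author argues for recess. Students need breaks. "
           "The survey data supports this claim.")
print(extract_revisions(draft, revised))  # one 'insert' revision
```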