This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students’ short-answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on producing scores for mathematical and programming problems, with little work targeting the generation of actionable insight from student responses. This paper addresses these limitations by exploring a human-in-the-loop approach that makes the process more intuitive and more effective. By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of middle school science education. We have conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform both what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable path to automated grading, in which the performance of the automated LLM-based grader continually improves over time while also producing actionable feedback that can support students’ open-ended science learning.
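As a rough illustration of the few-shot, chain-of-thought grading setup described above, the Python sketch below assembles a prompt from educator-graded examples and asks the model to explain its reasoning before scoring. The rubric text, example answers, scores, and the `call_llm` function are all hypothetical placeholders, not artifacts from the study.

```python
# A minimal sketch of few-shot, chain-of-thought grading, assuming a generic
# chat-completion API behind `call_llm`. All example content is illustrative.

GRADED_EXAMPLES = [
    {
        "answer": "The water evaporates because the sun heats it.",
        "reasoning": "Mentions heating as the cause of evaporation but omits "
                     "the role of energy transfer to water molecules.",
        "score": 2,
    },
    # Educator-graded examples would be appended here over time.
]

def build_prompt(rubric: str, student_answer: str) -> str:
    """Assemble a few-shot prompt that asks for reasoning before a score."""
    shots = "\n\n".join(
        f"Student answer: {ex['answer']}\n"
        f"Reasoning: {ex['reasoning']}\n"
        f"Score: {ex['score']}"
        for ex in GRADED_EXAMPLES
    )
    return (
        f"You are grading middle school science answers.\nRubric: {rubric}\n\n"
        f"{shots}\n\n"
        f"Student answer: {student_answer}\n"
        "Think step by step, explain your reasoning, then give a score 0-3.\n"
        "Reasoning:"
    )

def grade(rubric: str, student_answer: str, call_llm) -> str:
    """Return the model's reasoning and score for one response."""
    return call_llm(build_prompt(rubric, student_answer))
```

Because newly educator-graded responses are simply appended to `GRADED_EXAMPLES`, the grader's in-context example pool can grow over time, mirroring the co-creation loop the abstract describes.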
Designing and implementing an automated grading workflow for providing personalized feedback to open-ended data science assignments
Open-ended assignments, such as lab reports and semester-long projects, provide data science and statistics students with opportunities to develop communication, critical thinking, and creativity skills. However, assigning grades and qualitative feedback on open-ended work can be very time-consuming and difficult to do consistently across students. In this paper, we discuss the steps of a typical grading workflow and highlight which steps can be automated in an approach that we define as an automated grading workflow. We illustrate how gradetools, a new R package, implements this approach within RStudio to facilitate efficient and consistent grading while providing individualized feedback. We hope that this work will help the community of data science and statistics educators use gradetools as their grading workflow assistant or develop their own tools for assisting their grading workflow.
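gradetools itself is an R package and its actual interface is not reproduced here; purely as a language-agnostic sketch of the workflow it automates (apply a shared rubric, total the points, reuse stored comments so feedback stays consistent across students), consider the following Python illustration. The names `RubricItem` and `grade_submission` are hypothetical.

```python
# Hypothetical sketch of an automated grading workflow: rubric-driven scoring
# with reusable feedback text. Not the gradetools API.

from dataclasses import dataclass

@dataclass
class RubricItem:
    description: str          # what is being assessed
    points: float             # points awarded if the criterion is met
    feedback_if_missed: str   # canned, reusable comment

def grade_submission(rubric: list[RubricItem],
                     met: list[bool]) -> tuple[float, list[str]]:
    """Return (total score, individualized feedback) for one submission."""
    score, comments = 0.0, []
    for item, ok in zip(rubric, met):
        if ok:
            score += item.points
        else:
            comments.append(item.feedback_if_missed)
    return score, comments

rubric = [
    RubricItem("States the research question", 2.0,
               "State your research question explicitly."),
    RubricItem("Interprets the confidence interval", 3.0,
               "Interpret the interval in the context of the data."),
]
print(grade_submission(rubric, met=[True, False]))  # (2.0, [feedback comment])
```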
- Award ID(s): 2123366
- PAR ID: 10536911
- Publisher / Repository: UC Santa Barbara
- Date Published:
- Journal Name: Technology Innovations in Statistics Education
- Volume: 15
- Issue: 1
- ISSN: 1933-4214
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
In schools and colleges around the world, open-ended homework assignments are commonly used. However, such assignments require substantial instructor effort to grade and tend not to support opportunities for repeated practice. We propose UpGrade, a novel learnersourcing approach that generates scalable learning opportunities from prior student solutions to open-ended problems. UpGrade creates interactive questions that offer automated, real-time feedback while enabling repeated practice. In a two-week experiment in a college-level HCI course, students answering UpGrade-created questions instead of traditional open-ended assignments achieved indistinguishable learning outcomes in ~30% less time, with no manual grading effort required. To enhance quality control, UpGrade incorporates a psychometric approach that uses crowd workers' answers to automatically prune low-quality questions, resulting in a question bank that exceeds reliability standards for classroom use.
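One standard psychometric filter of the kind such a quality-control step could use is item-total (point-biserial) discrimination computed from crowd workers' binary responses: items whose correctness correlates poorly with overall performance get pruned. The sketch below is an assumption about the general technique, not UpGrade's exact criterion, and the 0.2 threshold is illustrative.

```python
# Sketch of psychometric item pruning via point-biserial discrimination.
# `responses[w][j]` is 1 if crowd worker w answered item j correctly, else 0.

import statistics

def item_discrimination(responses: list[list[int]], item: int) -> float:
    """Correlate correctness on one item with total score on the rest."""
    item_scores = [r[item] for r in responses]
    rest_totals = [sum(r) - r[item] for r in responses]
    mean_i, mean_t = statistics.mean(item_scores), statistics.mean(rest_totals)
    cov = sum((i - mean_i) * (t - mean_t)
              for i, t in zip(item_scores, rest_totals))
    var_i = sum((i - mean_i) ** 2 for i in item_scores)
    var_t = sum((t - mean_t) ** 2 for t in rest_totals)
    return cov / (var_i * var_t) ** 0.5 if var_i and var_t else 0.0

def prune(responses: list[list[int]], threshold: float = 0.2) -> list[int]:
    """Keep indices of items whose discrimination clears the threshold."""
    n_items = len(responses[0])
    return [j for j in range(n_items)
            if item_discrimination(responses, j) >= threshold]
```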
-
Martin, Fred; Norouzi, Narges; Rosenthal, Stephanie (Eds.) This paper examines the use of LLMs to support the grading and explanation of short-answer formative assessments on K-12 science topics. While significant work has been done on programmatically scoring well-structured student assessments in math and computer science, many of these approaches produce a numerical score and stop short of providing teachers and students with explanations for the assigned scores. In this paper, we investigate few-shot, in-context learning with chain-of-thought reasoning and active learning using GPT-4 for automated assessment of students' answers in a middle school Earth Science curriculum. Our findings from this human-in-the-loop approach demonstrate success in scoring formative assessment responses and in providing meaningful explanations for the assigned scores. We then perform a systematic analysis of the advantages and limitations of our approach. This research provides insight into how human-in-the-loop methods can be used for the continual improvement of automated grading for open-ended science assessments.
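A minimal sketch of the active-learning loop this abstract describes: responses the model grades with low confidence are routed to an educator, and each corrected example joins the in-context pool so later rounds improve. `llm_grade` and `ask_educator` are hypothetical callables standing in for the paper's actual components, and the 0.8 confidence floor is an assumption.

```python
# Sketch of one human-in-the-loop active-learning round: auto-grade what the
# model is confident about, send the rest to an educator, and grow the
# in-context example pool with the corrections.

def active_learning_round(responses, example_pool, llm_grade, ask_educator,
                          confidence_floor=0.8):
    """One pass over ungraded responses; returns the enlarged example pool."""
    for resp in responses:
        # Hypothetical: returns (score, explanation, model confidence in [0,1]).
        score, explanation, confidence = llm_grade(resp, example_pool)
        if confidence < confidence_floor:
            # Educator reviews and corrects the low-confidence grade.
            corrected = ask_educator(resp, score, explanation)
            example_pool.append(corrected)  # the grader improves next round
    return example_pool
```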
-
This innovative practice WIP paper describes our ongoing development and deployment of an online robotics education platform. That work highlighted a gap: students were not getting the interactive, feedback-rich learning environment essential for mastering programming concepts in robotics from the traditional code → simulate → turn-in workflow. Because teaching resources are limited, students benefit from real-time feedback that helps them find and fix mistakes in their programming assignments. To integrate such automated feedback, this paper focuses on creating a unit-testing system and integrating it into the course workflow. We provide this real-time feedback by including unit tests in the design of programming assignments, so students can understand and fix their errors on their own without waiting on instructors or TAs, who would otherwise become a bottleneck. In line with the framework's personalized, student-centered approach, this method makes it easier for students to revise and debug their programming work, encouraging hands-on learning. The updated course workflow, which includes unit tests, will strengthen the learning environment and make it more interactive, so that students can learn to program robots in a self-guided fashion.
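To make the unit-testing idea concrete, here is the style of test that could ship with such an assignment. The function `wheel_rotations` and its contract are invented for illustration; in the real workflow the test file would import the student's submission rather than define the function itself.

```python
# Illustrative assignment-level unit test giving students instant feedback.
# `wheel_rotations` stands in for a student-written function (hypothetical).

import math
import unittest

def wheel_rotations(distance_cm: float, wheel_diameter_cm: float) -> float:
    """Stand-in for the student's solution: rotations to travel a distance."""
    return distance_cm / (math.pi * wheel_diameter_cm)

class TestWheelRotations(unittest.TestCase):
    def test_one_circumference_is_one_rotation(self):
        d = 5.6  # cm, a common small-robot wheel diameter
        self.assertAlmostEqual(wheel_rotations(math.pi * d, d), 1.0)

    def test_zero_distance_needs_zero_rotations(self):
        self.assertEqual(wheel_rotations(0, 5.6), 0.0)

if __name__ == "__main__":
    unittest.main()  # students run this locally to find and fix errors
```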
-
It is challenging to educate effectively in large classes with students from a multitude of backgrounds. Many introductory engineering courses in universities have hundreds of students, and some online classes are even larger. Instructors in these circumstances often turn to online homework systems, which greatly reduce the grading burden; however, they come at the cost of reducing the quality of feedback that students receive. Since online systems typically can only automatically grade multiple-choice or numeric-answer questions, students predominantly do not receive feedback on the critical skill of sketching free-body diagrams (FBDs). Mechanix, an online sketch-recognition-based tutoring system for introductory statics courses, requires students to draw free-body diagrams in addition to grading their final answers. Students receive feedback about their diagrams that would otherwise be difficult for instructors to provide in large classes. Additionally, Mechanix can grade open-ended truss design problems with an indeterminate number of solutions. Mechanix has been in use for over six semesters at five different universities by over 1000 students to study its effectiveness. Students used Mechanix for one to three homework assignments covering free-body diagrams, static truss analysis, and truss design for an open-ended problem. Preliminary results suggest the system increases homework engagement and effort for struggling students and is as effective as other homework systems for teaching statics. Focus groups showed students enjoyed using Mechanix and that it helped their learning process.
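At its core, grading a free-body diagram includes checking static equilibrium: the drawn forces must sum to zero in both axes. The sketch below shows only that physics check with made-up force tuples; Mechanix's actual sketch-recognition pipeline is of course far richer than this.

```python
# Sketch of the equilibrium check underlying FBD grading: sum the x and y
# components of the recognized force vectors and verify both are ~zero.

import math

def in_equilibrium(forces, tol=1e-6):
    """forces: list of (magnitude, angle_degrees) pairs from the diagram."""
    fx = sum(m * math.cos(math.radians(a)) for m, a in forces)
    fy = sum(m * math.sin(math.radians(a)) for m, a in forces)
    return abs(fx) < tol and abs(fy) < tol

# A block resting on a table: gravity down (270°) balanced by the
# normal force up (90°).
print(in_equilibrium([(9.8, 270), (9.8, 90)]))  # True
```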