Title: Towards a formative feedback generation agent: Leveraging a human-in-the-loop, chain-of-thought prompting approach with LLMs to evaluate formative assessment responses in K-12 science.
This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students’ short-answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on providing scores for mathematical and programming problems, with little work targeting the generation of actionable insights from student responses. This paper addresses these limitations by exploring a human-in-the-loop approach to make the process more intuitive and more effective. By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of science education for middle school students. We have conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable approach to automated grading, where the performance of the automated LLM-based grader continually improves over time, while also providing actionable feedback that can support students’ open-ended science learning.
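The abstract describes chain-of-thought prompting for grading and feedback but includes no implementation. The following is a minimal, illustrative sketch of how such a grading prompt might be assembled; the rubric text, the sample data, and the call_llm placeholder are all assumptions, not the authors' code.

```python
# Minimal sketch of a chain-of-thought grading prompt for a short-answer
# science item. Illustrative only: the rubric, sample data, and call_llm()
# are hypothetical placeholders, not the paper's actual implementation.

def build_grading_prompt(question: str, rubric: str, response: str) -> str:
    """Ask the model to reason step by step before scoring and
    writing actionable feedback for the student."""
    return (
        "You are grading a middle school science formative assessment.\n"
        f"Question: {question}\n"
        f"Rubric: {rubric}\n"
        f"Student response: {response}\n\n"
        "Think step by step: (1) list the science ideas the rubric requires, "
        "(2) note which of them the response shows, (3) assign a score from "
        "0-3, and (4) write one or two sentences of feedback the student "
        "can act on.\n"
        "Format:\nReasoning: ...\nScore: ...\nFeedback: ..."
    )

def call_llm(prompt: str) -> str:
    """Placeholder for whatever LLM client is in use; a real system would
    send the prompt to a model API and return its completion."""
    return "Reasoning: ...\nScore: 1\nFeedback: ..."  # stubbed output

if __name__ == "__main__":
    prompt = build_grading_prompt(
        question="Why does the Moon appear to change shape during the month?",
        rubric="Full credit mentions that the Moon orbits Earth and that we "
               "see different portions of its sunlit half.",
        response="The Earth's shadow covers different parts of the moon.",
    )
    print(call_llm(prompt))
```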
Award ID(s):
2017000
PAR ID:
10468997
Author(s) / Creator(s):
Publisher / Repository:
dl.acm.org
Date Published:
Subject(s) / Keyword(s):
Education LLM formative feedback STEM learning
Format(s):
Medium: X
Location:
iui.acm.org
Sponsoring Org:
National Science Foundation
More Like this
  1. Martin, Fred; Norouzi, Narges; Rosenthal, Stephanie (Eds.)
    This paper examines the use of LLMs to support the grading and explanation of short-answer formative assessments in K-12 science topics. While significant work has been done on programmatically scoring well-structured student assessments in math and computer science, many of these approaches produce a numerical score and stop short of providing teachers and students with explanations for the assigned scores. In this paper, we investigate few-shot, in-context learning with chain-of-thought reasoning and active learning using GPT-4 for automated assessment of students’ answers in a middle school Earth Science curriculum. Our findings from this human-in-the-loop approach demonstrate success in scoring formative assessment responses and in providing meaningful explanations for the assigned scores. We then perform a systematic analysis of the advantages and limitations of our approach. This research provides insight into how we can use human-in-the-loop methods for the continual improvement of automated grading for open-ended science assessments.
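Item 1 above combines few-shot in-context learning with chain-of-thought reasoning. Below is a minimal sketch of how teacher-scored exemplars with written rationales might be packed into such a prompt; all names and data are hypothetical, not the paper's code.

```python
# Sketch of few-shot prompt assembly: teacher-scored exemplars (with
# written rationales) are prepended so the model can imitate both the
# scoring and the explanation. All data here is invented for illustration.

from dataclasses import dataclass

@dataclass
class ScoredExample:
    response: str
    score: int
    rationale: str  # teacher-authored chain-of-thought explanation

def few_shot_prompt(question: str, examples: list[ScoredExample],
                    new_response: str) -> str:
    """Build a prompt whose in-context examples demonstrate the desired
    reasoning-then-score output format."""
    shots = "\n\n".join(
        f"Student response: {ex.response}\n"
        f"Reasoning: {ex.rationale}\n"
        f"Score: {ex.score}"
        for ex in examples
    )
    return (
        f"Question: {question}\n\n"
        f"{shots}\n\n"
        f"Student response: {new_response}\n"
        "Reasoning:"
    )
```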
  2. This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score more structured math and computer science assessments, they often do not provide explanations for the scores. Our study focuses on employing GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score and provide meaningful explanations for formative assessment responses. A systematic analysis of our method's pros and cons sheds light on the potential for human-in-the-loop techniques to enhance automated grading for open-ended science assessments.

     
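Item 2 also mentions active learning with a human in the loop. The paper does not specify how uncertain cases are identified; one plausible instantiation, sketched below, uses score agreement across repeated grading calls as a rough confidence signal and routes low-agreement responses to a teacher. The agreement heuristic and all names here are assumptions.

```python
# Hypothetical sketch of confidence-based triage for human review.
# Self-consistency across repeated grading calls is an assumed
# uncertainty signal; the paper does not specify this mechanism.

from collections import Counter
from typing import Callable, Iterable

def grade_with_agreement(grade_fn: Callable[[str], int],
                         response: str, n: int = 5) -> tuple[int, float]:
    """Grade a response n times; return the majority score and the
    fraction of runs that agreed with it."""
    scores = [grade_fn(response) for _ in range(n)]
    score, count = Counter(scores).most_common(1)[0]
    return score, count / n

def triage(grade_fn: Callable[[str], int], responses: Iterable[str],
           threshold: float = 0.8):
    """Auto-accept high-agreement scores; queue the rest for a teacher,
    whose corrections can be folded back into the few-shot example pool."""
    auto, review = [], []
    for r in responses:
        score, agreement = grade_with_agreement(grade_fn, r)
        (auto if agreement >= threshold else review).append((r, score))
    return auto, review
```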
  3. The Next Generation Science Standards and the National Research Council recognize systems thinking as an essential skill for addressing the global challenges of the 21st century. But the habits of mind needed to understand complex systems are not readily learned through traditional approaches. Recently, large-scale interactive multi-user immersive simulations have been used to expose learners to diverse topics that emulate real-world complex systems phenomena. These modern-day mixed reality simulations are unique in that the learners are an integral part of the evolving dynamics: the decisions they make and the actions that follow collectively impact the simulated complex system, much like any real-world complex system. But learners have difficulty understanding these coupled complex systems processes, often get “lost” or “stuck,” and need help navigating the problem space. Formative feedback is the traditional way educators support learners during problem solving, but traditional goal-based and learner-centered approaches do not scale well to environments that allow learners to explore multiple goals or solutions and multiple solution paths (Mallavarapu & Lyons, 2020). In this work, we reconceptualize formative feedback for complex systems-based learning environments as formative fugues (a term derived from music by Reitman, 1964), which allow learners to make informed decisions about their own exploration paths. We discuss a novel computational approach that employs causal inference and pattern matching to characterize the exploration paths of prior learners and generate situationally relevant formative feedback. We extract formative fugues from data collected from an ecological complex systems simulation installed at a museum. The extracted feedback does not presume the goals of the learners, but helps learners understand what choices and events led to the current state of the problem space and what paths forward are possible. We conclude with a discussion of the implications of using formative fugues for complex systems education.
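Item 3's computational approach combines causal inference with pattern matching over prior learners' exploration paths. As a toy sketch of the pattern-matching half only (the similarity measure and all names are assumptions; the causal-inference component is not reproduced here):

```python
# Toy sketch: compare a learner's action sequence to prior learners'
# exploration paths and surface the closest matches, which could then
# seed situationally relevant feedback. Illustrative only.

from difflib import SequenceMatcher

def closest_prior_paths(current: list[str],
                        prior_paths: dict[str, list[str]],
                        k: int = 3) -> list[tuple[float, str]]:
    """Rank stored exploration paths by sequence similarity to the
    learner's current path; return the top k as (similarity, name)."""
    scored = [
        (SequenceMatcher(None, current, path).ratio(), name)
        for name, path in prior_paths.items()
    ]
    return sorted(scored, reverse=True)[:k]
```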
  4. Open-ended assignments, such as lab reports and semester-long projects, provide data science and statistics students with opportunities to develop communication, critical thinking, and creativity skills. However, providing grades and qualitative feedback on open-ended assignments can be very time consuming and difficult to do consistently across students. In this paper, we discuss the steps of a typical grading workflow and highlight which steps can be automated in an approach that we define as an automated grading workflow. We illustrate how gradetools, a new R package, implements this approach within RStudio to facilitate efficient and consistent grading while providing individualized feedback. We hope this work will help the community of data science and statistics educators use gradetools as their grading workflow assistant or develop their own tools for assisting their grading workflow.

     
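gradetools itself is an R package, and its actual API is not shown here. As a language-neutral sketch of the workflow item 4 describes (one rubric applied uniformly to every submission, with individualized feedback assembled from unmet criteria), consider the following; all names are hypothetical.

```python
# Language-neutral sketch of an automated grading workflow (not the
# gradetools API). A shared rubric keeps scoring consistent across
# students while still producing per-student feedback.

from dataclasses import dataclass
from typing import Callable

@dataclass
class RubricItem:
    description: str
    points: float
    check: Callable[[str], bool]  # automated check on the submission text
    feedback_if_missing: str

def grade(submission: str, rubric: list[RubricItem]) -> tuple[float, list[str]]:
    """Apply every rubric item; accumulate points for met criteria and
    collect feedback strings for unmet ones."""
    earned, feedback = 0.0, []
    for item in rubric:
        if item.check(submission):
            earned += item.points
        else:
            feedback.append(item.feedback_if_missing)
    return earned, feedback
```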
  5. In principle, educators can use writing to scaffold students’ understanding of increasingly complex science ideas. In practice, formative assessment of students’ science writing is very labor intensive. We present PyrEval+CR, an automated tool for formative assessment of middle school students’ science essays. It identifies each idea in a student’s science essay and that idea’s importance in the curriculum.