Title: Towards a formative feedback generation agent: Leveraging a human-in-the-loop, chain-of-thought prompting approach with LLMs to evaluate formative assessment responses in K-12 science.
This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering approaches to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students’ short answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on providing scores for mathematical and programming problems with little work targeting the generation of actionable insight from the student responses. This paper addresses these limitations by exploring a human-in-the-loop approach to make the process more intuitive and more effective. By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of science education for middle school students. We have conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable approach to automated grading, where the performance of the automated LLM-based grader continually improves over time, while also providing actionable feedback that can support students’ open-ended science learning.
Award ID(s):
2017000
PAR ID:
10468997
Author(s) / Creator(s):
; ;
Publisher / Repository:
dl.acm.org
Date Published:
Subject(s) / Keyword(s):
Education LLM formative feedback STEM learning
Format(s):
Medium: X
Location:
iui.acm.org
Sponsoring Org:
National Science Foundation
More Like this
  1. Martin, Fred; Norouzi, Narges; Rosenthal, Stephanie (Ed.)
    This paper examines the use of LLMs to support the grading and explanation of short-answer formative assessments in K12 science topics. While significant work has been done on programmatically scoring well-structured student assessments in math and computer science, many of these approaches produce a numerical score and stop short of providing teachers and students with explanations for the assigned scores. In this paper, we investigate few-shot, in-context learning with chain-of-thought reasoning and active learning using GPT-4 for automated assessment of students’ answers in a middle school Earth Science curriculum. Our findings from this human-in-the-loop approach demonstrate success in scoring formative assessment responses and in providing meaningful explanations for the assigned score. We then perform a systematic analysis of the advantages and limitations of our approach. This research provides insight into how we can use human-in-the-loop methods for the continual improvement of automated grading for open-ended science assessments. 
  2. This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score more structured math and computer science assessments, they often do not provide explanations for the scores. Our study focuses on employing GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score and provide meaningful explanations for formative assessment responses. A systematic analysis of our method's pros and cons sheds light on the potential for human-in-the-loop techniques to enhance automated grading for open-ended science assessments. 
  3. Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo Bosco, Giosuè; Paquette, Luc (Ed.)
    Short answer assessment is a vital component of science education, allowing evaluation of students' complex three-dimensional understanding. Large language models (LLMs) that possess human-like ability in linguistic tasks are increasingly popular in assisting human graders to reduce their workload. However, LLMs' limitations in domain knowledge restrict their understanding in task-specific requirements and hinder their ability to achieve satisfactory performance. Retrieval-augmented generation (RAG) emerges as a promising solution by enabling LLMs to access relevant domain-specific knowledge during assessment. In this work, we propose an adaptive RAG framework for automated grading that dynamically retrieves and incorporates domain-specific knowledge based on the question and student answer context. Our approach combines semantic search and curated educational sources to retrieve valuable reference materials. Experimental results in a science education dataset demonstrate that our system achieves an improvement in grading accuracy compared to baseline LLM approaches. The findings suggest that RAG-enhanced grading systems can serve as reliable support with efficient performance gains. 
  4. The Next Generation Science Standards and the National Research Council recognize systems thinking as an essential skill to address the global challenges of the 21st century. But the habits of mind needed to understand complex systems are not readily learned through traditional approaches. Recently large-scale interactive multi-user immersive simulations are being used to expose the learners to diverse topics that emulate real-world complex systems phenomena. These modern-day mixed reality simulations are unique in that the learners are an integral part of the evolving dynamics. The decisions they make and the actions that follow, collectively impact the simulated complex system, much like any real-world complex system. But the learners have difficulty understanding these coupled complex systems processes, and often get “lost” or “stuck,” and need help navigating the problem space. Formative feedback is the traditional way educators support learners during problem solving. Traditional goal-based and learner-centered approaches don’t scale well to environments that allow learners to explore multiple goals or solutions, and multiple solution paths (Mallavarapu & Lyons, 2020). In this work, we reconceptualize formative feedback for complex systems-based learning environments, formative fugues, (a term derived from music by Reitman, 1964) to allow learners to make informed decisions about their own exploration paths. We discuss a novel computational approach that employs causal inference and pattern matching to characterize the exploration paths of prior learners and generate situationally relevant formative feedback. We extract formative fugues from the data collected from an ecological complex systems simulation installed at a museum. The extracted feedback does not presume the goals of the learners, but helps the learners understand what choices and events led to the current state of the problem space, and what paths forward are possible. 
    We conclude with a discussion of implications of using formative fugues for complex systems education.
  5. This paper provides an experience report on a co‐design approach with teachers to co‐create learning analytics‐based technology to support problem‐based learning in middle school science classrooms. We have mapped out a workflow for such applications and developed design narratives to investigate the implementation, modifications and temporal roles of the participants in the design process. Our results provide precedent knowledge on co‐designing with experienced and novice teachers and co‐constructing actionable insight that can help teachers engage more effectively with their students' learning and problem‐solving processes during classroom PBL implementations.
    Practitioner notes:
    What is already known about this topic:
    Success of educational technology depends in large part on the technology's alignment with teachers' goals for their students, teaching strategies and classroom context.
    Teacher and researcher co‐design of educational technology and supporting curricula has proven to be an effective way for integrating teacher insight and supporting their implementation needs.
    Co‐designing learning analytics and support technologies with teachers is difficult due to differences in design and development goals, workplace norms, and AI‐literacy and learning analytics background of teachers.
    What this paper adds:
    We provide a co‐design workflow for middle school teachers that centres on co‐designing and developing actionable insights to support problem‐based learning (PBL) by systematic development of responsive teaching practices using AI‐generated learning analytics.
    We adapt established human‐computer interaction (HCI) methods to tackle the complex task of classroom PBL implementation, working with experienced and novice teachers to create a learning analytics dashboard for a PBL curriculum.
    We demonstrate researcher and teacher roles and needs in ensuring co‐design collaboration and the co‐construction of actionable insight to support middle school PBL.
    Implications for practice and/or policy:
    Learning analytics researchers will be able to use the workflow as a tool to support their PBL co‐design processes.
    Learning analytics researchers will be able to apply adapted HCI methods for effective co‐design processes.
    Co‐design teams will be able to pre‐emptively prepare for the difficulties and needs of teachers when integrating middle school teacher feedback during the co‐design process in support of PBL technologies.