Recently, there has been a surge in developing curricula and tools that integrate computing (C) into Science, Technology, Engineering, and Math (STEM) programs. These environments foster authentic problem-solving while facilitating students’ concurrent learning of STEM+C content. In our study, we analyzed students’ behaviors as they worked in pairs to create computational kinematics models of object motion. We derived a domain-specific metric from students’ collaborative dialogue that measured how they integrated science and computing concepts into their problem-solving tasks. Additionally, we computed social metrics such as equity and turn-taking based on the students’ dialogue. We identified and characterized students’ planning, enacting, monitoring, and reflecting behaviors as they worked together on their model construction tasks. This study investigates the impact of students’ collaborative behaviors on their performance in STEM+C computational modeling tasks. By analyzing the relationships of group synergy, turn-taking, and equity measures with task performance, we provide insights into how these collaborative behaviors influence students’ ability to construct accurate models. Our findings underscore the importance of synergistic discourse for overall task success, particularly during the enactment, monitoring, and reflection phases. Conversely, variations in equity and turn-taking have a minimal impact on segment-level task performance.
Free, publicly accessible full text available July 1, 2026.
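As an illustration of how dialogue-based social metrics of this kind might be computed (the abstract does not give its exact formulas; the speaker-switch rate and Gini-style balance measure below are assumptions), a minimal sketch:

```python
from collections import Counter

def turn_taking_rate(turns):
    """Fraction of adjacent utterance pairs where the speaker changes.

    `turns` is a chronologically ordered list of speaker labels,
    e.g. ["A", "A", "B", "A", "B"].
    """
    if len(turns) < 2:
        return 0.0
    switches = sum(1 for prev, cur in zip(turns, turns[1:]) if prev != cur)
    return switches / (len(turns) - 1)

def equity(turns):
    """1 minus the normalized Gini coefficient of utterance counts per speaker.

    Returns 1.0 when partners contribute equally and approaches 0.0 as one
    partner dominates the dialogue.
    """
    counts = sorted(Counter(turns).values())
    n, total = len(counts), sum(counts)
    if n < 2 or total == 0:
        return 1.0
    # Standard Gini formula over the sorted per-speaker utterance counts
    cum = sum((i + 1) * c for i, c in enumerate(counts))
    gini = (2 * cum) / (n * total) - (n + 1) / n
    return 1.0 - gini / (1 - 1 / n)  # rescale so total dominance maps to 0

dialogue = ["A", "A", "B", "A", "B", "B", "A"]
print(turn_taking_rate(dialogue), equity(dialogue))
```

Segment-level values of these measures could then be compared against segment-level task performance, as the study does.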
-
Abstract Although the “eye-mind link” hypothesis posits that eye movements provide a direct window into cognitive processing, linking eye movements to specific cognitions in real-world settings remains challenging. This challenge may arise because gaze metrics such as fixation duration, pupil size, and saccade amplitude are often aggregated across timelines that include heterogeneous events. To address this, we tested whether aggregating gaze parameters across participant-defined events could support the hypothesis that increased focal processing, indicated by greater gaze duration and pupil diameter, and decreased scene exploration, indicated by smaller saccade amplitude, would predict effective task performance. Using head-mounted eye trackers, nursing students engaged in simulation learning and later segmented their simulation footage into meaningful events, categorizing their behaviors, task outcomes, and cognitive states at the event level. Increased fixation duration and pupil diameter predicted higher student-rated teamwork quality, while increased pupil diameter predicted judgments of effective communication. Additionally, increased saccade amplitude positively predicted students’ perceived self-efficacy. These relationships did not vary across event types, and gaze parameters did not differ significantly between the beginning, middle, and end of events. However, there was a significant increase in fixation duration during the first five seconds of an event compared to the last five seconds of the previous event, suggesting an initial encoding phase at an event boundary. In conclusion, event-level gaze parameters serve as valid indicators of focal processing and scene exploration in natural learning environments, generalizing across event types.
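A minimal sketch of the event-level aggregation idea, assuming a tidy table of gaze samples with one row per fixation or saccade (column names and values are illustrative, not the study's data):

```python
import pandas as pd

# Hypothetical gaze samples tagged with the participant-defined event they fall into.
gaze = pd.DataFrame({
    "participant": ["p1"] * 6,
    "event_id":    [1, 1, 1, 2, 2, 2],
    "fixation_duration_ms":  [220, 310, 180, 450, 390, 500],
    "pupil_diameter_mm":     [3.1, 3.2, 3.0, 3.6, 3.5, 3.7],
    "saccade_amplitude_deg": [4.2, 5.1, 3.8, 2.0, 2.4, 1.9],
})

# Aggregate each gaze parameter within participant-defined events rather than
# across the whole timeline, as the study proposes.
event_level = (
    gaze.groupby(["participant", "event_id"])
        .agg(mean_fixation_ms=("fixation_duration_ms", "mean"),
             mean_pupil_mm=("pupil_diameter_mm", "mean"),
             mean_saccade_deg=("saccade_amplitude_deg", "mean"))
        .reset_index()
)
print(event_level)
```

Event-level predictors like these could then be related to event-level outcome ratings (e.g., perceived teamwork quality) in a regression or mixed-effects model.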
-
This paper explores the design of two types of pedagogical agents—teaching and peer—in a collaborative STEM+C learning environment, C2STEM, where high school students learn physics (kinematics) and computing by building computational models that simulate the motion of objects. Through in-depth case study interviews with teachers and students, we identify role-based features for these agents to support collaborative learning in open-ended STEM+C learning environments. We propose twelve design principles—four for teaching agents, four for peer agents, and four shared by both—contributing to foundational guidelines for developing agents that enhance collaborative learning through computational modeling.
Free, publicly accessible full text available June 10, 2026.
-
This paper explores the use of large language models (LLMs) to score and explain short-answer assessments in K-12 science. While existing methods can score more structured math and computer science assessments, they often do not provide explanations for the scores. Our study focuses on employing GPT-4 for automated assessment in middle school Earth Science, combining few-shot and active learning with chain-of-thought reasoning. Using a human-in-the-loop approach, we successfully score and provide meaningful explanations for formative assessment responses. A systematic analysis of our method's pros and cons sheds light on the potential for human-in-the-loop techniques to enhance automated grading for open-ended science assessments.
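The paper does not reproduce its prompts here; the sketch below only illustrates the general pattern of few-shot, chain-of-thought scoring with the OpenAI chat API. The model name, rubric wording, and worked examples are placeholders, not the study's materials.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# A few scored examples with brief reasoning, as in few-shot chain-of-thought
# prompting (rubric and examples are illustrative placeholders).
FEW_SHOT = """Question: Why do we see phases of the Moon?
Student answer: Because Earth's shadow covers part of the Moon.
Reasoning: Confuses lunar phases with eclipses; does not mention the Moon's
orbit or the changing sunlit portion we see. Score: 1/3

Question: Why do we see phases of the Moon?
Student answer: As the Moon orbits Earth, we see different amounts of its sunlit half.
Reasoning: Identifies the orbit and the changing visible sunlit portion,
matching the rubric. Score: 3/3
"""

def grade(question: str, answer: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4",  # model choice is an assumption
        temperature=0,
        messages=[
            {"role": "system",
             "content": "You grade middle school Earth Science answers. "
                        "Reason step by step, then give a score out of 3 and a "
                        "short explanation a teacher could share with the student."},
            {"role": "user",
             "content": FEW_SHOT + f"\nQuestion: {question}\nStudent answer: {answer}\nReasoning:"},
        ],
    )
    return response.choices[0].message.content

print(grade("Why do we see phases of the Moon?",
            "The Moon makes its own light that turns on and off."))
```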
-
LLMs have demonstrated proficiency in contextualizing their outputs using human input, often matching or exceeding human-level performance on a variety of tasks. However, LLMs have not yet been used to characterize synergistic learning in students’ collaborative discourse. In this exploratory work, we take a first step towards adopting a human-in-the-loop prompt engineering approach with GPT-4-Turbo to summarize and categorize students’ synergistic learning during collaborative discourse. Our preliminary findings suggest GPT-4-Turbo may be able to characterize students’ synergistic learning in a manner comparable to humans and that our approach warrants further investigation.
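A minimal sketch of how a dialogue segment might be summarized and categorized with GPT-4-Turbo; the label scheme, prompt wording, and JSON output format are assumptions, as the study's actual coding scheme for synergistic learning is not reproduced here.

```python
import json
from openai import OpenAI

client = OpenAI()

# Assumed label scheme: a segment is "synergistic" when science and computing
# ideas are interleaved in the same line of reasoning.
LABELS = ["science", "computing", "synergistic", "other"]

def categorize_segment(utterances: list[str]) -> dict:
    """Ask the model to summarize a dialogue segment and assign one label."""
    transcript = "\n".join(utterances)
    response = client.chat.completions.create(
        model="gpt-4-turbo",
        temperature=0,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "You analyze pairs of students building computational "
                        "physics models. Summarize the segment in one sentence "
                        f"and assign exactly one label from {LABELS}. "
                        "Reply as JSON with keys 'summary' and 'label'."},
            {"role": "user", "content": transcript},
        ],
    )
    return json.loads(response.choices[0].message.content)

segment = [
    "A: If acceleration is -9.8, the velocity should go down every step.",
    "B: So in the loop we add acceleration times delta-t to the velocity variable.",
]
print(categorize_segment(segment))
```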
-
This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students’ short-answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on providing scores for mathematical and programming problems, with little work targeting the generation of actionable insight from student responses. This paper addresses these limitations by exploring a human-in-the-loop approach to make the process more intuitive and more effective. By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of science education for middle school students. We have conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable approach to automated grading, where the performance of the automated LLM-based grader continually improves over time, while also providing actionable feedback that can support students’ open-ended science learning.
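One way such a human-in-the-loop workflow could be structured is sketched below: educator corrections are folded back into the pool of few-shot examples that conditions later grading. This is an assumed structure, not the study's implementation; `grade` is a placeholder for the actual LLM call.

```python
from dataclasses import dataclass, field

@dataclass
class GradedExample:
    question: str
    answer: str
    score: int
    explanation: str

@dataclass
class HumanInTheLoopGrader:
    """Keeps a pool of educator-approved examples that condition later grading."""
    examples: list[GradedExample] = field(default_factory=list)

    def grade(self, question: str, answer: str) -> GradedExample:
        # Placeholder for a few-shot LLM call conditioned on self.examples.
        return GradedExample(question, answer, score=0, explanation="(model output)")

    def review(self, draft: GradedExample, educator_score: int,
               educator_explanation: str) -> None:
        """The educator corrects a draft grade; the corrected example joins the
        pool, so subsequent prompts are conditioned on teacher-vetted grading."""
        self.examples.append(GradedExample(draft.question, draft.answer,
                                           educator_score, educator_explanation))

grader = HumanInTheLoopGrader()
draft = grader.grade("What causes seasons?", "Earth is closer to the Sun in summer.")
grader.review(draft, educator_score=1,
              educator_explanation="Names a common misconception; does not mention axial tilt.")
print(len(grader.examples))
```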
-
Martin, Fred; Norouzi, Narges; Rosenthal, Stephanie (Eds.)
This paper examines the use of LLMs to support the grading and explanation of short-answer formative assessments in K12 science topics. While significant work has been done on programmatically scoring well-structured student assessments in math and computer science, many of these approaches produce a numerical score and stop short of providing teachers and students with explanations for the assigned scores. In this paper, we investigate few-shot, in-context learning with chain-of-thought reasoning and active learning using GPT-4 for automated assessment of students’ answers in a middle school Earth Science curriculum. Our findings from this human-in-the-loop approach demonstrate success in scoring formative assessment responses and in providing meaningful explanations for the assigned score. We then perform a systematic analysis of the advantages and limitations of our approach. This research provides insight into how we can use human-in-the-loop methods for the continual improvement of automated grading for open-ended science assessments.
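The abstract mentions active learning alongside few-shot prompting. One common selection criterion is to flag the responses the model scores least consistently and send those to an educator for labeling; the sketch below assumes that criterion (the study's actual selection strategy may differ), and simulates the repeated grading calls so it runs offline.

```python
from collections import Counter
import random

def sample_scores(response_text: str, n_samples: int = 5) -> list[int]:
    """Placeholder for n_samples GPT-4 grading calls at temperature > 0.
    Scores are simulated here so the sketch runs without an API key."""
    random.seed(hash(response_text) % 1000)
    return [random.choice([1, 2, 3]) for _ in range(n_samples)]

def disagreement(scores: list[int]) -> float:
    """Fraction of sampled scores that differ from the most common score."""
    most_common_count = Counter(scores).most_common(1)[0][1]
    return 1.0 - most_common_count / len(scores)

responses = [
    "Rocks form layers because sediment piles up over time.",
    "Earthquakes happen when the ground gets too hot.",
    "Plates move apart at mid-ocean ridges so new crust forms there.",
]

# Rank responses by model disagreement; the most uncertain ones go to an
# educator, whose labels become new few-shot examples in the next round.
ranked = sorted(responses, key=lambda r: disagreement(sample_scores(r)), reverse=True)
print(ranked[0])
```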
-
Grieff, S. (Ed.)
Recently, there has been increased development of curricula and tools that integrate computing (C) into Science, Technology, Engineering, and Math (STEM) learning environments. These environments serve as a catalyst for authentic collaborative problem-solving (CPS) and help students synergistically learn STEM+C content. In this work, we analyzed students’ collaborative problem-solving behaviors as they worked in pairs to construct computational models in kinematics. We leveraged social measures, such as equity and turn-taking, along with a domain-specific measure that quantifies the synergistic interleaving of science and computing concepts in the students’ dialogue to gain a deeper understanding of the relationship between students’ collaborative behaviors and their ability to complete a STEM+C computational modeling task. Our results extend past findings identifying the importance of synergistic dialogue and suggest that while equitable discourse is important for overall task success, fluctuations in equity and turn-taking at the segment level may not have an impact on segment-level task performance. To better understand students’ segment-level behaviors, we identified and characterized groups’ planning, enacting, and reflection behaviors along with monitoring processes they employed to check their progress as they constructed their models. Leveraging Markov Chain (MC) analysis, we identified differences in high- and low-performing groups’ transitions between these phases of students’ activities. We then compared the synergistic, turn-taking, and equity measures for these groups for each one of the MC model states to gain a deeper understanding of how these collaboration behaviors relate to their computational modeling performance. We believe that characterizing differences in collaborative problem-solving behaviors allows us to gain a better understanding of the difficulties students face as they work on their computational modeling tasks.
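The Markov Chain analysis described here amounts to estimating transition probabilities between coded activity phases. A minimal sketch of that estimation follows; the phase labels and the example sequences are illustrative, not data from the study.

```python
from collections import defaultdict

PHASES = ["plan", "enact", "monitor", "reflect"]

def transition_matrix(sequences):
    """Estimate P(next phase | current phase) from coded phase sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1
    matrix = {}
    for cur in PHASES:
        total = sum(counts[cur].values())
        matrix[cur] = {nxt: (counts[cur][nxt] / total if total else 0.0)
                       for nxt in PHASES}
    return matrix

# Illustrative phase sequences for two hypothetical groups.
high_performing = [["plan", "enact", "monitor", "enact", "reflect"]]
low_performing = [["enact", "enact", "monitor", "enact", "enact"]]

for name, seqs in [("high", high_performing), ("low", low_performing)]:
    print(name, transition_matrix(seqs))
```

Comparing the two estimated matrices is one way to surface the kind of between-group transition differences the study reports.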
-
Wang, N. (Ed.)
In education, intelligent learning environments allow students to choose how to tackle open-ended tasks while monitoring performance and behavior, allowing for the creation of adaptive support to help students overcome challenges. Timely feedback is critical to aid students’ progression toward learning and improved problem-solving. Feedback on text-based student responses can be delayed when teachers are overloaded with work. Automated evaluation can provide quick student feedback while easing the manual evaluation burden for teachers in settings with a high student-to-teacher ratio. Current methods of evaluating student essay responses to questions have included transformer-based natural language processing models with varying degrees of success. One main challenge in training these models is the scarcity of student-generated data. Larger volumes of training data are needed to create models that perform at a sufficient level of accuracy. Some studies have access to vast datasets, but large quantities are difficult to obtain when educational studies involve student-generated text. To overcome this data scarcity issue, text augmentation techniques have been employed to balance and expand the data set so that models can be trained with higher accuracy, leading to more reliable evaluation and categorization of student answers to aid teachers in supporting students’ learning progression. This paper examines the text-generating AI model, GPT-3.5, to determine if prompt-based text-generation methods are viable for generating additional text to supplement small sets of student responses for machine learning model training. We augmented student responses across two domains using GPT-3.5 completions and used that data to train a multilingual BERT model. Our results show that text generation can improve model performance on small data sets over simple self-augmentation.
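A minimal sketch of the augmentation step described here, assuming paraphrase-style prompting of GPT-3.5 via the OpenAI chat API; the prompt wording, label scheme, and seed example are placeholders, and the expanded set would then be used to fine-tune a classifier such as multilingual BERT.

```python
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def augment(response: str, label: str, n_variants: int = 3) -> list[tuple[str, str]]:
    """Generate paraphrased variants of a student response that keep its label."""
    completion = client.chat.completions.create(
        model="gpt-3.5-turbo",
        temperature=0.9,
        messages=[
            {"role": "system",
             "content": "Paraphrase the student answer below, keeping its "
                        "scientific content and level of correctness. "
                        f"Produce {n_variants} variants, one per line."},
            {"role": "user", "content": response},
        ],
    )
    variants = [line.strip() for line in
                completion.choices[0].message.content.splitlines() if line.strip()]
    # Augmented copies inherit the original label so the expanded, more balanced
    # set can be used to train the downstream classifier.
    return [(v, label) for v in variants[:n_variants]]

seed = [("The rock layers at the bottom are oldest.", "correct")]
augmented = seed + [pair for text, label in seed for pair in augment(text, label)]
print(len(augmented))
```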