NSF PAR Search | NSF Public Access Repository

Extending a pretrained language model (BERT) using an ontological perspective to classify students’ scientific expertise level from written responses

https://doi.org/10.1186/s43031-025-00149-5

Wang, Heqiao; Haudek, Kevin C; Manzanares, Amanda D; Romulo, Chelsie L; Royse, Emily A; Azzarello, Caterina B (December 2025, Disciplinary and Interdisciplinary Science Education Research)

The complex and interdisciplinary nature of scientific concepts presents formidable challenges for students in developing their knowledge-in-use skills. Computerized analysis of students’ contextualized constructed responses offers a potential avenue for educators to develop personalized and scalable interventions, thus supporting the current teaching and learning of science. While prior research in artificial intelligence has demonstrated the effectiveness of algorithms, including Bidirectional Encoder Representations from Transformers (BERT), in tasks like automated classification of constructed responses, these efforts have predominantly leaned towards text-level features, often overlooking the conceptual ideas embedded in students’ responses from a cognitive perspective. Despite BERT’s performance in downstream tasks, challenges may arise in domain-specific tasks, particularly in establishing knowledge connections between specialized and open domains. These challenges become pronounced in small-scale and imbalanced educational datasets, where the available information for fine-tuning is frequently inadequate to capture task-specific nuances and contextual details. The primary objective of the present study is to investigate the effectiveness of a pretrained language model, when integrated with an ontological framework aligned with a contextualized science assessment, in classifying students’ expertise levels in scientific explanation. Our findings indicate that while pretrained language models, such as BERT, contribute to enhanced performance in language-related tasks within educational contexts, using an ontology-based system to identify domain-specific terms in sentences and substitute them with their associated sibling terms can significantly improve classification model performance.
Further, we qualitatively examined student responses and found that, as expected, the ontology framework identified and substituted key domain-specific terms in student responses that led to more accurate predictive scores. The study explores the practical implementation of ontology in assessment evaluation to facilitate formative assessment and formulate instructional strategies.
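The sibling-term substitution described in this abstract can be illustrated with a minimal sketch. The abstract does not say how the substituted sentences are used or how the ontology is structured, so everything here is an assumption made for illustration: the toy `ONTOLOGY` dictionary, its terms, and the helper names `sibling_map` and `substitute_siblings` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical toy ontology (not from the paper): parent concept -> sibling terms.
ONTOLOGY = {
    "energy_source": ["solar", "wind", "coal"],
    "water_body": ["river", "lake", "aquifer"],
}


def sibling_map(ontology):
    """Map each term to the other terms that share its parent concept."""
    siblings = {}
    for terms in ontology.values():
        for term in terms:
            siblings[term] = [t for t in terms if t != term]
    return siblings


def substitute_siblings(sentence, ontology):
    """Return sentence variants, each with one domain term swapped for a sibling."""
    sib = sibling_map(ontology)
    variants = []
    # Naive tokenization: runs of ASCII letters; matching is case-insensitive.
    for match in re.finditer(r"[A-Za-z]+", sentence):
        for replacement in sib.get(match.group(0).lower(), []):
            variants.append(
                sentence[:match.start()] + replacement + sentence[match.end():]
            )
    return variants


if __name__ == "__main__":
    for variant in substitute_siblings("The river powers solar pumps.", ONTOLOGY):
        print(variant)
```

One plausible use of such variants is to enlarge a small or imbalanced training set before fine-tuning a classifier, but that reading is a guess rather than something the abstract confirms.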
Free, publicly-accessible full text available December 1, 2026
Learning Progression-Guided AI Evaluation of Scientific Models Integrating Writing and Drawing to Support Multi-Modal Knowledge-in-Use

Kaldaras, Leonora; Li, Tingting; Djagba, Prudence; Haudek, Kevin; Krajcik, Joseph (April 2025, AERA Annual Meeting)

Free, publicly-accessible full text available April 23, 2026
A LLM-Powered Automatic Grading Framework with Human-Level Guidelines Optimization

https://doi.org/10.5281/zenodo.15870201

Chu, Yucheng; Li, Hang; Yang, Kaiqi; Shomer, Harry; Copur-Gencturk, Yasemin; Kaldaras, Leonora; Haudek, Kevin; Krajcik, Joseph; Shin, Namsoo; Liu, Hui; et al (July 2025, International Educational Data Mining Society)
Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo_Bosco, Giosuè; Paquette, Luc (Ed.)
Open-text responses provide researchers and educators with rich, nuanced insights that multiple-choice questions cannot capture. When reliably assessed, such responses have the potential to enhance teaching and learning. However, scaling and consistently capturing these nuances remain significant challenges, limiting the widespread use of open-text questions in educational research and assessments. In this paper, we introduce and evaluate GradeOpt, a unified multi-agent automatic short-answer grading (ASAG) framework that leverages large language models (LLMs) as graders for short-answer responses. More importantly, GradeOpt incorporates two additional LLM-based agents, the reflector and the refiner, into the multi-agent system. This enables GradeOpt to automatically optimize the original grading guidelines by performing self-reflection on its errors. To assess GradeOpt's effectiveness, we conducted experiments on two representative ASAG datasets, which include items designed to capture key aspects of teachers' pedagogical knowledge and students' learning progress. Our results demonstrate that GradeOpt consistently outperforms representative baselines in both grading accuracy and alignment with human evaluators across different knowledge domains. Finally, comprehensive ablation studies validate the contributions of GradeOpt's individual components, confirming their impact on overall performance.
Free, publicly-accessible full text available July 12, 2026
Utilizing Deep Learning AI to Analyze Scientific Models: Overcoming Challenges

https://doi.org/10.1007/s10956-025-10217-0

Li, Tingting; Haudek, Kevin; Krajcik, Joseph (April 2025, Journal of Science Education and Technology)
Employing automatic analysis tools aligned to learning progressions to assess knowledge application and support learning in STEM

https://doi.org/10.1186/s40594-024-00516-0

Kaldaras, Leonora; Haudek, Kevin; Krajcik, Joseph (November 2024, International Journal of STEM Education)

We discuss transforming STEM education using three aspects: learning progressions (LPs), constructed response performance assessments, and artificial intelligence (AI). Using LPs to inform instruction, curriculum, and assessment design helps foster students’ ability to apply content and practices to explain phenomena, which reflects deeper science understanding. To measure progress along these LPs, performance assessments combining elements of disciplinary ideas, crosscutting concepts and practices are needed. However, these tasks are time-consuming and expensive to score and provide feedback for. AI makes it possible to validate the LPs and evaluate performance assessments for many students quickly and efficiently. The evaluation provides a report describing student progress along the LP and the supports needed to attain a higher LP level. We suggest using unsupervised and semi-supervised machine learning (ML) and generative AI (GAI) at early LP validation stages to identify relevant proficiency patterns and start building an LP. We further suggest employing supervised ML and GAI for developing targeted LP-aligned performance assessments for more accurate performance diagnosis at advanced LP validation stages. Finally, we discuss employing AI for designing automatic feedback systems for providing personalized feedback to students and helping teachers implement LP-based learning. We discuss the challenges of realizing these tasks and propose future research avenues.
Developing Rubrics for AI Scoring of NGSS Learning Progression-based Scientific Models

Kaldaras, Leonora; Li, Tingting; Haudek, Kevin; Krajcik, Joseph (April 2024, American Educational Research Association)

The Framework for K-12 Science Education recognizes modeling as an essential practice for building deep understanding of science. Modeling assessments should measure the ability to integrate Disciplinary Core Ideas and Crosscutting Concepts. Machine learning (ML) has been utilized to score and provide feedback on open-ended Learning Progression (LP)-aligned assessments. Analytic rubrics have been shown to make it easier to evaluate the validity of ML-based scores. A possible drawback of using analytic rubrics is the potential for oversimplification of integrated ideas. We demonstrate the deconstruction of a 3D holistic rubric for modeling assessments aligned to an LP for Physical Science. We describe deconstructing this rubric into analytic categories for ML training while preserving its 3D nature.
Full Text Available
FEW questions, many answers: using machine learning to assess how students connect food–energy–water (FEW) concepts

https://doi.org/10.1057/s41599-024-03499-z

Royse, Emily A; Manzanares, Amanda D; Wang, Heqiao; Haudek, Kevin C; Azzarello, Caterina Belle; Horne, Lydia R; Druckenbrod, Daniel L; Shiroda, Megan; Adams, Sol R; Fairchild, Ennea; et al (December 2024, Humanities and Social Sciences Communications)

Full Text Available
Covariational reasoning and item context affect language in undergraduate mass balance written explanations

https://doi.org/10.1152/advan.00156.2022

Shiroda, Megan; Doherty, Jennifer H.; Scott, Emily E.; Haudek, Kevin C. (December 2023, Advances in Physiology Education)

This article builds on the work of Scott et al. (Scott EE, Cerchiara J, McFarland JL, Wenderoth MP, Doherty JH. J Res Sci Teach 1: 37, 2023) and Shiroda et al. (Shiroda M, Fleming MP, Haudek KC. Front Educ 8: 989836, 2023) to quantitatively examine student language in written explanations of mass balance across six contexts using constructed response assessments. These results present an evaluation of student mass balance language and provide researchers and practitioners with tools to assist students in constructing scientific mass balance reasoning explanations.
Full Text Available
Ecological diversity methods improve quantitative examination of student language in short constructed responses in STEM

https://doi.org/10.3389/feduc.2023.989836

Shiroda, Megan; Fleming, Michael P.; Haudek, Kevin C. (February 2023, Frontiers in Education)

We applied established ecology methods in a novel way to quantify and compare language diversity within a corpus of short written student texts. Constructed responses (CRs) are a common form of assessment but are difficult to evaluate using traditional methods of lexical diversity due to text length restrictions. Herein, we examined the utility of ecological diversity measures and ordination techniques to quantify differences in short texts by applying these methods in parallel to traditional text analysis methods to a corpus of previously studied college student CRs. The CRs were collected at two time points (Timing), from three types of higher-ed institutions (Type), and across three levels of student understanding (Thinking). Using previous work, we were able to predict that we would observe the most difference based on Thinking, then Timing, and we did not expect differences based on Type, allowing us to test the utility of these methods for categorical examination of the corpus. We found that the ecological diversity metrics that compare CRs to each other (Whittaker’s beta, species turnover, and Bray–Curtis Dissimilarity) were informative and correlated well with our predicted differences among categories and other text analysis methods. Other ecological measures, including Shannon’s and Simpson’s diversity, measure the diversity of language within a single CR. Additionally, ordination provided meaningful visual representations of the corpus by reducing complex word frequency matrices to two-dimensional graphs. Using the ordination graphs, we were able to observe patterns in the CR corpus that further supported our predictions for the data set. This work establishes novel approaches to measuring language diversity within short texts that can be used to examine differences in student language and possible associations with categorical data.
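To make the metrics named in this abstract concrete, here is a minimal sketch that treats each word as a "species" and each response as a "site". It uses the standard textbook definitions of Shannon diversity, the Gini-Simpson form of Simpson diversity, and Bray-Curtis dissimilarity. The abstract does not state which variants, tokenizer, or software the authors used, so the whitespace tokenizer and the function names below are assumptions made for illustration.

```python
import math
from collections import Counter


def word_counts(text):
    """Count lowercase, whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def shannon(counts):
    """Shannon diversity H = -sum(p_i * ln p_i) over word proportions in one response."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def simpson(counts):
    """Simpson diversity in its Gini-Simpson form, 1 - sum(p_i ** 2)."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two responses: 0 = identical, 1 = no shared words."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[w], b[w]) for w in a.keys() & b.keys())
    return 1.0 - 2.0 * shared / total


if __name__ == "__main__":
    r1 = word_counts("mass is conserved so the mass leaves as gas")
    r2 = word_counts("the mass leaves the body as carbon dioxide gas")
    print("Shannon:", shannon(r1), shannon(r2))
    print("Simpson:", simpson(r1), simpson(r2))
    print("Bray-Curtis:", bray_curtis(r1, r2))
```

Shannon and Simpson describe a single response, while Bray-Curtis compares a pair of responses, which matches the within-response versus between-response distinction the abstract draws.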
Full Text Available
What a Difference in Pressure Makes! A Framework Describing Undergraduate Students’ Reasoning about Bulk Flow Down Pressure Gradients

https://doi.org/10.1187/cbe.20-01-0003

Doherty, Jennifer H.; Scott, Emily E.; Cerchiara, Jack A.; Jescovitch, Lauren N.; McFarland, Jenny L.; Haudek, Kevin C.; Wenderoth, Mary Pat (June 2023, CBE—Life Sciences Education)
Gardner, Stephanie (Ed.)
This paper details the development of the first reasoning framework to describe how students’ reasoning about biological bulk flow pressure gradients develops toward scientific, mechanistic reasoning.
Full Text Available
