Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Abstract We discuss transforming STEM education using three aspects: learning progressions (LPs), constructed response performance assessments, and artificial intelligence (AI). Using LPs to inform instruction, curriculum, and assessment design helps foster students’ ability to apply content and practices to explain phenomena, which reflects deeper science understanding. To measure the progress along these LPs, performance assessments combining elements of disciplinary ideas, crosscutting concepts and practices are needed. However, these tasks are time-consuming and expensive to score and provide feedback for. Artificial intelligence (AI) allows to validate the LPs and evaluate performance assessments for many students quickly and efficiently. The evaluation provides a report describing student progress along LP and the supports needed to attain a higher LP level. We suggest using unsupervised, semi-supervised ML and generative AI (GAI) at early LP validation stages to identify relevant proficiency patterns and start building an LP. We further suggest employing supervised ML and GAI for developing targeted LP-aligned performance assessment for more accurate performance diagnosis at advanced LP validation stages. Finally, we discuss employing AI for designing automatic feedback systems for providing personalized feedback to students and helping teachers implement LP-based learning. We discuss the challenges of realizing these tasks and propose future research avenues.more » « less
-
The Framework for K-12 Science Education recognizes modeling as an essential practice for building deep understanding of science. Modeling assessments should measure the ability to integrate Disciplinary Core Ideas and Crosscutting Concepts. Machine learning (ML) has been utilized to score and provide feedback on open-ended Learning Progression (LP)-aligned assessments. Analytic rubrics have been shown to be easier to evaluate the validity of ML-based scores. A possible drawback of using analytic rubrics is the potential for oversimplification of integrated ideas. We demonstrate the deconstruction of a 3D holistic rubric for modeling assessments aligned LP for Physical Science. We describe deconstructing this rubric into analytic categories for ML training and to preserve its 3D nature.more » « lessFree, publicly-accessible full text available April 11, 2025
-
This article builds on the work of Scott et al. (Scott EE, Cerchiara J, McFarland JL, Wenderoth MP, Doherty JH. J Res Sci Teach 1: 37, 2023) and Shiroda et al. (Shiroda M, Fleming MP, Haudek KC. Front Educ 8: 989836, 2023) to quantitatively examine student language in written explanations of mass balance across six contexts using constructed response assessments. These results present an evaluation of student mass balance language and provide researchers and practitioners with tools to assist students in constructing scientific mass balance reasoning explanations.more » « less
-
We novelly applied established ecology methods to quantify and compare language diversity within a corpus of short written student texts. Constructed responses (CRs) are a common form of assessment but are difficult to evaluate using traditional methods of lexical diversity due to text length restrictions. Herein, we examined the utility of ecological diversity measures and ordination techniques to quantify differences in short texts by applying these methods in parallel to traditional text analysis methods to a corpus of previously studied college student CRs. The CRs were collected at two time points (Timing), from three types of higher-ed institutions (Type), and across three levels of student understanding (Thinking). Using previous work, we were able to predict that we would observe the most difference based on Thinking, then Timing and did not expect differences based on Type allowing us to test the utility of these methods for categorical examination of the corpus. We found that the ecological diversity metrics that compare CRs to each other (Whittaker’s beta, species turnover, and Bray–Curtis Dissimilarity) were informative and correlated well with our predicted differences among categories and other text analysis methods. Other ecological measures, including Shannon’s and Simpson’s diversity, measure the diversity of language within a single CR. Additionally, ordination provided meaningful visual representations of the corpus by reducing complex word frequency matrices to two-dimensional graphs. Using the ordination graphs, we were able to observe patterns in the CR corpus that further supported our predictions for the data set. This work establishes novel approaches to measuring language diversity within short texts that can be used to examine differences in student language and possible associations with categorical data.more » « less
-
null (Ed.)Abstract We systematically compared two coding approaches to generate training datasets for machine learning (ML): (i) a holistic approach based on learning progression levels and (ii) a dichotomous, analytic approach of multiple concepts in student reasoning, deconstructed from holistic rubrics. We evaluated four constructed response assessment items for undergraduate physiology, each targeting five levels of a developing flux learning progression in an ion context. Human-coded datasets were used to train two ML models: (i) an 8-classification algorithm ensemble implemented in the Constructed Response Classifier (CRC), and (ii) a single classification algorithm implemented in LightSide Researcher’s Workbench. Human coding agreement on approximately 700 student responses per item was high for both approaches with Cohen’s kappas ranging from 0.75 to 0.87 on holistic scoring and from 0.78 to 0.89 on analytic composite scoring. ML model performance varied across items and rubric type. For two items, training sets from both coding approaches produced similarly accurate ML models, with differences in Cohen’s kappa between machine and human scores of 0.002 and 0.041. For the other items, ML models trained with analytic coded responses and used for a composite score, achieved better performance as compared to using holistic scores for training, with increases in Cohen’s kappa of 0.043 and 0.117. These items used a more complex scenario involving movement of two ions. It may be that analytic coding is beneficial to unpacking this additional complexity.more » « less
-
Constructed responses can be used to assess the complexity of student thinking and can be evaluated using rubrics. The two most typical rubric types used are holistic and analytic. Holistic rubrics may be difficult to use with expert-level reasoning that has additive or overlapping language. In an attempt to unpack complexity in holistic rubrics at a large scale, we have developed a systematic approach called deconstruction. We define deconstruction as the process of converting a holistic rubric into defining individual conceptual components that can be used for analytic rubric development and application. These individual components can then be recombined into the holistic score which keeps true to the holistic rubric purpose, while maximizing the benefits and minimizing the shortcomings of each rubric type. This paper outlines the deconstruction process and presents a case study that shows defined concept definitions for a hierarchical holistic rubric developed for an undergraduate physiology-content reasoning context. These methods can be used as one way for assessment developers to unpack complex student reasoning, which may ultimately improve reliability and validation of assessments that are targeted at uncovering large-scale complex scientific reasoning.more » « less