

Title: Identifying NGSS-Aligned Ideas in Student Science Explanations
With the increasing use of online interactive environments for science and engineering education in grades K-12, there is a growing need for detailed automatic analysis of student explanations of ideas and reasoning. With the widespread adoption of the Next Generation Science Standards (NGSS), an important goal is identifying the alignment of student ideas with NGSS-defined dimensions of proficiency. We develop a set of constructed-response formative assessment items that call for students to express and integrate ideas across multiple dimensions of the NGSS, and we explore the effectiveness of state-of-the-art neural sequence-labeling methods for identifying discourse-level expressions of ideas that align with the NGSS. We discuss challenges for the idea detection task in the formative science assessment context.
Award ID(s):
1812660
NSF-PAR ID:
10184621
Journal Name:
Workshop on Artificial Intelligence for Education (AI4EDU@AAAI)
Sponsoring Org:
National Science Foundation
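The sequence-labeling approach described in the abstract assigns idea labels to tokens of a student explanation; the labeled spans are then read off the tag sequence. The sketch below decodes BIO-style tags into labeled spans. The BIO scheme and the label names (DCI, CCC, SEP for the three NGSS dimensions) are illustrative assumptions, not the paper's actual label set.

```python
def bio_spans(tokens, tags):
    """Decode BIO tags into (label, start, end, text) spans; `end` is exclusive."""
    spans, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):            # trailing "O" flushes the last span
        prefix, _, name = tag.partition("-")
        if prefix != "I" or name != label:            # any open span ends here
            if label is not None:
                spans.append((label, start, i, " ".join(tokens[start:i])))
            start, label = (i, name) if prefix == "B" else (None, None)
    return spans

# Hypothetical labels for the three NGSS dimensions: DCI (disciplinary core
# idea), CCC (crosscutting concept), SEP (science and engineering practice).
tokens = "the gas particles move faster when heated".split()
tags = ["O", "B-DCI", "I-DCI", "I-DCI", "O", "B-CCC", "I-CCC"]
print(bio_spans(tokens, tags))
# → [('DCI', 1, 4, 'gas particles move'), ('CCC', 5, 7, 'when heated')]
```

A neural tagger would predict the `tags` sequence; this decoding step is model-agnostic.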
More Like this
  1. The Next Generation Science Standards (NGSS) emphasize integrating three dimensions of science learning: disciplinary core ideas, cross-cutting concepts, and science and engineering practices. In this study, we develop formative assessments that measure student understanding of the integration of these three dimensions along with automated scoring methods that distinguish among them. The formative assessments allow students to express their emerging ideas while also capturing progress in integrating core ideas, cross-cutting concepts, and practices. We describe how item and rubric design can work in concert with an automated scoring system to independently score science explanations from multiple perspectives. We describe item design considerations and provide validity evidence for the automated scores. 
  2. Abstract

    The Framework for K-12 Science Education (the Framework) and the Next Generation Science Standards (NGSS) emphasize the usefulness of learning progressions in helping align curriculum, instruction, and assessment to organize the learning process. The Framework defines three dimensions of science as the basis of the theoretical learning progressions described in the document and used to develop the NGSS: disciplinary core ideas, scientific and engineering practices, and crosscutting concepts. The Framework defines three-dimensional learning (3D learning) as integrating scientific and engineering practices, crosscutting concepts, and disciplinary core ideas to make sense of phenomena. Three-dimensional learning leads to the development of a deep, usable understanding of big ideas that students can apply to explain phenomena and solve real-life problems. While the Framework describes the theoretical basis of 3D learning, and the NGSS outline possible theoretical learning progressions for the three dimensions across grades, we currently have very limited empirical evidence that a learning progression for 3D learning can be developed and validated in practice. In this paper, we demonstrate the feasibility of developing a 3D learning progression (3D LP) supported by qualitative and quantitative validity evidence. We first present a hypothetical 3D LP aligned to a previously designed NGSS-based curriculum. We then present multiple sources of validity evidence for the hypothetical 3D LP, including interview analysis and item response theory (IRT) analysis. Finally, we demonstrate the feasibility of using the assessment tool designed to probe levels of the 3D LP for assigning 3D LP levels to individual student answers, which is essential for the practical applicability of any LP. This work demonstrates the usefulness of a validated 3D LP for organizing the learning process in the NGSS classroom, which is essential for the successful implementation of the NGSS.

     
  3. With the widespread adoption of the Next Generation Science Standards (NGSS), science teachers and online learning environments face the challenge of evaluating students' integration of different dimensions of science learning. Recent advances in representation learning in natural language processing have proven effective across many natural language processing tasks, but a rigorous evaluation of the relative merits of these methods for scoring complex constructed response formative assessments has not previously been carried out. We present a detailed empirical investigation of feature-based, recurrent neural network, and pre-trained transformer models on scoring content in real-world formative assessment data. We demonstrate that recent neural methods can rival or exceed the performance of feature-based methods. We also provide evidence that different classes of neural models take advantage of different learning cues, and pre-trained transformer models may be more robust to spurious, dataset-specific learning cues, better reflecting scoring rubrics. 
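The comparison above spans feature-based and neural scorers. As a minimal, stdlib-only illustration of the feature-based end of that spectrum, the sketch below scores a response by bag-of-words cosine similarity to rubric exemplars. The exemplars and rubric levels are invented for illustration; the models evaluated in the paper (feature-based classifiers, RNNs, pre-trained transformers) are far richer than this.

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def nearest_rubric_level(response: str, exemplars: dict) -> str:
    """Assign the rubric level whose exemplar is most similar to the response."""
    bag = Counter(response.lower().split())
    return max(exemplars,
               key=lambda lvl: cosine(bag, Counter(exemplars[lvl].lower().split())))

# Hypothetical two-level rubric for a phase-change explanation item.
exemplars = {
    "low":  "the ice melted",
    "high": "thermal energy transferred to the ice so its particles moved faster and it melted",
}
print(nearest_rubric_level("energy transfer made the particles move faster", exemplars))
# → high
```

Surface-feature scorers like this are exactly where spurious, dataset-specific cues can creep in, which is the robustness concern the abstract raises about model classes.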
  4.
    This research paper describes the development of an assessment instrument for use with middle school students that provides insight into students' interpretive understanding by looking at early indicators of developing expertise in students' responses to solution generation, reflection, and concept demonstration tasks. We begin by detailing a synthetic assessment model that served as the theoretical basis for assessing specific thinking skills. We then describe our process of developing test items: working with a Teacher Design Team (TDT) of instructors in our partner school system to set guidelines that would better orient the assessment to that context, and working within the framework of standards and disciplinary core ideas enumerated in the Next Generation Science Standards (NGSS). We next describe our process of refining the assessment from 17 items across three separate item pools to a final total of three open-response items. We then provide evidence for the validity and reliability of the assessment instrument against the standards of (1) content, (2) meaningfulness, (3) generalizability, and (4) instructional sensitivity. As part of the discussion of generalizability and instructional sensitivity, we detail a study carried out in our partner school system in the fall of 2019. The instrument was administered to students in treatment (n = 201) and non-treatment (n = 246) groups; the former participated in a two-to-three-week, NGSS-aligned experimental instructional unit introducing the principles of engineering design and engaging students through the Imaginative Education teaching approach, while the latter group was taught using the district's existing engineering design curriculum. Statistical analysis of student responses showed that the interrater reliability of the scoring procedures was good to excellent, with intra-class correlation coefficients ranging between .72 and .95.
To gauge the instructional sensitivity of the assessment instrument, a series of non-parametric comparative analyses (independent two-group Mann-Whitney tests) was carried out. These tests found statistically significant differences between treatment and non-treatment student responses on the outcomes of fluency and elaboration, but not reflection.
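The Mann-Whitney test reported above ranks the pooled responses from both groups and compares rank sums. The sketch below is a minimal stdlib computation of the U statistic with midranks for ties; it is illustrative only, not the authors' analysis code, and omits the p-value computation that standard statistics packages provide.

```python
def ranks(values):
    """1-based midranks: tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                                   # extend the tie group
        mid = (i + j) / 2 + 1                        # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = mid
        i = j + 1
    return r

def mann_whitney_u(x, y):
    """Return (U1, U2); a two-sided test would compare min(U1, U2) to a critical value."""
    r = ranks(list(x) + list(y))
    r1 = sum(r[: len(x)])                            # rank sum of the first sample
    u1 = r1 - len(x) * (len(x) + 1) / 2
    return u1, len(x) * len(y) - u1

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
# → (0.0, 9.0)
```

Complete separation of the two samples gives the extreme values U1 = 0 and U2 = n1 * n2, as in the example.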