Title: Automated scoring of science explanations for multiple NGSS dimensions and knowledge integration
The Next Generation Science Standards (NGSS) emphasize integrating three dimensions of science learning: disciplinary core ideas, cross-cutting concepts, and science and engineering practices. In this study, we develop formative assessments that measure student understanding of the integration of these three dimensions, along with automated scoring methods that distinguish among them. The formative assessments allow students to express their emerging ideas while also capturing progress in integrating core ideas, cross-cutting concepts, and practices. We describe how item and rubric design can work in concert with an automated scoring system to independently score science explanations from multiple perspectives. We describe item design considerations and provide validity evidence for the automated scores.
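The abstract does not detail the scoring architecture, but a common setup for scoring explanations independently along each NGSS dimension is one text classifier per dimension. The sketch below is a minimal illustration of that idea using scikit-learn; the dimension labels, function names, and pipeline choices are assumptions for illustration, not the authors' actual system.

```python
# Minimal sketch: one independent classifier per NGSS dimension, so each
# explanation is scored from multiple perspectives. All names are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

DIMENSIONS = ["core_idea", "crosscutting_concept", "practice"]  # assumed labels

def train_dimension_scorers(responses, scores_by_dimension):
    """Fit one classifier per dimension on human-scored explanations.

    responses: list[str] of student explanations.
    scores_by_dimension: dict mapping dimension name -> list[int] rubric scores.
    """
    scorers = {}
    for dim in DIMENSIONS:
        model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                              LogisticRegression(max_iter=1000))
        model.fit(responses, scores_by_dimension[dim])
        scorers[dim] = model
    return scorers

def score_explanation(scorers, text):
    # Each dimension is scored independently, mirroring the goal of
    # distinguishing among the three dimensions.
    return {dim: int(model.predict([text])[0]) for dim, model in scorers.items()}
```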
Award ID(s):
1812660
PAR ID:
10184624
Journal Name:
Annual Meeting of the American Educational Research Association (AERA)
Sponsoring Org:
National Science Foundation
More Like This
  1. Educational research supports incorporating active engagement into K-12 education using authentic STEM experiences. While there are discipline-specific resources to provide students with such experiences, there are limited transdisciplinary opportunities that integrate engineering education and technological skill-building to contextualize core scientific concepts. Here, we present an adaptable module that integrates hands-on technology education and place-based learning to improve student understanding of key chemistry concepts as they relate to local environmental science. The module also supports disciplinary core ideas, practices, and cross-cutting concepts in accordance with the Next Generation Science Standards. We field-tested our module in three different high school courses: Chemistry, Oceanography, and Advanced Placement Environmental Science at schools in Washington, USA. Students built spectrophotometric pH sensors using readily available electronic components and calibrated them with known pH reference standards. Students then used their sensors to measure the pH of local environmental water samples. Assessments showed significant improvement in content knowledge in all three courses relating to the environmental relevance of pH and to the design, use, and environmental application of sensors. Students also reported increased self-confidence in the material, even when their content knowledge remained the same. These findings suggest that classroom sensor building and collection of environmental data increase student understanding and self-confidence by connecting chemistry concepts to local environmental settings.
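    As one concrete illustration of the calibration step described above, readings from known pH reference standards can be fit with a simple calibration curve, which is then inverted to estimate the pH of field samples. The sketch below assumes numpy and a roughly linear sensor response; the buffer values and readings are hypothetical, not data from the module.

```python
# Hedged sketch of sensor calibration: fit a line to readings from known pH
# buffers, then invert it to convert a field sample's reading into a pH value.
import numpy as np

# Hypothetical readings from reference buffers (sensor output vs. known pH).
standard_ph = np.array([4.0, 7.0, 10.0])
standard_reading = np.array([0.82, 0.51, 0.23])   # e.g., normalized absorbance

# Linear calibration: reading = a * pH + b.
a, b = np.polyfit(standard_ph, standard_reading, deg=1)

def reading_to_ph(reading):
    """Invert the calibration line to estimate pH from a sensor reading."""
    return (reading - b) / a

print(reading_to_ph(0.60))  # a local water sample; roughly pH 6.1 here
```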
  2. Abstract

    The Framework for K-12 Science Education (the Framework) and the Next Generation Science Standards (NGSS) emphasize the usefulness of learning progressions in helping align curriculum, instruction, and assessment to organize the learning process. The Framework defines three dimensions of science as the basis of the theoretical learning progressions described in the document and used to develop the NGSS: disciplinary core ideas, scientific and engineering practices, and crosscutting concepts. The Framework defines three-dimensional learning (3D learning) as integrating scientific and engineering practices, crosscutting concepts, and disciplinary core ideas to make sense of phenomena. Three-dimensional learning leads to the development of a deep, usable understanding of big ideas that students can apply to explain phenomena and solve real-life problems. While the Framework describes the theoretical basis of 3D learning, and the NGSS outlines possible theoretical learning progressions for the three dimensions across grades, we currently have very limited empirical evidence that a learning progression for 3D learning can be developed and validated in practice. In this paper, we demonstrate the feasibility of developing a 3D learning progression (3D LP) supported by qualitative and quantitative validity evidence. We first present a hypothetical 3D LP aligned to a previously designed NGSS-based curriculum. We then present multiple sources of validity evidence for the hypothetical 3D LP, including interview analysis and item response theory (IRT) analysis. Finally, we demonstrate the feasibility of using the assessment tool designed to probe levels of the 3D LP to assign 3D LP levels to individual student answers, which is essential for the practical applicability of any LP. This work demonstrates the usefulness of a validated 3D LP for organizing the learning process in the NGSS classroom, which is essential for the successful implementation of the NGSS.
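    For readers unfamiliar with the IRT step, the sketch below shows how a one-parameter (Rasch) model can be fit to a dichotomous response matrix by joint maximum likelihood, using numpy and scipy. This is a generic illustration; the paper's actual IRT analysis and software are not specified here.

```python
# Minimal Rasch (1PL) fit: estimate person abilities and item difficulties
# jointly from a 0/1 response matrix. A sketch, not production psychometrics.
import numpy as np
from scipy.optimize import minimize

def fit_rasch(responses):
    """Joint maximum-likelihood fit of a Rasch model.

    responses: (n_students, n_items) array of 0/1 item scores.
    Returns (abilities, difficulties), centered so difficulties average zero.
    """
    n_students, n_items = responses.shape

    def neg_log_likelihood(params):
        theta = params[:n_students]          # person abilities
        beta = params[n_students:]           # item difficulties
        logits = theta[:, None] - beta[None, :]
        # Bernoulli log-likelihood per cell: x * logit - log(1 + exp(logit))
        return -(responses * logits - np.logaddexp(0, logits)).sum()

    result = minimize(neg_log_likelihood, np.zeros(n_students + n_items),
                      method="L-BFGS-B")
    theta, beta = result.x[:n_students], result.x[n_students:]
    shift = beta.mean()                      # fix the model's location indeterminacy
    return theta - shift, beta - shift
```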

     
  3. Martin, Fred; Norouzi, Narges; Rosenthal, Stephanie (Eds.)
    This paper examines the use of LLMs to support the grading and explanation of short-answer formative assessments in K-12 science topics. While significant work has been done on programmatically scoring well-structured student assessments in math and computer science, many of these approaches produce a numerical score and stop short of providing teachers and students with explanations for the assigned scores. In this paper, we investigate few-shot, in-context learning with chain-of-thought reasoning and active learning using GPT-4 for automated assessment of students' answers in a middle school Earth Science curriculum. Our findings from this human-in-the-loop approach demonstrate success in scoring formative assessment responses and in providing meaningful explanations for the assigned score. We then perform a systematic analysis of the advantages and limitations of our approach. This research provides insight into how we can use human-in-the-loop methods for the continual improvement of automated grading for open-ended science assessments.
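    A minimal sketch of what such a pipeline can look like with the openai Python client is below. The prompt wording, rubric, few-shot examples, and scoring scale are hypothetical placeholders; the paper's actual prompts, active-learning loop, and model settings are not reproduced here.

```python
# Hedged sketch: few-shot, chain-of-thought grading with the OpenAI API.
# The rubric and scored examples below are invented placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

FEW_SHOT_EXAMPLES = """\
Response: "The rock layers formed over millions of years."
Reasoning: Identifies deposition over time but omits the mechanism.
Score: 2

Response: "Sediment settled in layers and compacted into rock over time."
Reasoning: Names both deposition and compaction, matching the rubric's top level.
Score: 3
"""

def grade(student_response: str, rubric: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You grade middle-school Earth Science answers. "
                        "Reason step by step against the rubric, then give a score."},
            {"role": "user",
             "content": f"Rubric:\n{rubric}\n\nScored examples:\n{FEW_SHOT_EXAMPLES}\n"
                        f"Response: \"{student_response}\"\nReasoning:"},
        ],
    )
    # Returns the model's explanation followed by its score, which a teacher
    # can review and correct (the human-in-the-loop step).
    return completion.choices[0].message.content
```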
  4. Abstract

    The Framework for K–12 Science Education (NRC, 2012) placed renewed emphasis on creating equitable science learning opportunities for all learners by engaging in three-dimensional learning experiences: disciplinary core ideas, crosscutting concepts, and science and engineering practices. Additionally, the Framework calls for a more inclusive approach to science learning that builds upon learners' linguistic practices and funds of knowledge and integrates open-ended, multimodal approaches to documenting learning throughout the assessment process. To support assessment developers in designing expansive Framework-aligned classroom-assessment approaches for emergent bilingual learners (learners developing two or more languages), tools are needed to guide the design of assessments from their inception. This paper presents a literature-based framework for science assessment design for emergent bilingual learners that includes components critical to supporting these learners. We then operationalize the framework into five categories and analyze nine publicly available Next Generation Science Standards sample classroom assessments. The sample tasks allow us to illustrate how those engaged in classroom assessment design might create more expansive Framework-based science classroom assessments appropriate for emergent bilingual learners.

     
  5. Abstract

    Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open-response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has emphasized that features of the assessment construct (i.e., complexity, diversity, and structure) are critical to ML scoring accuracy, yet how the assessment construct is associated with machine scoring accuracy remains underexplored. This study investigated how features of the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen's kappa (mean = 0.60; range 0.38–0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher-complexity items and multidimensional assessments.
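    As a small illustration of the agreement metric used here, Cohen's kappa can be computed directly with scikit-learn; the score vectors below are hypothetical placeholders, not the study's data.

```python
# Hedged sketch: compare machine scores against human consensus scores for one
# item using Cohen's kappa. Placeholder data, not the study's responses.
from sklearn.metrics import cohen_kappa_score

human_consensus = [2, 1, 3, 2, 2, 1, 3, 3, 2, 1]   # hypothetical rubric levels
machine_scores  = [2, 1, 3, 2, 1, 1, 3, 2, 2, 1]

kappa = cohen_kappa_score(human_consensus, machine_scores)
print(f"Cohen's kappa: {kappa:.2f}")  # the study reports a mean of 0.60 across items
```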

     