This study develops a framework to conceptualize the use and evolution of machine learning (ML) in science assessment. We systematically reviewed 47 studies that applied ML in science assessment and classified them into five categories: (a) constructed response, (b) essay, (c) simulation, (d) educational game, and (e) inter‐discipline. We compared the ML‐based and conventional science assessments and extracted 12 critical characteristics to map three variables in a three‐dimensional framework:
- PAR ID: 10455645
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Journal of Research in Science Teaching
- Volume: 57
- Issue: 9
- ISSN: 0022-4308
- Page Range / eLocation ID: p. 1430-1459
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Abstract Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open-response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has emphasized that the features of the assessment construct (i.e., complexity, diversity, and structure) are critical to ML scoring accuracy, yet how the assessment construct may be associated with machine scoring accuracy remains unknown. This study investigated how the features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses to use as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen's kappa (mean = 0.60; range 0.38–0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher-complexity items and multidimensional assessments.
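The human-machine agreement statistic reported in this abstract can be computed directly from its definition. The sketch below implements Cohen's kappa in plain Python; the human and machine score lists are hypothetical values for a three-level progression (levels 0-2), not data from the study:

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters."""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under independence, from marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    labels = set(rater_a) | set(rater_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in labels) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical scores for 10 responses (human consensus vs. ML model).
human = [0, 1, 2, 1, 0, 2, 1, 1, 2, 0]
machine = [0, 1, 2, 1, 0, 1, 1, 1, 2, 0]
print(round(cohen_kappa(human, machine), 2))  # → 0.85
```

Kappa discounts the agreement two raters would reach by chance alone, which is why it is preferred over raw percent agreement for validating ML scoring models.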
Abstract The Framework for K-12 Science Education (NRC, 2012) placed renewed emphasis on creating equitable science learning opportunities for all learners by engaging in three-dimensional learning experiences: disciplinary core ideas, crosscutting concepts, and science and engineering practices. Additionally, the Framework calls for a more inclusive approach to science learning that builds upon learners' linguistic practices and funds of knowledge and integrates open-ended, multimodal approaches to documenting learning throughout the assessment process. To support assessment developers in designing expansive Framework-aligned classroom-assessment approaches for emergent bilingual learners (learners developing two or more languages), tools are needed to guide the design of assessments from their inception. This paper presents a literature-based framework for science assessment design for emergent bilingual learners that includes components critical to supporting these learners. We then operationalize the framework into five categories and analyze nine publicly available Next Generation Science Standards sample classroom assessments. The sample tasks allow us to illustrate how those engaged in classroom assessment design might create more expansive Framework-based science classroom assessments appropriate for emergent bilingual learners.
Abstract The National Research Council's Framework for K-12 Science Education and the subsequent Next Generation Science Standards have provided a widespread common language for science education reform over the last decade. These efforts have naturally been targeted at the K-12 levels, but we have argued that the three dimensions outlined in these documents (scientific practices, disciplinary core ideas, and crosscutting concepts, together termed three-dimensional learning) are also a productive route for reform in college-level science courses. However, how and why college-level faculty might be motivated to incorporate three-dimensional learning into their courses is not well understood. Here, we report a mixed-methods study of participants in an interdisciplinary professional development program designed to support faculty in developing assessments and instruction aligned with three-dimensional learning. One cohort of faculty (N = 8) was interviewed, and four cohorts of faculty (N = 33) were surveyed. Using expectancy-value theory as an organizational framework, we identified themes of perceived values and costs that participants discussed in implementing three-dimensional learning. Based on a cluster analysis of all survey participants' motivational profiles, we propose that these themes apply to the broader population of participants in this program. We recommend specific interventions to improve faculty motivation for implementing three-dimensional learning: emphasizing the utility value of three-dimensional learning in effecting positive learning gains for students; drawing connections between the dimensions of three-dimensional learning and faculty's disciplinary identities; highlighting scientific practices as a key leverage point for faculty ability beliefs; minimizing cognitive dissonance for faculty in understanding the similarities and differences between the three dimensions; focusing on assessment writing as a keystone professional development activity; and aligning local evaluation practices and promotion policies with the three-dimensional learning (3DL) framework.
Abstract This study explores the role of unconventional forms of classroom assessments in expanding minoritized students' opportunities to learn (OTL) in high school physics classrooms. In this research + practice partnership project, high school physics teachers and researchers co-designed a unit about momentum to expand minoritized students' meaningful OTL. Specifically, the unit was designed to (a) expand what it means to learn and be good at science using unconventional forms of assessment, (b) facilitate students to leverage everyday experiences, concerns, and home languages to do science, and (c) support teachers to facilitate meaningful dialogical interactions. The analysis focused on examining minoritized students' OTL mediated by intentionally designed, curriculum-embedded, unconventional forms of assessments. The participants were a total of 76 students in 11th or 12th grade. Data were gathered in the form of student assessment tasks, a science identity survey, and interviews. Data analysis entailed: (a) statistical analysis of student performance measured by conventional and unconventional assessments and (b) qualitative analysis of two Latinx students' experiences with the co-designed curriculum and assessments. The findings suggest that the use of unconventional forms of curriculum-embedded assessment can increase minoritized students' OTL if the assessment helps minoritized students personally and deeply relate themselves to academic tasks.
The Framework for K-12 Science Education recognizes modeling as an essential practice for building deep understanding of science. Modeling assessments should measure the ability to integrate Disciplinary Core Ideas and Crosscutting Concepts. Machine learning (ML) has been utilized to score and provide feedback on open-ended Learning Progression (LP)-aligned assessments. Analytic rubrics have been shown to make it easier to evaluate the validity of ML-based scores. A possible drawback of using analytic rubrics is the potential for oversimplification of integrated ideas. We demonstrate the deconstruction of a 3D holistic rubric for modeling assessments aligned to an LP for Physical Science. We describe deconstructing this rubric into analytic categories that support ML training while preserving the rubric's 3D nature.
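The deconstruction idea in this last abstract can be sketched in a few lines: each analytic category is scored independently (e.g., by a separate ML model), and a holistic LP level is then recombined from those scores. This is an illustrative sketch only, not the authors' actual rubric; the category names and the sum-based recombination rule are hypothetical:

```python
# Hypothetical analytic categories deconstructed from a 3D holistic rubric:
# one per dimension (DCI, CCC, SEP) of a modeling assessment.
ANALYTIC_CATEGORIES = ["dci_energy_transfer", "ccc_cause_effect", "sep_model_components"]

def holistic_level(analytic_scores: dict) -> int:
    """Recombine binary (0/1) analytic scores into a holistic LP level (0-3).

    Assumed recombination rule: the holistic level is the count of
    analytic categories present in the response.
    """
    return sum(analytic_scores[cat] for cat in ANALYTIC_CATEGORIES)

# A response that integrates the DCI and CCC but omits model components.
response_scores = {
    "dci_energy_transfer": 1,
    "ccc_cause_effect": 1,
    "sep_model_components": 0,
}
print(holistic_level(response_scores))  # → 2
```

Scoring categories independently lets each ML model be validated on a single, well-defined construct, while the recombination step retains a holistic, three-dimensional level for reporting.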