Title: Automated text scoring and real‐time adjustable feedback: Supporting revision of scientific arguments involving uncertainty
Abstract

This paper describes HASbot, an automated text scoring and real‐time feedback system designed to support student revision of scientific arguments. Students submit open‐ended text responses to explain how their data support claims and how the limitations of their data affect the uncertainty of their explanations. HASbot automatically scores these text responses and returns the scores with feedback to students. Data were collected from 343 middle‐ and high‐school students taught by nine teachers across seven states in the United States. A mixed methods design was applied to investigate (a) how students' use of HASbot affected their development of uncertainty‐infused scientific arguments; (b) how students used feedback to revise their arguments; and (c) how the current design of HASbot supported or hindered students' revisions. Paired sample t-tests indicate that students made significant gains from pretest to posttest in uncertainty‐infused scientific argumentation, ES = 1.52 SD, p < 0.001. Linear regression analysis results indicate that students' HASbot use significantly contributed to their posttest performance on uncertainty‐infused scientific argumentation, while gender, English language learner status, and prior computer experience did not. From the analysis of videos, we identified several affordances and limitations of HASbot.
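
The analyses summarized above (a paired-sample t-test on pretest/posttest argumentation scores and a linear regression predicting posttest performance from HASbot use and demographic covariates) can be sketched in a few lines. This is a minimal illustration of the general techniques, not the authors' analysis code; the data file and column names (pretest, posttest, hasbot_use, gender, ell, prior_computer) are hypothetical placeholders.

```python
# Illustrative sketch only: paired-sample t-test and linear regression of the
# kind described in the abstract. All file and column names are hypothetical.
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

df = pd.read_csv("student_scores.csv")  # hypothetical data set

# Paired-sample t-test: pretest vs. posttest uncertainty-infused argumentation scores.
t_stat, p_value = stats.ttest_rel(df["posttest"], df["pretest"])

# One common effect-size convention for paired data: mean difference divided by
# the standard deviation of the difference scores.
diff = df["posttest"] - df["pretest"]
effect_size = diff.mean() / diff.std(ddof=1)

# Linear regression: posttest performance predicted by HASbot use, controlling
# for gender, English language learner status, and prior computer experience.
model = smf.ols("posttest ~ hasbot_use + gender + ell + prior_computer", data=df).fit()

print(f"t = {t_stat:.2f}, p = {p_value:.4f}, effect size = {effect_size:.2f}")
print(model.summary())
```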

 
NSF-PAR ID:
10088378
Author(s) / Creator(s):
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Science Education
Volume:
103
Issue:
3
ISSN:
0036-8326
Page Range / eLocation ID:
p. 590-622
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning—a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment from the measurement of knowledge and understanding to the measurement of performance and knowledge in use. Performance tasks requiring argumentation must capture the many ways students can construct and evaluate arguments in science, yet such tasks are both expensive and resource‐intensive to score. In this study, we explore how machine learning text classification techniques can be applied to develop efficient, valid, and accurate constructed‐response measures of students' competency with written scientific argumentation that are aligned with a validated argumentation learning progression. Data come from 933 middle school students in the San Francisco Bay Area and are based on three sets of argumentation items in three different science contexts. The findings demonstrate that we have been able to develop computer scoring models that can achieve substantial to almost perfect agreement between human‐assigned and computer‐predicted scores. Model performance was slightly weaker for harder items targeting higher levels of the learning progression, largely due to the linguistic complexity of these responses and the sparsity of higher‐level responses in the training data set. Comparing the efficacy of different scoring approaches revealed that breaking down students' arguments into multiple components (e.g., the presence of an accurate claim or providing sufficient evidence), developing computer models for each component, and combining scores from these analytic components into a holistic score produced better results than holistic scoring approaches. However, this analytical approach was found to be differentially biased when scoring responses from English learner (EL) students as compared to responses from non‐EL students on some items. Differences in scoring severity between human and computer scores for EL students across the two approaches are explored, and potential sources of bias in automated scoring are discussed.
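
A rough sketch of the analytic scoring approach described above (one text classifier per argument component, with component predictions combined into a holistic score) might look like the following. The component names, example responses, labels, and model choices are invented for illustration and do not reflect the study's actual feature extraction or models.

```python
# Hedged sketch of analytic (component-wise) scoring combined into a holistic
# score. Training data, component names, and model choices are placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training responses with per-component labels (1 = component present).
responses = [
    "Sea level will rise because the data show the ice sheets are melting.",
    "I think something will change but I cannot say why.",
    "The graph shows temperature increasing, which supports my claim.",
]
labels = {"accurate_claim": [1, 0, 1], "sufficient_evidence": [1, 0, 1]}

# Train one simple text classifier per argument component.
component_models = {
    component: make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    ).fit(responses, y)
    for component, y in labels.items()
}

def holistic_score(text: str) -> int:
    """Combine component predictions into a single holistic score."""
    return sum(int(model.predict([text])[0]) for model in component_models.values())

print(holistic_score("My claim that sea level will rise is supported by the melting-ice data."))
```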

     
  2. Abstract

    Flourishing in today's global society requires citizens who are both intelligent consumers and producers of scientific understanding. Indeed, the modern world is facing ever‐more complex problems that require innovative ways of thinking about, around, and with science. As numerous educational stakeholders have suggested, such skills and abilities are not innate and must, therefore, be taught (e.g., McNeill & Krajcik, Journal of Research in Science Teaching, 45(1), 53–78, 2008). However, such instruction requires a fundamental shift in science pedagogy so as to foster knowledge and practices like deep, conceptual understanding, model‐based reasoning, and oral and written argumentation where scientific evidence is evaluated (National Research Council, Next Generation Science Standards: For States, By States, Washington, DC: The National Academies Press, 2013). The purpose of our quasi‐experimental study was to examine the effectiveness of Quality Talk Science, a professional development model and intervention, in fostering changes in teachers' and students' discourse practices as well as their conceptual understanding and scientific argumentation. Findings revealed treatment teachers' and students' discourse practices better reflected critical‐analytic thinking and argumentation at posttest relative to comparison classrooms. Similarly, at posttest, treatment students produced stronger written scientific arguments than comparison students. Students' growth in conceptual understanding was nonsignificant. These findings suggest that discourse interventions such as Quality Talk Science can improve high‐school students' ability to engage in scientific argumentation.

     
  3. Abstract

    Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open-response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has emphasized that the features of the assessment construct (i.e., complexity, diversity, and structure) are critical to ML scoring accuracy, yet how these features are associated with machine scoring accuracy remains unknown. This study investigated how the features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses to use as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen's kappa (mean = 0.60; range 0.38–0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher complexity items and multidimensional assessments.
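
The agreement statistic reported above, Cohen's kappa between human consensus scores and machine-predicted scores, can be computed as in the brief sketch below; the score lists are invented placeholders rather than study data.

```python
# Minimal illustration of computing human-machine agreement with Cohen's kappa.
# The score lists are invented placeholders, not data from the study.
from sklearn.metrics import cohen_kappa_score

human_scores = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]    # human consensus scores
machine_scores = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]  # machine-predicted scores

# Unweighted kappa; quadratically weighted kappa (weights="quadratic") is also
# common for ordinal scores such as learning-progression levels.
kappa = cohen_kappa_score(human_scores, machine_scores)
print(f"Cohen's kappa = {kappa:.2f}")
```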

     