Argumentation is fundamental to science education, both as a prominent feature of scientific reasoning and as an effective mode of learning—a perspective reflected in contemporary frameworks and standards. The successful implementation of argumentation in school science, however, requires a paradigm shift in science assessment, from measuring knowledge and understanding to measuring performance and knowledge in use. Performance tasks requiring argumentation must capture the many ways students can construct and evaluate arguments in science, yet such tasks are expensive and resource‐intensive to score. In this study, we explore how machine learning text classification techniques can be applied to develop efficient, valid, and accurate constructed‐response measures of students' competency with written scientific argumentation, aligned with a validated argumentation learning progression. Data come from 933 middle school students in the San Francisco Bay Area and are based on three sets of argumentation items in three different science contexts. The findings demonstrate that we were able to develop computer scoring models that achieve substantial to almost perfect agreement between human‐assigned and computer‐predicted scores. Model performance was slightly weaker for harder items targeting higher levels of the learning progression, largely due to the linguistic complexity of these responses and the sparsity of higher‐level responses in the training data set. Comparing the efficacy of different scoring approaches revealed that breaking down students' arguments into multiple components (e.g., the presence of an accurate claim or the provision of sufficient evidence), developing a computer model for each component, and combining the component scores into a holistic score produced better results than holistic scoring approaches. However, this analytic approach was differentially biased on some items when scoring responses from English learner (EL) students as compared with responses from non‐EL students. Differences in severity between human and computer scores for EL students across the two approaches are explored, and potential sources of bias in automated scoring are discussed.
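The analytic-then-holistic approach the abstract describes can be made concrete with a short sketch. What follows is a minimal illustration, not the authors' implementation: the component names, the bag-of-words classifiers, the toy data, and the sum-based combination rule are all assumptions introduced here.

```python
# Minimal sketch of analytic component scoring combined into a holistic
# score (illustrative assumptions throughout; not the study's models).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import cohen_kappa_score

# Toy training data: written responses with one binary label per component.
responses = [
    "The plant grew taller because the data show more light gave more growth.",
    "I just think plants like light.",
]
component_labels = {
    "accurate_claim":      [1, 0],  # states an accurate claim?
    "sufficient_evidence": [1, 0],  # cites sufficient evidence?
}

# One text classifier per analytic component.
component_models = {
    name: make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(responses, y)
    for name, y in component_labels.items()
}

def holistic_score(text: str) -> int:
    """Combine component predictions into a holistic score (here, a simple
    sum of predicted component flags -- an assumed combination rule)."""
    return sum(int(m.predict([text])[0]) for m in component_models.values())

# Agreement between human holistic scores and combined machine scores,
# using the same statistic the literature reports (Cohen's kappa).
human_scores = [2, 0]
machine_scores = [holistic_score(r) for r in responses]
print(cohen_kappa_score(human_scores, machine_scores))
```

In practice the combination rule could be anything from a weighted sum to a rubric lookup table; the point is only that component-level models are trained separately and their outputs merged afterward.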
This paper describes HASbot, an automated text scoring and real‐time feedback system designed to support student revision of scientific arguments. Students submit open‐ended text responses to explain how their data support their claims and how the limitations of their data affect the uncertainty of their explanations. HASbot automatically scores these text responses and returns the scores with feedback to students. Data were collected from 343 middle‐ and high‐school students taught by nine teachers across seven states in the United States. A mixed methods design was applied to investigate (a) how students' use of HASbot affected their development of uncertainty‐infused scientific arguments, (b) how students used feedback to revise their arguments, and (c) how the current design of HASbot supported or hindered students' revisions. Paired sample
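HASbot's core loop (score an open-ended response, then return the score with level-appropriate feedback so the student can revise) can be sketched as follows. This is a hedged illustration: the three-level rubric, the feedback strings, and the stand-in scorer are invented here, not HASbot's actual model or messages.

```python
# Sketch of an automated score -> feedback loop (illustrative only; the
# score levels and messages are assumptions, not HASbot's).
FEEDBACK = {
    0: "Try stating a claim that directly answers the question.",
    1: "Good claim. Which of your data support it? Cite specific evidence.",
    2: "Strong argument. How do the limits of your data affect your certainty?",
}

def score_response(text: str) -> int:
    """Stand-in for the trained scoring model (hypothetical heuristic)."""
    return min(2, text.lower().count("because"))

def give_feedback(text: str) -> tuple[int, str]:
    """Return the automated score with its feedback message so the
    student can revise and resubmit."""
    level = score_response(text)
    return level, FEEDBACK[level]

level, message = give_feedback("The lake warmed because the sensor readings rose.")
print(level, message)
```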
- NSF-PAR ID: 10088378
- Publisher / Repository: Wiley Blackwell (John Wiley & Sons)
- Date Published:
- Journal Name: Science Education
- Volume: 103
- Issue: 3
- ISSN: 0036-8326
- Page Range / eLocation ID: p. 590-622
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Abstract Flourishing in today's global society requires citizens who are both intelligent consumers and producers of scientific understanding. Indeed, the modern world is facing ever‐more complex problems that require innovative ways of thinking about, around, and with science. As numerous educational stakeholders have suggested, such skills and abilities are not innate and must, therefore, be taught (e.g., McNeill & Krajcik, Journal of Research in Science Teaching, 45(1), 53–78, 2008). However, such instruction requires a fundamental shift in science pedagogy so as to foster knowledge and practices like deep conceptual understanding, model‐based reasoning, and oral and written argumentation where scientific evidence is evaluated (National Research Council, Next Generation Science Standards: For States, by States, Washington, DC: The National Academies Press, 2013). The purpose of our quasi‐experimental study was to examine the effectiveness of Quality Talk Science, a professional development model and intervention, in fostering changes in teachers' and students' discourse practices as well as their conceptual understanding and scientific argumentation. Findings revealed that treatment teachers' and students' discourse practices better reflected critical‐analytic thinking and argumentation at posttest relative to comparison classrooms. Similarly, at posttest, treatment students produced stronger written scientific arguments than comparison students. Students' growth in conceptual understanding was nonsignificant. These findings suggest that discourse interventions such as Quality Talk Science can improve high‐school students' ability to engage in scientific argumentation.
-
Abstract Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open-response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has shown that features of the assessment construct (i.e., complexity, diversity, and structure) are critical to ML scoring accuracy, yet exactly how these construct features are associated with machine scoring accuracy remains unknown. This study investigated how the features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and to score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses to use as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen's kappa (mean = 0.60; range 0.38–0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher-complexity items and multidimensional assessments.
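The reported association between construct features and scoring accuracy is, in effect, a correlation between per-item construct codes and per-item kappas. Below is a minimal sketch, assuming the expert codes and kappa values are already in hand; all numbers are invented for illustration and are not the study's data.

```python
# Sketch: relate expert-coded construct features of each item to the
# machine-human agreement (Cohen's kappa) its scoring model achieved.
# All values are invented; they are not the study's data.
import numpy as np
from scipy.stats import spearmanr

complexity = np.array([1, 1, 2, 2, 3, 3])               # expert-coded level per item
kappa = np.array([0.85, 0.80, 0.66, 0.61, 0.48, 0.41])  # per-item model agreement

# A negative rank correlation would mirror the reported trend: higher
# construct complexity associated with lower scoring accuracy.
rho, p = spearmanr(complexity, kappa)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
```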
-
Abstract Artificial intelligence (AI) can enhance teachers' capabilities by sharing control over different parts of learning activities. This is especially true for complex learning activities, such as dynamic learning transitions, where students move between individual and collaborative learning in unplanned ways, as the need arises. Yet few initiatives have emerged considering how shared responsibility between teachers and AI can support learning and how teachers' voices might be included to inform design decisions. The goal of our article is twofold. First, we describe a secondary analysis of our co‐design process, comprising six design methods, to understand how teachers conceptualise sharing control with an AI co‐orchestration tool called Pair‐Up. We worked with 76 middle school math teachers, each taking part in one to three methods, to create a co‐orchestration tool that supports dynamic combinations of individual and collaborative learning using two AI‐based tutoring systems. We leveraged qualitative content analysis to examine teachers' views about sharing control with Pair‐Up, and we describe high‐level insights about the human‐AI interaction, including control, trust, responsibility, efficiency, and accuracy. Second, we use our results as an example showcasing how human‐centred learning analytics can be applied to the design of human‐AI technologies, and we share reflections for designers of human‐AI technology regarding the methods that might be fruitful for eliciting teacher feedback and ideas. Our findings illustrate the design of a novel co‐orchestration tool to facilitate transitions between individual and collaborative learning, and they highlight considerations and reflections for designers of similar systems.
Practitioner notes
What is already known about this topic:
- Artificial Intelligence (AI) can help teachers facilitate complex classroom activities, such as having students move between individual and collaborative learning in unplanned ways.
- Designers should use human‐centred design approaches to give teachers a voice in deciding what AI might do in the classroom and if or how they want to share control with it.
What this paper adds:
- Presents teacher views about how they want to share control with AI to support students moving between individual and collaborative learning.
- Describes how we adapted six design methods to design AI features.
- Illustrates a complete, iterative process to create human‐AI interactions to support teachers as they facilitate students moving from individual to collaborative learning.
Implications for practice:
- We share five implications for designers that teachers highlighted as necessary when designing AI features, including control, trust, responsibility, efficiency, and accuracy.
- Our work also includes a reflection on our design process and implications for future design processes.
-
Abstract Natural language helps express mathematical thinking and contexts. Conventional mathematical notation (CMN) best suits expressions and equations. Each is essential; each also has limitations, especially for learners. Our research studies how programming can be an advantageous third language that can also help restore mathematical connections that are hidden by topic‐centred curricula. Restoring opportunities for surprise and delight reclaims mathematics' creative nature. Studies of children's use of language in mathematics and their programming behaviours guide our iterative design/redesign of mathematical microworlds in which students, ages 7–11, use programming in their regular school lessons as a language for learning mathematics. Though driven by mathematics, not coding, the microworlds develop the programming over time so that it continues to support children's developing mathematical ideas. This paper briefly describes microworlds EDC has tested with well over 400 7‐to‐8‐year‐olds in school, and others tested (or about to be tested) with over 200 8‐to‐11‐year‐olds. Our challenge was to satisfy schools' topical orientation and fit easily within regular classroom study while using and foreshadowing other mathematical learning to remove the silos. The design/redesign research and evaluation is exploratory, without formal methodology. We are also studying effects on children's learning more formally; that ongoing study is not reported here.
Practitioner notes
What is already known
- Active learning—doing—supports learning.
- Collaborative learning—doing together—supports learning.
- Classroom discourse—focused, relevant discussion, not just listening—supports learning.
- Clear articulation of one's thinking, even just to oneself, helps develop that thinking.
What this paper adds
- The common languages we use for classroom mathematics—natural language for conveying the meaning and context of mathematical situations and for explaining our reasoning, and the formal (written) language of conventional mathematical notation, the symbols we use in mathematical expressions and equations—are both essential, but each presents hurdles that necessitate the other. Yet even together they are insufficient, especially for young learners.
- Programming, appropriately designed and used, can be the third language that both reduces barriers and provides the missing expressive and creative capabilities children need; a brief illustrative sketch follows this list.
- Appropriate design for use in regular mathematics classrooms requires making key mathematical content obvious, strong, and the 'driver' of the activities, and requires reducing tech 'overhead' to near zero.
- Continued usefulness across the grades requires developing children's sophistication and knowledge with the language; the powerful ways that children rapidly acquire facility with (natural) language provide guidance for ways they can learn a formal language as well.
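To make "programming as a third language" concrete, here is a minimal sketch in Python's turtle module of the kind of activity the paper describes: a child expresses a geometric idea in code and gets immediate visual feedback. The task and numbers are invented for illustration; EDC's actual microworlds are not shown here.

```python
# Invented illustration of a microworld-style task (not one of EDC's
# actual microworlds): the exterior-angle turn is the mathematics.
import turtle

def polygon(sides: int, length: int) -> None:
    """Draw a regular polygon; turning 360/sides degrees at each vertex
    encodes the fact that exterior angles sum to 360."""
    for _ in range(sides):
        turtle.forward(length)
        turtle.left(360 / sides)

polygon(4, 100)   # a square
polygon(6, 60)    # children can experiment: what happens as sides grows?
turtle.done()
```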
Implications for policy and/or practice
- Mathematics teaching can take advantage of the ways children learn through experimentation and attention to the results, and of the ways children use their language brain even for mathematics.
- In particular, programming—in microworlds driven by the mathematical content, designed to minimise distraction and overhead, open to exploration and discovery en route to focused aims, and in which children self‐evaluate—can allow clear articulation of thought and experimentation with immediate feedback.
- As it aids the mathematics, it also builds computational thinking and satisfies schools' increasing concerns to broaden access to ideas of computer science.