Models for automated scoring of content in educational applications continue to improve in human-machine agreement, but it remains unclear whether the models achieve their gains for the “right” reasons. Both high accuracy and construct coverage are crucial for providing reliable scoring and feedback. In this work, we present an in-depth quantitative and qualitative analysis of automated scoring models for middle school students’ science explanations in an online learning environment, leveraging saliency maps to explore the reasons behind individual model score predictions. Our analysis reveals that top-performing models can arrive at the same predictions for very different reasons, and that current model architectures have difficulty detecting ideas in student responses beyond keywords.
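For readers unfamiliar with how a saliency map attributes a score prediction to individual input tokens, the sketch below shows one common variant (input-gradient saliency) applied to a transformer-based scorer. This is a minimal illustration, not the authors' pipeline: the model name, the number of score levels, the target label, and the example response are all assumptions for demonstration.

```python
# Minimal sketch of input-gradient saliency for a transformer scorer.
# Assumptions (not from the paper): model name, num_labels, example text.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # placeholder, not the paper's model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=3  # e.g., three score levels (illustrative)
)
model.eval()

def token_saliency(response: str, target_label: int):
    """Return (tokens, scores): per-token L2 norm of d(logit)/d(embedding)."""
    enc = tokenizer(response, return_tensors="pt")
    # Look up embeddings manually so gradients can flow back to them.
    embeddings = model.get_input_embeddings()(enc["input_ids"])
    embeddings.retain_grad()
    out = model(inputs_embeds=embeddings, attention_mask=enc["attention_mask"])
    # Backprop from the logit of the score level we want to explain.
    out.logits[0, target_label].backward()
    scores = embeddings.grad[0].norm(dim=-1)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    return tokens, scores

tokens, scores = token_saliency(
    "The ice melted because thermal energy transferred from the air.",
    target_label=2,
)
# Print the five most salient tokens for this prediction.
for tok, s in sorted(zip(tokens, scores.tolist()), key=lambda t: -t[1])[:5]:
    print(f"{tok:>12s}  {s:.4f}")
```

Inspecting which tokens dominate the gradient signal is what makes it possible to ask whether a high score is driven by a causal idea in the explanation or merely by the presence of expected keywords.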

