skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: NLP‐enabled automated assessment of scientific explanations: Towards eliminating linguistic discrimination
Abstract As use of artificial intelligence (AI) has increased, concerns about AI bias and discrimination have been growing. This paper discusses an application called PyrEval in which natural language processing (NLP) was used to automate assessment and provide feedback on middle school science writing without linguistic discrimination. Linguistic discrimination in this study was operationalized as unfair assessment of scientific essays based on writing features that are not considered normative such as subject‐verb disagreement. Such unfair assessment is especially problematic when the purpose of assessment is not assessing English writing but rather assessing the content of scientific explanations. PyrEval was implemented in middle school science classrooms. Students explained their roller coaster design by stating relationships among such science concepts as potential energy, kinetic energy and law of conservation of energy. Initial and revised versions of scientific essays written by 307 eighth‐grade students were analyzed. Our manual and NLP assessment comparison analysis showed that PyrEval did not penalize student essays that contained non‐normative writing features. Repeated measures ANOVAs and GLMM analysis results revealed that essay quality significantly improved from initial to revised essays after receiving the NLP feedback, regardless of non‐normative writing features. Findings and implications are discussed. Practitioner notesWhat is already known about this topicAdvancement in AI has created a variety of opportunities in education, including automated assessment, but AI is not bias‐free.Automated writing assessment designed to improve students' scientific explanations has been studied.While limited, some studies reported biased performance of automated writing assessment tools, but without looking into actual linguistic features about which the tools may have discriminated.What this paper addsThis study conducted an actual examination of non‐normative linguistic features in essays written by middle school students to uncover how our NLP tool called PyrEval worked to assess them.PyrEval did not penalize essays containing non‐normative linguistic features.Regardless of non‐normative linguistic features, students' essay quality scores significantly improved from initial to revised essays after receiving feedback from PyrEval. Essay quality improvement was observed regardless of students' prior knowledge, school district and teacher variables.Implications for practice and/or policyThis paper inspires practitioners to attend to linguistic discrimination (re)produced by AI.This paper offers possibilities of using PyrEval as a reflection tool, to which human assessors compare their assessment and discover implicit bias against non‐normative linguistic features.PyrEval is available for use ongithub.com/psunlpgroup/PyrEvalv2.  more » « less
Award ID(s):
2010351
PAR ID:
10612460
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Wiley
Date Published:
Journal Name:
British Journal of Educational Technology
ISSN:
0007-1013
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As use of artificial intelligence (AI) has increased, concerns about AI bias and discrimination have been growing. This paper discusses an application called PyrEval in which natural language processing (NLP) was used to automate assessment and pro- vide feedback on middle school science writing with- out linguistic discrimination. Linguistic discrimination in this study was operationalized as unfair assess- ment of scientific essays based on writing features that are not considered normative such as subject- verb disagreement. Such unfair assessment is espe- cially problematic when the purpose of assessment is not assessing English writing but rather assessing the content of scientific explanations. PyrEval was implemented in middle school science classrooms. Students explained their roller coaster design by stat- ing relationships among such science concepts as potential energy, kinetic energy and law of conser- vation of energy. Initial and revised versions of sci- entific essays written by 307 eighth- grade students were analyzed. Our manual and NLP assessment comparison analysis showed that PyrEval did not pe- nalize student essays that contained non-normative writing features. Repeated measures ANOVAs and GLMM analysis results revealed that essay quality significantly improved from initial to revised essays after receiving the NLP feedback, regardless of non- normative writing features. Findings and implications are discussed. 
    more » « less
  2. As use of artificial intelligence (AI) has increased, concerns about AI bias and discrimination have been growing. This paper discusses an application called PyrEval in which natural language processing (NLP) was used to automate assessment and pro- vide feedback on middle school science writing with- out linguistic discrimination. Linguistic discrimination in this study was operationalized as unfair assess- ment of scientific essays based on writing features that are not considered normative such as subject- verb disagreement. Such unfair assessment is espe- cially problematic when the purpose of assessment is not assessing English writing but rather assessing the content of scientific explanations. PyrEval was implemented in middle school science classrooms. Students explained their roller coaster design by stat- ing relationships among such science concepts as potential energy, kinetic energy and law of conser- vation of energy. Initial and revised versions of sci- entific essays written by 307 eighth- grade students were analyzed. Our manual and NLP assessment comparison analysis showed that PyrEval did not pe- nalize student essays that contained non-normative writing features. Repeated measures ANOVAs and GLMM analysis results revealed that essay quality significantly improved from initial to revised essays after receiving the NLP feedback, regardless of non- normative writing features. Findings and implications are discussed. 
    more » « less
  3. Hoadley, C; Wang, XC (Ed.)
    Eighth grade students received automated feedback from PyrEval - an NLP tool - about their science essays. We examined essay quality change when revised. Regardless of prior physics knowledge, essay quality improved. Grounded in literature on AI explainability and trust in automated feedback, we also examined which PyrEval explanation predicted essay quality change. Essay quality improvement was predicted by high- and medium-accuracy feedback. 
    more » « less
  4. While offering the potential to support learning interactions, emerging AI applications like Large Language Models (LLMs) come with ethical concerns. Grounding technology design in human values can address AI ethics and ensure adoption. To this end, we apply Value‐Sensitive Design—involving empirical, conceptual and technical investigations—to centre human values in the development and evaluation of LLM‐based chatbots within a high school environmental science curriculum. Representing multiple perspectives and expertise, the chatbots help students refine their causal models of climate change's impact on local marine ecosystems, communities and individuals. We first perform an empirical investigation leveraging participatory design to explore the values that motivate students and educators to engage with the chatbots. Then, we conceptualize the values that emerge from the empirical investigation by grounding them in research in ethical AI design, human values, human‐AI interactions and environmental education. Findings illuminate considerations for the chatbots to support students' identity development, well‐being, human–chatbot relationships and environmental sustainability. We further map the values onto design principles and illustrate how these principles can guide the development and evaluation of the chatbots. Our research demonstrates how to conduct contextual, value‐sensitive inquiries of emergent AI technologies in educational settings. Practitioner notesWhat is already known about this topicGenerative artificial intelligence (GenAI) technologies like Large Language Models (LLMs) can not only support learning, but also raise ethical concerns such as transparency, trust and accountability.Value‐sensitive design (VSD) presents a systematic approach to centring human values in technology design.What this paper addsWe apply VSD to design LLM‐based chatbots in environmental education and identify values central to supporting students' learning.We map the values emerging from the VSD investigations to several stages of GenAI technology development: conceptualization, development and evaluation.Implications for practice and/or policyIdentity development, well‐being, human–AI relationships and environmental sustainability are key values for designing LLM‐based chatbots in environmental education.Using educational stakeholders' values to generate design principles and evaluation metrics for learning technologies can promote technology adoption and engagement. 
    more » « less
  5. With increasing interest in computer‐assisted educa‐ tion, AI‐integrated systems become highly applicable with their ability to adapt based on user interactions. In this context, this paper focuses on understanding and analysing first‐year undergraduate student responses to an intelligent educational system that applies multi‐agent reinforcement learning as an AI tutor. With human–computer interaction at the centre, we discuss principles of interface design and educational gamification in the context of multiple years of student observations, student feedback surveys and focus group interviews. We show positive feedback from the design methodology we discuss as well as the overall process of providing automated tutoring in a gamified virtual environment. We also discuss students' thinking in the context of gamified educational systems, as well as unexpected issues that may arise when implementing such systems. Ultimately, our design iterations and analysis both offer new insights for practical implementation of computer‐assisted educational systems, focusing on how AI can augment, rather than replace, human intelligence in the classroom. Practitioner notesWhat is already known about this topicAI‐integrated systems show promise for personalizing learning and improving student education.Existing research has shown the value of personalized learner feedback.Engaged students learn more effectively.What this paper addsStudent opinions of and responses to an HCI‐based personalized educational system.New insights for practical implementation of AI‐integrated educational systems informed by years of student observations and system improvements.Qualitative insights into system design to improve human–computer interaction in educational systems.Implications for practice and/or policyActionable design principles for computer‐assisted tutoring systems derived from first‐hand student feedback and observations.Encourage new directions for human–computer interaction in educational systems. 
    more » « less