Abstract
As use of artificial intelligence (AI) has increased, concerns about AI bias and discrimination have been growing. This paper discusses an application called PyrEval in which natural language processing (NLP) was used to automate assessment and provide feedback on middle school science writing without linguistic discrimination. Linguistic discrimination in this study was operationalized as unfair assessment of scientific essays based on writing features that are not considered normative, such as subject-verb disagreement. Such unfair assessment is especially problematic when the purpose of assessment is not assessing English writing but rather assessing the content of scientific explanations. PyrEval was implemented in middle school science classrooms. Students explained their roller coaster design by stating relationships among such science concepts as potential energy, kinetic energy and law of conservation of energy. Initial and revised versions of scientific essays written by 307 eighth-grade students were analyzed. Our manual and NLP assessment comparison analysis showed that PyrEval did not penalize student essays that contained non-normative writing features. Repeated measures ANOVAs and GLMM analysis results revealed that essay quality significantly improved from initial to revised essays after receiving the NLP feedback, regardless of non-normative writing features. Findings and implications are discussed.

Practitioner notes

What is already known about this topic
- Advancement in AI has created a variety of opportunities in education, including automated assessment, but AI is not bias-free.
- Automated writing assessment designed to improve students' scientific explanations has been studied.
- While limited, some studies reported biased performance of automated writing assessment tools, but without looking into actual linguistic features about which the tools may have discriminated.

What this paper adds
- This study conducted an actual examination of non-normative linguistic features in essays written by middle school students to uncover how our NLP tool called PyrEval worked to assess them.
- PyrEval did not penalize essays containing non-normative linguistic features.
- Regardless of non-normative linguistic features, students' essay quality scores significantly improved from initial to revised essays after receiving feedback from PyrEval. Essay quality improvement was observed regardless of students' prior knowledge, school district and teacher variables.

Implications for practice and/or policy
- This paper inspires practitioners to attend to linguistic discrimination (re)produced by AI.
- This paper offers possibilities of using PyrEval as a reflection tool, to which human assessors compare their assessment and discover implicit bias against non-normative linguistic features.
- PyrEval is available for use on github.com/psunlpgroup/PyrEvalv2.
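The pre/post comparison reported in this abstract lends itself to a short illustration. The sketch below is not the authors' analysis code: the file name and column names (`student_id`, `version`, `nonnormative`, `score`) are assumptions, and it only shows how a mixed-effects model of essay quality by draft version and non-normative-feature status could be specified in Python with statsmodels.

```python
# Illustrative sketch only -- not the authors' analysis code.
# Assumed long-format layout, one row per essay:
#   student_id, version ("initial" or "revised"),
#   nonnormative (1 if the essay contains non-normative writing features),
#   score (essay quality score).
import pandas as pd
import statsmodels.formula.api as smf

essays = pd.read_csv("essay_scores.csv")  # hypothetical file name

# Mixed-effects model: does quality improve from initial to revised drafts,
# and does that improvement depend on non-normative writing features?
model = smf.mixedlm(
    "score ~ C(version) * nonnormative",
    data=essays,
    groups=essays["student_id"],  # repeated measures nested within student
)
result = model.fit()
print(result.summary())
```

In a model of this kind, a significant effect of draft version together with a non-significant version-by-feature interaction would correspond to the reported pattern; the GLMM specification actually used in the study may differ.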
Testing the Ability of Teachers and Students to Differentiate between Essays Generated by ChatGPT and High School Students
The release of ChatGPT in late 2022 prompted widespread concern about the implications of artificial intelligence for academic integrity, but thus far there has been little direct empirical evidence to inform this debate. Participants (69 high school teachers, 140 high school students, total N = 209) took an AI Identification Test in which they read pairs of essays—one written by a high school student and the other by ChatGPT—and guessed which was generated by the chatbot. Accuracy was only 70% for teachers, and it was slightly worse for students (62%). Self-reported confidence did not predict accuracy, nor did experience with ChatGPT or subject-matter expertise. Well-written student essays were especially hard to differentiate from the ChatGPT texts. In another set of measures, students reported greater optimism than their teachers did about the future role of ChatGPT in education. Students expressed disapproval of submitting ChatGPT-generated essays as one’s own but rated this and other possible academic integrity violations involving ChatGPT less negatively than teachers did. These results form an empirical basis for further work on the relationship between AI and academic integrity.
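To make the reported measures concrete, here is a hedged sketch (not the study's actual code or data) of how per-group identification accuracy and the confidence-accuracy relationship could be computed; the file name and column names (`role`, `correct`, `confidence`) are assumptions.

```python
# Hypothetical illustration; the study's real data layout and models may differ.
# Assumed columns: participant_id, role ("teacher" or "student"),
# correct (1 if the ChatGPT-generated essay was identified), confidence (self-report).
import pandas as pd
import statsmodels.formula.api as smf

trials = pd.read_csv("identification_trials.csv")  # hypothetical file name

# Identification accuracy by participant role.
print(trials.groupby("role")["correct"].mean())

# Does self-reported confidence predict a correct guess?
logit = smf.logit("correct ~ confidence + C(role)", data=trials).fit()
print(logit.summary())
```

Near-chance group means and a flat confidence coefficient in an analysis like this would mirror the findings summarized above.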
- Award ID(s): 2104610
- PAR ID: 10493551
- Editor(s): Yan, Zheng
- Publisher / Repository: Human Behavior and Emerging Technologies
- Date Published:
- Journal Name: Human Behavior and Emerging Technologies
- Volume: 2023
- ISSN: 2578-1863
- Page Range / eLocation ID: 1923981
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The emergence of ChatGPT, an AI-powered language model, has sparked numerous debates and discussions. In educational research, scholars have raised significant questions regarding the potential, limitations, and ethical concerns around the use of this technology. While research on the application and implications of ChatGPT in academic settings exists, analyses of the perspectives of high-school students are limited. In this study, we use qualitative content analysis to explore the perspectives of high-school students regarding the integration or ban of ChatGPT in their schools through the lens of the Technology Acceptance Model (TAM2). Data was sourced from students' comments on a New York Times Learning Network article. Findings revealed that students' perceptions about integrating or banning ChatGPT in schools are influenced by their assessments of the technology's usefulness, personal experiences, societal technology trends, and ethical considerations. Our findings suggest that the student perspectives in this study align with those of educators and policymakers while also reflecting unique viewpoints that cater to their specific needs and experiences. Implications emphasize the significance of an inclusive decision-making process around the integration of AI in educational contexts, one that includes students alongside other stakeholders.
-
Hoadley, C.; Wang, X. C. (Eds.)
Helping students learn how to write is essential. However, students have few opportunities to develop this skill, since giving timely feedback is difficult for teachers. AI applications can provide quick feedback on students' writing, but ensuring accurate assessment can be challenging, since students' writing quality can vary. We examined the impact of students' writing quality on the error rate of our natural language processing (NLP) system when assessing scientific content in initial and revised design essays. We also explored whether aspects of writing quality were linked to the number of NLP errors. Despite finding that students' revised essays were significantly different from their initial essays in a few ways, our NLP system's accuracy was similar. Further, our multiple regression analyses showed, overall, that students' writing quality did not impact our NLP system's accuracy. This is promising in terms of ensuring students with different writing skills get similarly accurate feedback.
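As an illustration of the kind of analysis described above (not the authors' actual specification), the sketch below regresses an essay-level NLP error count on a few writing-quality measures; the file name and predictor names are assumptions.

```python
# Illustrative only; the variable names and model form are assumptions,
# not the authors' actual regression specification.
import pandas as pd
import statsmodels.formula.api as smf

essays = pd.read_csv("nlp_error_counts.csv")  # hypothetical file name
# Assumed columns: nlp_errors (NLP misassessments per essay) plus
# writing-quality measures such as word_count, spelling_errors, readability.

ols = smf.ols(
    "nlp_errors ~ word_count + spelling_errors + readability", data=essays
).fit()
print(ols.summary())  # non-significant coefficients would mirror the reported finding
```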

