Title: Potential pitfalls of false positives
Automated writing evaluation (AWE) systems automatically assess students' writing and provide feedback on it. Despite the learning benefits, students may not effectively interpret and use AI-generated feedback, and so may not maximize their learning outcomes. A closely related issue is the accuracy of these systems, which students may not realize are imperfect. Our study investigates whether students differentially addressed false positive and false negative AI-generated feedback errors on their science essays. We found that students addressed nearly all of the false negative feedback; however, they addressed less than one-fourth of the false positive feedback. The odds of addressing a false positive were 99% lower than the odds of addressing a false negative, representing significant missed opportunities for revision and learning. We discuss the implications of these findings in the context of students' learning.
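To make the reported effect concrete, here is a minimal sketch of the odds-ratio arithmetic behind "99% lower odds," using hypothetical counts chosen only for illustration (they are not the study's data):

```python
# Hypothetical counts (NOT the study's data), chosen only to show the math.
fp_addressed, fp_ignored = 20, 80   # false positives: under one-fourth addressed
fn_addressed, fn_ignored = 96, 4    # false negatives: nearly all addressed

odds_fp = fp_addressed / fp_ignored      # 0.25
odds_fn = fn_addressed / fn_ignored      # 24.0
odds_ratio = odds_fp / odds_fn           # ~0.0104

# An odds ratio near 0.01 means the odds of addressing a false positive
# are about 99% lower than the odds of addressing a false negative.
print(f"OR = {odds_ratio:.3f} ({(1 - odds_ratio) * 100:.0f}% lower odds)")
```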
Award ID(s):
2010483
PAR ID:
10515210
Author(s) / Creator(s):
Publisher / Repository:
International Conference on Artificial Intelligence in Education 2024
Date Published:
Journal Name:
International Conference on Artificial Intelligence in Education 2024 Proceedings
Subject(s) / Keyword(s):
AI Accuracy, Automated Feedback, Science Writing.
Format(s):
Medium: X
Location:
Recife, Brazil
Sponsoring Org:
National Science Foundation
More Like this
  1. Students are often tasked with activities that require skills tangential to a course's learning outcomes, such as learning new software. The issue is that instructors may not have the time or the expertise to help students with such tangential learning. In this paper, we explore how AI-generated feedback can provide assistance. Specifically, we study this technology in the context of a constructionist curriculum where students learn about experimental research through the creation of a gamified experiment. The AI-generated feedback gives a formative assessment of the narrative design of student-designed gamified experiments, which is important for creating an engaging experience. We find that students critically engaged with the feedback, but that responses varied among students. We discuss the implications for AI-generated feedback systems for tangential learning.
  2. This exploratory study focuses on the use of ChatGPT, a generative artificial intelligence (GAI) tool, by undergraduate engineering students in lab report writing in the major. Literature addressing the impact of ChatGPT and AI on student writing suggests that such technologies can both support and limit students' composing and learning processes. Acknowledging the history of writing with technologies and writing as technology, the development of GAI warrants attention to pedagogical and ethical implications in writing-intensive engineering classes. This pilot study investigates how the use of ChatGPT impacts students' lab writing outcomes in terms of rhetorical knowledge, critical thinking and composing, knowledge of conventions, and writing processes. A group of undergraduate volunteers (n = 7) used ChatGPT to revise their original engineering lab reports, which had been written without ChatGPT. A comparative study was conducted between the original lab report samples and the revisions by directly assessing students' lab reports in gateway engineering lab courses. A focus group was conducted to learn about students' experiences and perspectives on ChatGPT in the context of engineering lab report writing. Implementing ChatGPT in the revision process could improve engineering students' lab report quality by enhancing their understanding of the lab report genre. At the same time, using ChatGPT also led students to include false claims, incorrect lab procedures, or extremely broad statements, which are not valued in the engineering lab report genre.
  3. Hoadley, C; Wang, XC (Eds.)
    Helping students learn how to write is essential. However, students have few opportunities to develop this skill, since giving timely feedback is difficult for teachers. AI applications can provide quick feedback on students' writing, but ensuring accurate assessment can be challenging, since students' writing quality varies. We examined the impact of students' writing quality on the error rate of our natural language processing (NLP) system when assessing scientific content in initial and revised design essays. We also explored whether aspects of writing quality were linked to the number of NLP errors. Despite finding that students' revised essays differed significantly from their initial essays in a few ways, our NLP system's accuracy was similar on both. Further, our multiple regression analyses showed, overall, that students' writing quality did not impact our NLP system's accuracy. This is promising in terms of ensuring that students with different writing skills receive similarly accurate feedback.
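As a rough illustration of the analysis described above, the sketch below regresses NLP error counts on writing-quality measures; the predictors, data, and statsmodels usage are illustrative assumptions, not the authors' code:

```python
# Hypothetical per-essay data (NOT the study's): writing-quality measures
# and the number of NLP assessment errors observed for each essay.
import pandas as pd
import statsmodels.api as sm

essays = pd.DataFrame({
    "word_count":      [180, 240, 150, 320, 210, 275, 195, 300],
    "readability":     [8.2, 10.1, 7.5, 11.3, 9.0, 10.6, 8.8, 11.0],
    "spelling_errors": [3, 1, 5, 0, 2, 1, 4, 0],
    "nlp_errors":      [1, 0, 2, 1, 0, 1, 2, 0],
})

# Regress NLP error count on the writing-quality measures; coefficients
# near zero (non-significant) would mirror the finding that writing
# quality did not affect the system's accuracy.
X = sm.add_constant(essays[["word_count", "readability", "spelling_errors"]])
print(sm.OLS(essays["nlp_errors"], X).fit().summary())
```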
  4. Errors in AI grading and feedback are by nature non-deterministic and difficult to avoid completely. Since inaccurate feedback potentially harms learning, there is a need for designs and workflows that mitigate these harms. To better understand the mechanisms by which erroneous AI feedback impacts students' learning, we conducted surveys and interviews that recorded students' interactions with a short-answer AI autograder for "Explain in Plain English" code reading problems. Using causal modeling, we inferred the learning impacts of wrong answers marked as right (false positives, FPs) and right answers marked as wrong (false negatives, FNs). We further explored explanations for the learning impacts, including errors influencing participants' engagement with feedback and assessments of their answers' correctness, and participants' prior performance in the class. FPs harmed learning in large part because participants failed to detect the errors: participants did not pay attention to the feedback after being marked right, and showed an apparent bias against admitting an answer was wrong once it had been marked right. On the other hand, FNs harmed learning only for survey participants, suggesting that interviewees' greater behavioral and cognitive engagement protected them from learning harms. Based on these findings, we propose ways to help learners detect FPs and encourage deeper reflection on FNs to mitigate the learning harms of AI errors.
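As a rough sketch of the adjustment idea behind such causal inference, the example below simulates data and estimates the association of FP and FN exposure with a learning outcome while controlling for prior performance; the variables, effect sizes, and logistic specification are assumptions for illustration, not the paper's actual causal model:

```python
# Simulated illustration (NOT the study's data or model).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
prior = rng.uniform(0, 1, n)        # prior performance in the class
saw_fp = rng.integers(0, 2, n)      # wrong answer marked right
saw_fn = rng.integers(0, 2, n)      # right answer marked wrong

# Assumed data-generating process: FPs harm learning more than FNs, and
# stronger prior performance helps (directions echo the abstract's findings).
logit_p = -0.5 + 2.0 * prior - 1.5 * saw_fp - 0.5 * saw_fn
learned = rng.binomial(1, 1 / (1 + np.exp(-logit_p)))

df = pd.DataFrame({"learned": learned, "saw_fp": saw_fp,
                   "saw_fn": saw_fn, "prior": prior})
fit = smf.logit("learned ~ saw_fp + saw_fn + prior", data=df).fit(disp=False)
print(fit.params)   # negative saw_fp / saw_fn coefficients indicate harm
```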
  5. Recent advances in generative artificial intelligence (AI) and multimodal learning analytics (MMLA) have allowed for new and creative ways of leveraging AI to support K12 students' collaborative learning in STEM+C domains. To date, there is little evidence of AI methods supporting students' collaboration in complex, open-ended environments. AI systems are known to underperform humans in (1) interpreting students' emotions in learning contexts, (2) grasping the nuances of social interactions and (3) understanding domain-specific information that was not well-represented in the training data. As such, combined human and AI (ie, hybrid) approaches are needed to overcome the current limitations of AI systems. In this paper, we take a first step towards investigating how a human-AI collaboration between teachers and researchers using an AI-generated multimodal timeline can guide and support teachers' feedback while addressing students' STEM+C difficulties as they work collaboratively to build computational models and solve problems. In doing so, we present a framework characterizing the human component of our human-AI partnership as a collaboration between teachers and researchers. To evaluate our approach, we present our timeline to a high school teacher and discuss the key insights gleaned from our discussions. Our case study analysis reveals the effectiveness of an iterative approach to using human-AI collaboration to address students' STEM+C challenges: the teacher can use the AI-generated timeline to guide formative feedback for students, and the researchers can leverage the teacher's feedback to help improve the multimodal timeline. Additionally, we characterize our findings with respect to two events of interest to the teacher: (1) when the students cross a difficulty threshold, and (2) the point of intervention, that is, when the teacher (or system) should intervene to provide effective feedback. It is important to note that the teacher explained that there should be a lag between (1) and (2) to give students a chance to resolve their own difficulties. Typically, such a lag is not implemented in computer-based learning environments that provide feedback.
Practitioner notes

What is already known about this topic
- Collaborative, open-ended learning environments enhance students' STEM+C conceptual understanding and practice, but they introduce additional complexities when students learn concepts spanning multiple domains.
- Recent advances in generative AI and MMLA allow for integrating multiple datastreams to derive holistic views of students' states, which can support more informed feedback mechanisms to address students' difficulties in complex STEM+C environments.
- Hybrid human-AI approaches can help address collaborating students' STEM+C difficulties by combining the domain knowledge, emotional intelligence and social awareness of human experts with the general knowledge and efficiency of AI.

What this paper adds
- We extend a previous human-AI collaboration framework using a hybrid intelligence approach to characterize the human component of the partnership as a researcher-teacher partnership and present our approach as a teacher-researcher-AI collaboration.
- We adapt an AI-generated multimodal timeline to actualize our human-AI collaboration by pairing the timeline with videos of students encountering difficulties, engaging in active discussions with a high school teacher while watching the videos to discern the timeline's utility in the classroom.
- From our discussions with the teacher, we define two types of inflection points to address students' STEM+C difficulties, the difficulty threshold and the intervention point, and discuss how the feedback latency interval separating them can inform educator interventions.
- We discuss two ways in which our teacher-researcher-AI collaboration can help teachers support students encountering STEM+C difficulties: (1) teachers using the multimodal timeline to guide feedback for students, and (2) researchers using teachers' input to iteratively refine the multimodal timeline.

Implications for practice and/or policy
- Our case study suggests that timeline gaps (ie, disengaged behaviour identified by off-screen students, pauses in discourse and lulls in environment actions) are particularly important for identifying inflection points and formulating formative feedback.
- Human-AI collaboration exists on a dynamic spectrum and requires varying degrees of human control and AI automation depending on the context of the learning task and students' work in the environment.
- Our analysis of this human-AI collaboration using a multimodal timeline can be extended in the future to support students and teachers in additional ways, for example, designing pedagogical agents that interact directly with students, developing intervention and reflection tools for teachers, helping teachers craft daily lesson plans and aiding teachers and administrators in designing curricula.
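The timing logic around these two inflection points can be sketched as follows; the difficulty scores, threshold value, and persistence check are illustrative assumptions, not the paper's system:

```python
# Hypothetical sketch: detect when a difficulty estimate crosses the
# difficulty threshold (inflection point 1), then wait out a feedback
# latency interval before flagging the intervention point (inflection
# point 2), giving students a chance to resolve the difficulty themselves.
from typing import Optional

def intervention_point(scores: list[float], threshold: float,
                       latency: int) -> Optional[int]:
    """Return the timestep at which to intervene, or None if every
    difficulty episode resolves within the latency interval."""
    for t, score in enumerate(scores):
        if score >= threshold:
            window = scores[t:t + latency + 1]
            # Intervene only if the difficulty persists through the whole
            # latency interval; otherwise keep scanning.
            if len(window) > latency and all(s >= threshold for s in window):
                return t + latency
    return None

# Difficulty crosses 0.7 at t=2 and persists, so intervene at t=2+2=4.
print(intervention_point([0.2, 0.5, 0.8, 0.9, 0.85, 0.9],
                         threshold=0.7, latency=2))  # -> 4
```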