

This content will become publicly available on March 7, 2025

Title: Integrating Natural Language Processing in Middle School Science Classrooms: An Experience Report
With the increasing prevalence of large language models (LLMs) such as ChatGPT, there is a growing need to integrate natural language processing (NLP) into K-12 education to better prepare young learners for the future AI landscape. NLP, a sub-field of AI that serves as the foundation of LLMs and many advanced AI applications, holds the potential to enrich learning in core subjects in K-12 classrooms. In this experience report, we present our efforts to integrate NLP into science classrooms with 98 middle school students across two US states, aiming to increase students’ experience and engagement with NLP models through textual data analyses and visualizations. We designed learning activities, developed an NLP-based interactive visualization platform, and facilitated classroom learning in close collaboration with middle school science teachers. This experience report aims to contribute to the growing body of work on integrating NLP into K-12 education by providing insights and practical guidelines for practitioners, researchers, and curriculum designers.
Award ID(s):
2147810, 2147811
NSF-PAR ID:
10496815
Publisher / Repository:
ACM
Journal Name:
Proceedings of the 55th ACM Technical Symposium on Computer Science Education (SIGCSE)
Page Range / eLocation ID:
639 - 645
Format(s):
Medium: X
Location:
Portland OR USA
Sponsoring Org:
National Science Foundation
More Like this
  1. This research explores a novel human-in-the-loop approach that goes beyond traditional prompt engineering approaches to harness Large Language Models (LLMs) with chain-of-thought prompting for grading middle school students’ short answer formative assessments in science and generating useful feedback. While recent efforts have successfully applied LLMs and generative AI to automatically grade assignments in secondary classrooms, the focus has primarily been on providing scores for mathematical and programming problems with little work targeting the generation of actionable insight from the student responses. This paper addresses these limitations by exploring a human-in-the-loop approach to make the process more intuitive and more effective. By incorporating the expertise of educators, this approach seeks to bridge the gap between automated assessment and meaningful educational support in the context of science education for middle school students. We have conducted a preliminary user study, which suggests that (1) co-created models improve the performance of formative feedback generation, and (2) educator insight can be integrated at multiple steps in the process to inform what goes into the model and what comes out. Our findings suggest that in-context learning and human-in-the-loop approaches may provide a scalable approach to automated grading, where the performance of the automated LLM-based grader continually improves over time, while also providing actionable feedback that can support students’ open-ended science learning. 
  2. Abstract Practitioner notes

    What is already known about this topic

    Scholarly attention has turned to examining Artificial Intelligence (AI) literacy in K‐12 to help students understand the working mechanism of AI technologies and critically evaluate automated decisions made by computer models.

    While efforts have been made to engage students in understanding AI through building machine learning models with data, few of them go in‐depth into teaching and learning of feature engineering, a critical concept in modelling data.

    There is a need for research to examine students' data modelling processes, particularly in the little‐researched realm of unstructured data.

    What this paper adds

    Results show that students developed nuanced understandings of models learning patterns in data for automated decision making.

    Results demonstrate that students drew on prior experience and knowledge in creating features from unstructured data in the learning task of building text classification models.

    Students needed support in performing feature engineering practices, reasoning about noisy features and exploring features in rich social contexts that the data set is situated in.

    Implications for practice and/or policy

    It is important for schools to provide hands‐on model building experiences for students to understand and evaluate automated decisions from AI technologies.

    Students should be empowered to draw on their cultural and social backgrounds as they create models and evaluate data sources.

    To extend this work, educators should consider opportunities to integrate AI learning in other disciplinary subjects (ie, outside of computer science classes).

     
  3. Abstract

    Argumentation, a key scientific practice presented in the Framework for K-12 Science Education, requires students to construct and critique arguments, but timely evaluation of arguments in large-scale classrooms is challenging. Recent work has shown the potential of automated scoring systems for open response assessments, leveraging machine learning (ML) and artificial intelligence (AI) to aid the scoring of written arguments in complex assessments. Moreover, research has emphasized that the features (i.e., complexity, diversity, and structure) of the assessment construct are critical to ML scoring accuracy, yet how the assessment construct may be associated with machine scoring accuracy remains unknown. This study investigated how the features associated with the assessment construct of a scientific argumentation assessment item affected machine scoring performance. Specifically, we conceptualized the construct in three dimensions: complexity, diversity, and structure. We employed human experts to code characteristics of the assessment tasks and score middle school student responses to 17 argumentation tasks aligned to three levels of a validated learning progression of scientific argumentation. We randomly selected 361 responses to use as training sets to build machine-learning scoring models for each item. The scoring models yielded a range of agreements with human consensus scores, measured by Cohen’s kappa (mean = 0.60; range 0.38–0.89), indicating good to almost perfect performance. We found that higher levels of Complexity and Diversity of the assessment task were associated with decreased model performance; similarly, the relationship between levels of Structure and model performance showed a somewhat negative linear trend. These findings highlight the importance of considering these construct characteristics when developing ML models for scoring assessments, particularly for higher complexity items and multidimensional assessments.

     
  4.
    As our nation’s need for engineering professionals grows, a sharp rise in P-12 engineering education programs and related research has taken place (Brophy, Klein, Portsmore, & Rogers, 2008; Purzer, Strobel, & Cardella, 2014). The associated research has focused primarily on students’ perceptions and motivations, teachers’ beliefs and knowledge, and curricula and program success. The existing research has expanded our understanding of new K-12 engineering curriculum development and teacher professional development efforts, but empirical data remain scarce on how racial and ethnic diversity of student population influences teaching methods, course content, and overall teachers’ experiences. In particular, Hynes et al. (2017) note in their systematic review of P-12 research that little attention has been paid to teachers’ experiences with respect to racially and ethnically diverse engineering classrooms. The growing attention and resources being committed to diversity and inclusion issues (Lichtenstein, Chen, Smith, & Maldonado, 2014; McKenna, Dalal, Anderson, & Ta, 2018; NRC, 2009) underscore the importance of understanding teachers’ experiences with complementary research-based recommendations for how to implement engineering curricula in racially diverse schools to engage all students. Our work examines the experiences of three high school teachers as they teach an introductory engineering course in geographically and distinctly different racially diverse schools across the nation. The study is situated in the context of a new high school level engineering education initiative called Engineering for Us All (E4USA). The National Science Foundation (NSF) funded initiative was launched in 2018 as a partnership among five universities across the nation to ‘demystify’ engineering for high school students and teachers. 
The program aims to create an all-inclusive high school level engineering course(s), a professional development platform, and a learning community to support student pathways to higher education institutions. An introductory engineering course was developed and professional development was provided to nine high school teachers to instruct and assess engineering learning during the first year of the project. This study investigates participating teachers’ implementation of the course in high schools across the nation to understand the extent to which their experiences vary as a function of student demographic (race, ethnicity, socioeconomic status) and resource level of the school itself. Analysis of these experiences was undertaken using a collective case-study approach (Creswell, 2013) involving in-depth analysis of a limited number of cases “to focus on fewer "subjects," but more "variables" within each subject” (Campbell & Ahrens, 1998, p. 541). This study will document distinct experiences of high school teachers as they teach the E4USA curriculum. Participants were purposively sampled for the cases in order to gather an information-rich data set (Creswell, 2013). The study focuses on three of the nine teachers participating in the first cohort to implement the E4USA curriculum. Teachers were purposefully selected because of the demographic makeup of their students. The participating teachers teach in Arizona, Maryland and Tennessee with predominantly Hispanic, African-American, and Caucasian student bodies, respectively. To better understand similarities and differences among teaching experiences of these teachers, a rich data set is collected consisting of: 1) semi-structured interviews with teachers at multiple stages during the academic year, 2) reflective journal entries shared by the teachers, and 3) multiple observations of classrooms. The interview data will be analyzed with an inductive approach outlined by Miles, Huberman, and Saldaña (2014). 
All teachers’ interview transcripts will be coded together to identify common themes across participants. Participants’ reflections will be analyzed similarly, seeking to characterize their experiences. Observation notes will be used to triangulate the findings. Descriptions for each case will be written emphasizing the aspects that relate to the identified themes. Finally, we will look for commonalities and differences across cases. The results section will describe the cases at the individual participant level followed by a cross-case analysis. This study takes into consideration how high school teachers’ experiences could be an important tool to gain insight into engineering education problems at the P-12 level. Each case will provide insights into how student body diversity impacts teachers’ pedagogy and experiences. The cases illustrate “multiple truths” (Arghode, 2012) with regard to high school level engineering teaching and embody diversity from the perspective of high school teachers. We will highlight themes across cases in the context of frameworks that represent teacher experience conceptualizing race, ethnicity, and diversity of students. We will also present salient features from each case that connect to potential recommendations for advancing P-12 engineering education efforts. These findings will impact how diversity support is practiced at the high school level and will demonstrate specific novel curricular and pedagogical approaches in engineering education to advance P-12 mentoring efforts. 
  5. Abstract Background

    Integration of engineering into middle school science and mathematics classrooms is a key aspect of STEM integration. However, successful pedagogies for teachers to use engineering talk in their classrooms are not fully understood.

    Purpose/Hypothesis

    This study aims to address this need with the research question: How does a middle school life science teacher use engineering talk during an engineering design‐based STEM integration unit?

    Design/Method

    This case study examined the talk of a teacher whose students demonstrated high levels of learning in science and engineering throughout a three‐year professional development program. Transcripts of whole‐class verbal interactions for 18 class periods in the life science‐based STEM integration unit were analyzed using a theoretical framework based on the Framework for Quality K‐12 Engineering Education.

    Results

    The teacher used talk to integrate engineering in a variety of ways, skillfully weaving engineering throughout the unit. He framed lessons around problem scoping, incorporated engineering ideas into scientific verbal interactions, and aligned individual lessons and the overall unit with the engineering design process. He stayed true to the context of the engineering challenge and treated the students as young engineers.

    Conclusions

    This teacher's talk helped to integrate engineering with the science and mathematics content of the unit and modeled the practices of informed designers to help students learn engineering in the context of their science classroom. These findings have the potential to improve how educators and curricula developers utilize engineering teacher talk to support STEM integration.

     