-
The ability to predict student performance in introductory programming courses is important to help struggling students and enhance their persistence. However, for this prediction to be impactful, it is crucial that it remains transparent and accessible to both instructors and students, ensuring effective use of the predicted results. Machine learning models with explainable features give students and instructors an effective means to comprehend students' diverse programming behaviors and problem-solving strategies, elucidating the factors contributing to both successful and suboptimal performance. This study develops an explainable model that predicts student performance from programming assignment submission information at different stages of the course, enabling early explainable predictions. We extract data-driven features from student programming submissions and use a stacked ensemble model to predict final exam grades. The experimental results suggest that our model successfully predicts student performance from programming submissions made earlier in the semester. Employing SHAP, a game-theory-based framework, we explain the model's predictions, helping stakeholders understand how diverse programming behaviors influence students' success. Additionally, we analyze crucial features with a mix of descriptive statistics and mixture models to identify distinct student profiles based on their problem-solving patterns, enhancing overall explainability. Furthermore, we analyze the profiles in greater depth using students' distinct programming patterns, elucidating the characteristics of student groups for whom SHAP explanations alone are not readily comprehensible. Our explainable early prediction model elucidates common problem-solving patterns in students relative to their expertise, facilitating effective intervention and adaptive support.
Free, publicly-accessible full text available July 12, 2026.
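As a rough illustration of the stacked-ensemble step only, the sketch below trains a scikit-learn StackingRegressor on invented submission features. The feature names, synthetic data, and model choices are assumptions for illustration, not the study's actual setup; SHAP itself is omitted to keep the sketch dependency-light.

```python
# Minimal sketch of grade prediction with a stacked ensemble.
# All features and data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n_students = 200

# Hypothetical per-student features from early programming submissions.
X = np.column_stack([
    rng.integers(1, 15, n_students),     # attempts per assignment
    rng.uniform(0, 100, n_students),     # mean autograder score
    rng.uniform(0.1, 48.0, n_students),  # hours before deadline of first submit
])
# Synthetic final-exam grade loosely tied to the early features.
y = 0.6 * X[:, 1] + 2.0 * X[:, 2] - 1.5 * X[:, 0] + rng.normal(0, 5, n_students)

# Stacked ensemble: base learners feed a ridge meta-learner.
model = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("ridge", RidgeCV()),
    ],
    final_estimator=RidgeCV(),
)
model.fit(X, y)
preds = model.predict(X)  # one predicted grade per student
```

In a SHAP-style workflow, the fitted `model` would then be passed to an explainer to attribute each prediction to individual submission features.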
-
Understanding student practice behavior and its connection to learning is essential for effective recommender systems that provide personalized learning support. In this study, we apply a sequential pattern mining approach to analyze student practice behavior in a practice system for introductory Python programming. Our goal is to identify different types of practice behavior and connect them to student performance. We examine two types of practice sequences: (1) by login session and (2) by learning topic. For each sequence type, we use SPAM (Sequential PAttern Mining) to identify the most frequent micro-patterns and build behavior profiles of individual learners as vectors of micro-pattern frequencies observed in their behavior. We confirm that these vectors are stable for both sequence types (p < 0.03 for session sequences and p < 0.003 for topic sequences). Using the vectors, we perform K-means clustering and identify two practice behaviors: example explorers and persistent finishers. We repeat this experiment using different coding approaches for student sequences and obtain similar clusters. Our results suggest that example explorers and persistent finishers may represent two typical types of divergent student behavior in a programming practice system. Finally, to better understand the relationship between students' background knowledge, learning outcomes, and practice behavior, we perform statistical analyses to assess the significance of the associations among pre-test scores, cluster assignments, and final course grades.
Free, publicly-accessible full text available July 20, 2026.
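The pipeline above can be sketched in simplified form: bigrams of practice actions stand in for SPAM-mined micro-patterns, and K-means groups the resulting frequency vectors. The action alphabet and student sequences below are invented for illustration, not the study's data.

```python
# Simplified stand-in for the micro-pattern + clustering pipeline.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-student action sequences:
# E = view example, P = attempt problem, S = successful submission.
sequences = {
    "s1": "EEPEPEEPS",  # example-heavy behavior
    "s2": "EPEEEPPES",
    "s3": "PPPSPPPSS",  # persistent problem attempts
    "s4": "PPSPPPPSS",
}

patterns = ["EE", "EP", "PE", "PP", "PS", "SP", "SS"]  # candidate micro-patterns

def pattern_vector(seq):
    # Relative frequency of each candidate micro-pattern in the sequence.
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(sum(counts.values()), 1)
    return [counts[p] / total for p in patterns]

X = np.array([pattern_vector(s) for s in sequences.values()])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(sequences, labels)))  # example explorers vs. persistent finishers
```

With these toy sequences, the example-heavy students (s1, s2) and the attempt-heavy students (s3, s4) land in separate clusters, mirroring the two behavior types the study reports.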
-
Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo Bosco, Giosuè; Paquette, Luc (Ed.) Social interactions among classroom peers, represented as social learning networks (SLNs), play a crucial role in enhancing learning outcomes. While SLN analysis has recently garnered attention, most existing approaches rely on centralized training, where data is aggregated and processed on a local/cloud server with direct access to raw data. However, in real-world educational settings, such direct access across multiple classrooms is often restricted due to privacy concerns. Furthermore, training models on isolated classroom data prevents the identification of common interaction patterns that exist across multiple classrooms, thereby limiting model performance. To address these challenges, we propose one of the first frameworks that integrates Federated Learning (FL), a distributed and collaborative machine learning (ML) paradigm, with SLNs derived from students' interactions in multiple classrooms' online forums to predict future link formations (i.e., interactions) among students. By leveraging FL, our approach enables collaborative model training across multiple classrooms while preserving data privacy, as it eliminates the need for raw data centralization. Recognizing that each classroom may exhibit unique student interaction dynamics, we further employ model personalization techniques to adapt the FL model to individual classroom characteristics. Our results demonstrate the effectiveness of our approach in capturing both shared and classroom-specific representations of student interactions in SLNs. Additionally, we utilize explainable AI (XAI) techniques to interpret model predictions, identifying key factors that influence link formation across different classrooms.
These insights unveil the drivers of social learning interactions within a privacy-preserving, collaborative, and distributed ML framework, an aspect that has not been explored before.
Free, publicly-accessible full text available July 12, 2026.
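The core federated step described above can be sketched with plain federated averaging (FedAvg): each "classroom" trains a local logistic-regression link predictor on pair features, and only model weights reach the server, so raw interaction data never leaves the classroom. All data, dimensions, and the choice of a linear model are illustrative assumptions, not the paper's architecture.

```python
# Minimal FedAvg sketch for link prediction across classrooms.
import numpy as np

rng = np.random.default_rng(1)
d = 4                        # features per student pair (e.g., shared threads)
true_w = rng.normal(size=d)  # hidden "ground truth" for the synthetic data

def make_classroom(n=100):
    X = rng.normal(size=(n, d))
    y = (X @ true_w + rng.normal(0, 0.5, n) > 0).astype(float)  # link formed?
    return X, y

classrooms = [make_classroom() for _ in range(3)]

def local_sgd(w, X, y, lr=0.1, epochs=20):
    # Local logistic-regression training; raw (X, y) stays on the client.
    w = w.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the logistic loss
    return w

w_global = np.zeros(d)
for _ in range(10):  # federated rounds
    local = [local_sgd(w_global, X, y) for X, y in classrooms]
    w_global = np.mean(local, axis=0)  # FedAvg: average weights, not data

acc = np.mean([
    (((X @ w_global) > 0).astype(float) == y).mean() for X, y in classrooms
])
print(f"mean local accuracy: {acc:.2f}")
```

Personalization, as in the paper, would add a per-classroom step, e.g. briefly fine-tuning `w_global` on each classroom's own data after aggregation.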
-
Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo Bosco, Giosuè; Paquette, Luc (Ed.) There is a growing community of researchers at the intersection of data mining, AI, and computing education research. The objective of the CSEDM workshop is to facilitate a discussion among this research community, with a focus on how data mining can be uniquely applied in computing education research. For example, what new techniques are needed to analyze program code and CS log data? How do results from CS education inform our analysis of this data? The workshop is meant to be an interdisciplinary event at the intersection of EDM and Computing Education Research. Researchers, faculty, and students are encouraged to share their AI- and data-driven approaches, methodologies, and experiences where data transforms how students learn Computer Science (CS) skills. This full-day workshop will feature paper presentations and discussions to promote collaboration.
Free, publicly-accessible full text available July 20, 2026.
-
Free, publicly-accessible full text available February 18, 2026
-
Free, publicly-accessible full text available November 29, 2025.
-
The majority of current research on the application of artificial intelligence (AI) and machine learning (ML) in science, technology, engineering, and mathematics (STEM) education relies on centralized model training architectures. Typically, this involves pooling data at a centralized location alongside an ML model training module, such as a cloud server. However, this approach necessitates transferring student data across the network, leading to privacy concerns. In this paper, we explore the application of federated learning (FL), a highly recognized distributed ML technique, within the educational ecosystem. We highlight the potential benefits FL offers to students, classrooms, and institutions. Also, we identify a range of technical, logistical, and ethical challenges that impede the sustainable implementation of FL in the education sector. Finally, we discuss a series of open research directions, focusing on nuanced aspects of FL implementation in educational contexts. These directions aim to explore and address the complexities of applying FL in varied educational settings, ensuring its deployment is technologically sound, beneficial, and equitable for all stakeholders involved.
-
Paaßen, Benjamin; Demmans Epp, Carrie (Ed.) There is a growing community of researchers at the intersection of data mining, AI, and computing education research. The objective of the CSEDM workshop is to facilitate a discussion among this research community, with a focus on how data mining can be uniquely applied in computing education research. For example, what new techniques are needed to analyze program code and CS log data? How do results from CS education inform our analysis of this data? The workshop is meant to be an interdisciplinary event at the intersection of EDM and Computing Education Research. Researchers, faculty, and students are encouraged to share their AI- and data-driven approaches, methodologies, and experiences where data transforms how students learn Computer Science (CS) skills. This full-day hybrid workshop will feature paper presentations and discussions to promote collaboration.
-
This study explores using large language models to identify knowledge components underlying programming concepts in programming assignments in a CS1 course. We seek to answer the following research questions: RQ1. How effectively can large language models identify knowledge components in a CS1 course from programming assignments? RQ2. Can large language models be used to extract program-level knowledge components, and how can this information be used to identify students' misconceptions? Preliminary results demonstrated a high similarity between the course-level knowledge components retrieved from a large language model and those in an expert-generated list.
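The comparison step, measuring how closely an LLM-generated knowledge-component list matches an expert list, can be sketched with a simple set-overlap measure such as Jaccard similarity. Both lists below are illustrative placeholders, not the study's actual knowledge components, and the study does not necessarily use this exact metric.

```python
# Jaccard similarity between two knowledge-component (KC) lists.
def jaccard(a, b):
    a, b = {x.lower() for x in a}, {x.lower() for x in b}
    return len(a & b) / len(a | b)

# Hypothetical course-level KC lists.
llm_kcs = ["variables", "conditionals", "loops", "functions", "recursion"]
expert_kcs = ["variables", "conditionals", "loops", "functions", "lists"]

print(f"{jaccard(llm_kcs, expert_kcs):.2f}")  # → 0.67
```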
-
Identifying misconceptions in student programming solutions is an important step in evaluating their comprehension of fundamental programming concepts. While misconceptions are latent constructs that are hard to evaluate directly from student programs, logical errors can signal their existence in students' understanding. Tracing multiple occurrences of related logical bugs over different problems can provide strong evidence of students' misconceptions. This study presents preliminary results of utilizing an interpretable state-of-the-art Abstract Syntax Tree-based embedding neural network to identify logical mistakes in student code. In this study, we show a proof-of-concept of the errors identified in student programs by classifying correct versus incorrect programs. Our preliminary results show that our framework is able to automatically identify misconceptions without designing and applying a detailed rubric. This approach shows promise for improving the quality of instruction in introductory programming courses by providing educators with a powerful tool that offers personalized feedback while enabling accurate modeling of student misconceptions.
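The AST-based representation idea can be illustrated with Python's standard `ast` module: node-type counts are a crude stand-in for the paper's learned AST embeddings, but they show how structural differences between a correct and a buggy solution become machine-readable features. The two student programs are invented examples.

```python
# Turn a student program into AST node-type counts.
import ast
from collections import Counter

def node_counts(src):
    # Histogram of AST node types, a simple structural feature vector.
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(src)))

correct = "def absval(x):\n    return x if x >= 0 else -x\n"
buggy = "def absval(x):\n    return x if x > 0 else x\n"  # misses the negation

# Node types the correct solution has but the buggy one lacks.
diff = node_counts(correct) - node_counts(buggy)
print(dict(diff))
```

Here the difference surfaces the missing unary negation (`UnaryOp`/`USub`) and the changed comparison (`GtE`), exactly the kind of structural signal a learned AST embedding could pick up across many submissions to flag a recurring logical error.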
