-
The ability to predict student performance in introductory programming courses is important to help struggling students and enhance their persistence. However, for this prediction to be impactful, it is crucial that it remains transparent and accessible to both instructors and students, ensuring effective use of the predicted results. Machine learning models with explainable features give students and instructors an effective means to comprehend students' diverse programming behaviors and problem-solving strategies, elucidating the factors contributing to both successful and suboptimal performance. This study develops an explainable model that predicts student performance from programming assignment submission information at different stages of the course, enabling early explainable predictions. We extract data-driven features from student programming submissions and use a stacked ensemble model to predict final exam grades. The experimental results suggest that our model successfully predicts student performance from programming submissions made earlier in the semester. Employing SHAP, a game-theory-based framework, we explain the model's predictions, helping stakeholders understand how diverse programming behaviors influence students' success. Additionally, we analyze crucial features with a mix of descriptive statistics and mixture models to identify distinct student profiles based on their problem-solving patterns, enhancing overall explainability. Furthermore, we analyze the profiles in greater depth using students' distinct programming patterns, elucidating the characteristics of student groups for whom SHAP explanations alone are not readily comprehensible. Our explainable early prediction model elucidates common problem-solving patterns in students relative to their expertise, facilitating effective intervention and adaptive support.
Free, publicly-accessible full text available July 12, 2026.
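As a rough illustration of the stacked-ensemble step only, the sketch below trains a scikit-learn StackingRegressor on invented submission features. The feature names, synthetic data, and model choices are assumptions for illustration, not the study's actual setup; SHAP itself is omitted to keep the sketch dependency-light.

```python
# Minimal sketch of grade prediction with a stacked ensemble.
# All features and data below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)
n_students = 200

# Hypothetical per-student features from early programming submissions.
X = np.column_stack([
    rng.integers(1, 15, n_students),     # attempts per assignment
    rng.uniform(0, 100, n_students),     # mean autograder score
    rng.uniform(0.1, 48.0, n_students),  # hours before deadline of first submit
])
# Synthetic final-exam grade loosely tied to the early features.
y = 0.6 * X[:, 1] + 2.0 * X[:, 2] - 1.5 * X[:, 0] + rng.normal(0, 5, n_students)

# Stacked ensemble: base learners feed a ridge meta-learner.
model = StackingRegressor(
    estimators=[
        ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
        ("ridge", RidgeCV()),
    ],
    final_estimator=RidgeCV(),
)
model.fit(X, y)
preds = model.predict(X)  # one predicted grade per student
```

In a SHAP-style workflow, the fitted `model` would then be passed to an explainer to attribute each prediction to individual submission features.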
-
Understanding student practice behavior and its connection to learning is essential for effective recommender systems that provide personalized learning support. In this study, we apply a sequential pattern mining approach to analyze student practice behavior in a practice system for introductory Python programming. Our goal is to identify different types of practice behavior and connect them to student performance. We examine two types of practice sequences: (1) by login session and (2) by learning topic. For each sequence type, we use SPAM (Sequential PAttern Mining) to identify the most frequent micro-patterns and build behavior profiles of individual learners as vectors of micro-pattern frequencies observed in their behavior. We confirm that these vectors are stable for both sequence types (p < 0.03 for session sequences and p < 0.003 for topic sequences). Using the vectors, we perform K-means clustering and identify two practice behaviors: example explorers and persistent finishers. We repeat this experiment using different coding approaches for student sequences and obtain similar clusters. Our results suggest that example explorers and persistent finishers may represent two typical types of divergent student behavior in a programming practice system. Finally, to better understand the relationship between students' background knowledge, learning outcomes, and practice behavior, we perform statistical analyses to assess the significance of the associations among pre-test scores, cluster assignments, and final course grades.
Free, publicly-accessible full text available July 20, 2026.
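The pipeline above can be sketched in simplified form: bigrams of practice actions stand in for SPAM-mined micro-patterns, and K-means groups the resulting frequency vectors. The action alphabet and student sequences below are invented for illustration, not the study's data.

```python
# Simplified stand-in for the micro-pattern + clustering pipeline.
from collections import Counter
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical per-student action sequences:
# E = view example, P = attempt problem, S = successful submission.
sequences = {
    "s1": "EEPEPEEPS",  # example-heavy behavior
    "s2": "EPEEEPPES",
    "s3": "PPPSPPPSS",  # persistent problem attempts
    "s4": "PPSPPPPSS",
}

patterns = ["EE", "EP", "PE", "PP", "PS", "SP", "SS"]  # candidate micro-patterns

def pattern_vector(seq):
    # Relative frequency of each candidate micro-pattern in the sequence.
    counts = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(sum(counts.values()), 1)
    return [counts[p] / total for p in patterns]

X = np.array([pattern_vector(s) for s in sequences.values()])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(dict(zip(sequences, labels)))  # example explorers vs. persistent finishers
```

With these toy sequences, the example-heavy students (s1, s2) and the attempt-heavy students (s3, s4) land in separate clusters, mirroring the two behavior types the study reports.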
-
Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo Bosco, Giosuè; Paquette, Luc (Ed.) Social interactions among classroom peers, represented as social learning networks (SLNs), play a crucial role in enhancing learning outcomes. While SLN analysis has recently garnered attention, most existing approaches rely on centralized training, where data is aggregated and processed on a local/cloud server with direct access to raw data. However, in real-world educational settings, such direct access across multiple classrooms is often restricted due to privacy concerns. Furthermore, training models on isolated classroom data prevents the identification of common interaction patterns that exist across multiple classrooms, thereby limiting model performance. To address these challenges, we propose one of the first frameworks that integrates Federated Learning (FL), a distributed and collaborative machine learning (ML) paradigm, with SLNs derived from students' interactions in multiple classrooms' online forums to predict future link formations (i.e., interactions) among students. By leveraging FL, our approach enables collaborative model training across multiple classrooms while preserving data privacy, as it eliminates the need for raw data centralization. Recognizing that each classroom may exhibit unique student interaction dynamics, we further employ model personalization techniques to adapt the FL model to individual classroom characteristics. Our results demonstrate the effectiveness of our approach in capturing both shared and classroom-specific representations of student interactions in SLNs. Additionally, we utilize explainable AI (XAI) techniques to interpret model predictions, identifying key factors that influence link formation across different classrooms.
These insights unveil the drivers of social learning interactions within a privacy-preserving, collaborative, and distributed ML framework, an aspect that has not been explored before.
Free, publicly-accessible full text available July 12, 2026.
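The core federated step described above can be sketched with plain federated averaging (FedAvg): each "classroom" trains a local logistic-regression link predictor on pair features, and only model weights reach the server, so raw interaction data never leaves the classroom. All data, dimensions, and the choice of a linear model are illustrative assumptions, not the paper's architecture.

```python
# Minimal FedAvg sketch for link prediction across classrooms.
import numpy as np

rng = np.random.default_rng(1)
d = 4                        # features per student pair (e.g., shared threads)
true_w = rng.normal(size=d)  # hidden "ground truth" for the synthetic data

def make_classroom(n=100):
    X = rng.normal(size=(n, d))
    y = (X @ true_w + rng.normal(0, 0.5, n) > 0).astype(float)  # link formed?
    return X, y

classrooms = [make_classroom() for _ in range(3)]

def local_sgd(w, X, y, lr=0.1, epochs=20):
    # Local logistic-regression training; raw (X, y) stays on the client.
    w = w.copy()
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)  # gradient of the logistic loss
    return w

w_global = np.zeros(d)
for _ in range(10):  # federated rounds
    local = [local_sgd(w_global, X, y) for X, y in classrooms]
    w_global = np.mean(local, axis=0)  # FedAvg: average weights, not data

acc = np.mean([
    (((X @ w_global) > 0).astype(float) == y).mean() for X, y in classrooms
])
print(f"mean local accuracy: {acc:.2f}")
```

Personalization, as in the paper, would add a per-classroom step, e.g. briefly fine-tuning `w_global` on each classroom's own data after aggregation.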
-
Mills, Caitlin; Alexandron, Giora; Taibi, Davide; Lo Bosco, Giosuè; Paquette, Luc (Ed.) There is a growing community of researchers at the intersection of data mining, AI, and computing education research. The objective of the CSEDM workshop is to facilitate a discussion among this research community, with a focus on how data mining can be uniquely applied in computing education research. For example, what new techniques are needed to analyze program code and CS log data? How do results from CS education inform our analysis of this data? The workshop is meant to be an interdisciplinary event at the intersection of EDM and Computing Education Research. Researchers, faculty, and students are encouraged to share their AI- and data-driven approaches, methodologies, and experiences where data transforms how students learn Computer Science (CS) skills. This full-day workshop will feature paper presentations and discussions to promote collaboration.
Free, publicly-accessible full text available July 20, 2026.
-
Free, publicly-accessible full text available February 18, 2026
-
Free, publicly-accessible full text available November 29, 2025.
-
The majority of current research on the application of artificial intelligence (AI) and machine learning (ML) in science, technology, engineering, and mathematics (STEM) education relies on centralized model training architectures. Typically, this involves pooling data at a centralized location alongside an ML model training module, such as a cloud server. However, this approach necessitates transferring student data across the network, leading to privacy concerns. In this paper, we explore the application of federated learning (FL), a highly recognized distributed ML technique, within the educational ecosystem. We highlight the potential benefits FL offers to students, classrooms, and institutions. Also, we identify a range of technical, logistical, and ethical challenges that impede the sustainable implementation of FL in the education sector. Finally, we discuss a series of open research directions, focusing on nuanced aspects of FL implementation in educational contexts. These directions aim to explore and address the complexities of applying FL in varied educational settings, ensuring its deployment is technologically sound, beneficial, and equitable for all stakeholders involved.
-
Paaßen, Benjamin; Demmans Epp, Carrie (Ed.) There is a growing community of researchers at the intersection of data mining, AI, and computing education research. The objective of the CSEDM workshop is to facilitate a discussion among this research community, with a focus on how data mining can be uniquely applied in computing education research. For example, what new techniques are needed to analyze program code and CS log data? How do results from CS education inform our analysis of this data? The workshop is meant to be an interdisciplinary event at the intersection of EDM and Computing Education Research. Researchers, faculty, and students are encouraged to share their AI- and data-driven approaches, methodologies, and experiences where data transforms how students learn Computer Science (CS) skills. This full-day hybrid workshop will feature paper presentations and discussions to promote collaboration.
-
This study explores using large language models to identify knowledge components underlying programming concepts in programming assignments in a CS1 course. We seek to answer the following research questions: RQ1. How effectively can large language models identify knowledge components in a CS1 course from programming assignments? RQ2. Can large language models be used to extract program-level knowledge components, and how can this information be used to identify students' misconceptions? Preliminary results demonstrated a high similarity between the course-level knowledge components retrieved from a large language model and those in an expert-generated list.
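The comparison step, measuring how closely an LLM-generated knowledge-component list matches an expert list, can be sketched with a simple set-overlap measure such as Jaccard similarity. Both lists below are illustrative placeholders, not the study's actual knowledge components, and the study does not necessarily use this exact metric.

```python
# Jaccard similarity between two knowledge-component (KC) lists.
def jaccard(a, b):
    a, b = {x.lower() for x in a}, {x.lower() for x in b}
    return len(a & b) / len(a | b)

# Hypothetical course-level KC lists.
llm_kcs = ["variables", "conditionals", "loops", "functions", "recursion"]
expert_kcs = ["variables", "conditionals", "loops", "functions", "lists"]

print(f"{jaccard(llm_kcs, expert_kcs):.2f}")  # → 0.67
```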
-
Identifying misconceptions in student programming solutions is an important step in evaluating their comprehension of fundamental programming concepts. While misconceptions are latent constructs that are hard to evaluate directly from student programs, logical errors can signal their existence in students' understanding. Tracing multiple occurrences of related logical bugs over different problems can provide strong evidence of students' misconceptions. This study presents preliminary results of utilizing an interpretable state-of-the-art Abstract Syntax Tree-based embedding neural network to identify logical mistakes in student code. In this study, we show a proof-of-concept of the errors identified in student programs by classifying correct versus incorrect programs. Our preliminary results show that our framework is able to automatically identify misconceptions without designing and applying a detailed rubric. This approach shows promise for improving the quality of instruction in introductory programming courses by providing educators with a powerful tool that offers personalized feedback while enabling accurate modeling of student misconceptions.
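The AST-based representation idea can be illustrated with Python's standard `ast` module: node-type counts are a crude stand-in for the paper's learned AST embeddings, but they show how structural differences between a correct and a buggy solution become machine-readable features. The two student programs are invented examples.

```python
# Turn a student program into AST node-type counts.
import ast
from collections import Counter

def node_counts(src):
    # Histogram of AST node types, a simple structural feature vector.
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(src)))

correct = "def absval(x):\n    return x if x >= 0 else -x\n"
buggy = "def absval(x):\n    return x if x > 0 else x\n"  # misses the negation

# Node types the correct solution has but the buggy one lacks.
diff = node_counts(correct) - node_counts(buggy)
print(dict(diff))
```

Here the difference surfaces the missing unary negation (`UnaryOp`/`USub`) and the changed comparison (`GtE`), exactly the kind of structural signal a learned AST embedding could pick up across many submissions to flag a recurring logical error.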
