

Title: Starters and Finishers: Predicting Next Assignment Completion From Student Behavior During Math Problem Solving
A substantial amount of research has been conducted by the educational data mining community to track and model learning. Previous work in modeling student knowledge has focused on predicting student performance at the problem level. While informative, problem-to-problem predictions leave little time for interventions within the system and virtually no time for human interventions. As such, modeling student performance at higher levels, such as by assignment, may provide a better opportunity to develop and apply learning interventions preemptively to remedy gaps in student knowledge. We aim to identify assignment-level features that predict whether or not a student will finish their next homework assignment once started. We employ logistic regression models to test which features best predict whether a student will be a “starter” or a “finisher” on the next assignment.
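A minimal sketch of the modeling setup described in the abstract, assuming a hypothetical data file and hypothetical assignment-level feature names (the paper's actual feature set is not listed here):

```python
# Sketch: logistic regression over assignment-level features to predict whether a
# student who starts their next assignment will finish it. Column names below are
# hypothetical placeholders, not the paper's feature set.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("assignment_level_features.csv")          # hypothetical data file
features = ["prior_completion_rate", "first_problem_correct",
            "hint_count", "avg_seconds_per_problem"]
X, y = df[features], df["finished_next_assignment"]        # 1 = finisher, 0 = starter

model = LogisticRegression(max_iter=1000)
print("5-fold AUC:", cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean())
```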
Award ID(s):
1724889
NSF-PAR ID:
10095367
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Eleventh International Conference on Educational Data Mining
Page Range / eLocation ID:
525-528
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Student procrastination and cramming for deadlines are major challenges in online learning environments, with negative educational and well-being side effects. Modeling student activities in continuous time and predicting their next study time are important problems that can help in creating personalized, timely interventions to mitigate these challenges. However, previous attempts at dynamic modeling of student procrastination suffer from major issues: they are unable to predict the next activity times, cannot deal with missing activity history, are not personalized, and disregard important course properties, such as assignment deadlines, that are essential in explaining cramming behavior. To resolve these problems, we introduce a new personalized stimuli-sensitive Hawkes process model (SSHP), by jointly modeling all student-assignment pairs and utilizing their similarities, to predict students’ next activity times even when there are no historical observations. Unlike regular point processes that assume a constant external triggering effect from the environment, we model three dynamic types of external stimuli, according to assignment availabilities, assignment deadlines, and each student’s time management habits. Our experiments on two synthetic datasets and two real-world datasets show superior performance in future activity prediction compared with state-of-the-art models. Moreover, we show that our model achieves a flexible and accurate parameterization of activity intensities in students.
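As a toy illustration of the point-process idea (not the SSHP model itself), the sketch below evaluates a Hawkes-style activity intensity with an exponential self-exciting kernel plus a simple deadline-driven external stimulus; the parameter values and the form of the stimulus are invented for illustration.

```python
import numpy as np

def activity_intensity(t, past_sessions, mu=0.05, alpha=0.3, beta=1.0,
                       deadline=10.0, gamma=0.2):
    """Toy Hawkes-style intensity of studying at time t (in days).

    mu          -- baseline study rate
    alpha, beta -- jump size and exponential decay of self-excitation from past sessions
    gamma       -- strength of an external 'deadline pressure' stimulus that grows
                   as the assignment deadline approaches (illustrative form only)
    """
    past = np.asarray([s for s in past_sessions if s < t])
    self_excitation = alpha * np.exp(-beta * (t - past)).sum() if past.size else 0.0
    deadline_pressure = gamma / (1.0 + max(deadline - t, 0.0))
    return mu + self_excitation + deadline_pressure

# Intensity two days after the last of three study sessions, with the deadline at day 10.
print(activity_intensity(7.0, past_sessions=[1.0, 3.5, 5.0]))
```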
  2. The 2021 return to face-to-face teaching and proctored exams revealed significant gaps in student learning during remote instruction. The challenge of supporting underperforming students is not expected to abate in the next 5-10 years as COVID-19-related learning losses compound structural inequalities in K-12 education. More recently, anecdotal evidence across courses shows declines in classroom attendance and student engagement. Lack of engagement indicates emotional barriers rather than intellectual deficiencies, and its growth coincides with the ongoing mental health epidemic. Regardless of the underlying reasons, professors are now faced with the unappealing choice of awarding failing grades to an uncomfortably large fraction of classes or awarding passing grades to students who do not seem prepared for the workforce or adult life in general. Faculty training, if it exists, addresses neither the scale of this situation nor the emotional/identity aspects of the problem. There is an urgent need for pedagogical remediation tools that can be applied without additional TA or staff resources, without training in psychiatry, and with only five or eight weeks remaining in the semester. This work presents two work-in-progress interventions for engineering faculty who face the challenges described above. In the first intervention, students can improve their exam score by submitting videos of reworked exams. The requirement of voiceover forces students to understand the thought process behind problems, even if they have copied the answers from a friend. Incorporating peer review into the assignment reduces the workload for instructor grading. This intervention has been successfully implemented in sophomore- and senior-level courses with positive feedback from both faculty and students. In the second intervention, students who fail the midterm are offered an automatic passing exam grade (typically 51%) in exchange for submitting a knowledge inventory and remediation plan. Students create a glossary of terms and concepts from the class and rank them by their level of understanding. Recent iterations of the remediation plan also include reflections on emotions and support networks. In February 2023, the project team will scale the interventions to freshman-level Introductory Programming, which has 400 students and the highest fail/withdrawal rate in the college. The large sample size will enable more robust statistics to correlate exam scores, intervention rubric items, and surveys on assignment effectiveness. Piloting interventions in a variety of environments and classes will establish best pedagogical practices that minimize instructors’ workload and decision fatigue. The ultimate goal of this project is to benefit students and faculty through well-defined and systematic interventions across the curriculum. 
  3. A prominent issue faced by the education research community is that of student attrition. While large research efforts have been devoted to studying course-level attrition, widely referred to as dropout, less research has focused on the finer-grained assignment-level attrition commonly observed in K-12 classrooms. This latter instantiation of attrition, referred to in this paper as “stopout,” is characterized by students failing to complete their assigned work, but the causes of such behavior are often not known. This becomes a large problem for educators and developers of learning platforms, as students who give up on assignments early miss opportunities to learn and practice the material, which may affect future performance on related topics; similarly, it is difficult for researchers to develop, and subsequently for computer-based systems to deploy, interventions aimed at promoting productive persistence once a student has ceased interaction with the software. This difficulty highlights the importance of understanding and identifying early signs of stopout behavior in order to aid students preemptively and promote productive persistence in their learning. While many cases of student stopout may be attributable to gaps in student knowledge and indicative of struggle, student attributes such as grit and persistence may be further affected by other factors. This work focuses on identifying different forms of stopout behavior in the context of middle school math by observing student behaviors at the sub-problem level. We find that students exhibit disproportionate stopout on the first problem of their assignments in comparison to stopout on subsequent problems, identifying a behavior that we call “refusal,” and use the emerging patterns of student activity to better understand the potential causes underlying stopout behavior early in an assignment.
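A small sketch of the kind of comparison described above, assuming a hypothetical interaction log with one row per student, assignment, and problem position plus a completion flag; the actual analysis works at the sub-problem level and is more involved.

```python
import pandas as pd

# Hypothetical log: one row per student/assignment/problem with a 'completed' flag.
log = pd.read_csv("assignment_problem_log.csv")

# Flag problems that were assigned but never completed as stopout points.
log["stopout"] = ~log["completed"].astype(bool)
first_rate = log.loc[log["problem_position"] == 1, "stopout"].mean()
later_rate = log.loc[log["problem_position"] > 1, "stopout"].mean()

# Disproportionate stopout on problem 1 would suggest "refusal" rather than struggle.
print(f"Stopout rate on first problems: {first_rate:.1%}")
print(f"Stopout rate on later problems: {later_rate:.1%}")
```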
  4. Early prediction of student difficulty during long-duration learning activities allows a tutoring system to intervene by providing needed support, such as a hint, or by alerting an instructor. To be effective, these predictions must come early and be highly accurate, but such predictions are difficult for open-ended programming problems. In this work, Recent Temporal Patterns (RTPs) are used in conjunction with Support Vector Machines and Logistic Regression to build robust yet interpretable models for early prediction. We performed two tasks: predicting student success and predicting student difficulty during a single open-ended novice programming task of drawing a square-shaped spiral. We compared RTP against several machine learning models, ranging from classic models to more recent deep learning models such as Long Short-Term Memory, to predict whether students would be able to complete the programming task. Our results show that RTP-based models outperformed all others and could successfully classify students after just one minute of a 20-minute exercise (students can spend more than an hour on it). To determine when a system might intervene to prevent incompleteness or eventual dropout, we applied RTP at regular intervals to predict whether a student would make progress within the next five minutes, reflecting that they may be having difficulty. RTP successfully classified these students needing interventions over 85% of the time, with increased accuracy using data-driven program features. These results contribute significantly to the potential to build a fully data-driven tutoring system for novice programming.
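The paper mines Recent Temporal Patterns; as a much simpler stand-in for its early-prediction setup, the sketch below trains a plain logistic regression on a few invented features computed from the first minute of activity. The file, column, and feature names are hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical per-student features from the first 60 seconds of a 20-minute exercise
# (a plain baseline classifier, not the RTP-based models used in the paper).
df = pd.read_csv("first_minute_features.csv")
features = ["keystrokes", "code_runs", "error_count", "idle_seconds"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["completed_task"], test_size=0.25, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC after one minute:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```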
  5. Prediction of student performance in introductory programming courses can assist struggling students and improve their persistence. At the same time, it is important for the prediction to be transparent so that instructors and students can effectively utilize its results. Explainable machine learning models can help students and instructors gain insights into students’ different programming behaviors and the problem-solving strategies that lead to good or poor performance. This study develops an explainable model that predicts students’ performance based on programming assignment submission information. We extract different data-driven features from students’ programming submissions and employ a stacked ensemble model to predict students’ final exam grades. We use SHAP, a game-theory-based framework, to explain the model’s predictions and help stakeholders understand the impact of different programming behaviors on students’ success. Moreover, we analyze the impact of important features and utilize a combination of descriptive statistics and mixture models to identify different profiles of students based on their problem-solving patterns, bolstering explainability. The experimental results suggest that our model significantly outperforms other machine learning models, including KNN, SVM, XGBoost, bagging, boosting, and linear regression. Our explainable and transparent model can help explain students’ common problem-solving patterns in relation to their level of expertise, enabling effective intervention and adaptive support for students.
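A rough illustration of the pipeline described above, assuming hypothetical feature names and a hypothetical data file, and using a much smaller ensemble than the paper's: a stacked regressor predicts final exam grades, and SHAP's model-agnostic kernel explainer attributes each prediction to submission-behavior features.

```python
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor, StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# Hypothetical submission-behavior features; the paper's actual feature set differs.
df = pd.read_csv("programming_submission_features.csv")
features = ["submissions_per_problem", "avg_minutes_between_submissions",
            "days_started_before_deadline", "syntax_error_rate"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["final_exam_grade"], test_size=0.2, random_state=0)

# Small stacked ensemble predicting the final exam grade from programming behavior.
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=200, random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0))],
    final_estimator=Ridge(),
).fit(X_train, y_train)

# Model-agnostic SHAP values explain each prediction in terms of the input features.
explainer = shap.KernelExplainer(stack.predict, X_train.iloc[:50])
shap_values = explainer.shap_values(X_test.iloc[:5])
print(pd.DataFrame(shap_values, columns=features))
```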