

Title: One minute is enough: Early Prediction of Student Success and Event-level Difficulty during Novice Programming Tasks
Early prediction of student difficulty during long-duration learning activities allows a tutoring system to intervene by providing needed support, such as a hint, or by alerting an instructor. To be effective, these predictions must come early and be highly accurate, but such predictions are difficult for open-ended programming problems. In this work, Recent Temporal Patterns (RTPs) are used in conjunction with Support Vector Machine and Logistic Regression to build robust yet interpretable models for early prediction. We performed two tasks: predicting student success and predicting student difficulty during a single open-ended novice programming task of drawing a square-shaped spiral. We compared RTP against several machine learning models, ranging from classic models to more recent deep learning models such as Long Short-Term Memory, to predict whether students would be able to complete the programming task. Our results show that RTP-based models outperformed all others and could successfully classify students after just one minute of a 20-minute exercise (students can spend more than an hour on it). To determine when a system might intervene to prevent incompleteness or eventual dropout, we applied RTP at regular intervals to predict whether a student would make progress within the next five minutes, reflecting that they may be having difficulty. RTP successfully classified these students needing interventions over 85% of the time, with increased accuracy using data-driven program features. These results contribute significantly to the potential to build a fully data-driven tutoring system for novice programming.
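A minimal, hypothetical sketch of this early-prediction setup: count features from the first minute of a student's event log feed a scikit-learn logistic regression, one of the classifiers the paper pairs with RTP features. The RTP mining itself is more involved; the event vocabulary, log layout, and 60-second cutoff below are illustrative assumptions.

```python
# Sketch: early prediction of task completion from the first minute of log
# data. Assumes each student's log is a list of (timestamp_seconds, event_type)
# tuples; the event vocabulary below is hypothetical.
from collections import Counter
from sklearn.linear_model import LogisticRegression

EVENT_TYPES = ["edit", "run", "error", "delete"]  # assumed event vocabulary

def early_features(events, cutoff=60.0):
    """Count each event type observed before the cutoff (in seconds)."""
    counts = Counter(etype for t, etype in events if t <= cutoff)
    return [counts[e] for e in EVENT_TYPES]

def train_early_predictor(student_logs, completed):
    """Fit a classifier on first-minute features and 0/1 completion labels."""
    X = [early_features(log) for log in student_logs]
    return LogisticRegression(max_iter=1000).fit(X, completed)

# Usage: model = train_early_predictor(logs, labels)
#        p_success = model.predict_proba([early_features(new_log)])[0][1]
```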
Award ID(s):
1651909
NSF-PAR ID:
10136495
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019)
Page Range / eLocation ID:
119–128
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Abstract: Modeling student learning processes is highly complex since it is influenced by many factors, such as motivation and learning habits. The high volume of features and tools provided by computer-based learning environments confounds the task of tracking student knowledge even further. Deep learning models such as Long Short-Term Memory (LSTM) networks and classic Markovian models such as Bayesian Knowledge Tracing (BKT) have been successfully applied to student modeling. However, much of this prior work is designed to handle sequences of events with discrete timesteps rather than considering the continuous aspect of time. Given that the time elapsed between successive elements in a student's trajectory can vary from seconds to days, we applied a Time-aware LSTM (T-LSTM) to model the dynamics of the student knowledge state in continuous time. We investigate the effectiveness of T-LSTM on two domains with very different characteristics. One involves an open-ended programming environment where students can self-pace their progress; there, T-LSTM is compared against LSTM, Recent Temporal Pattern Mining, and classic Logistic Regression (LR) on the early prediction of student success. The other involves a classic tutor-driven intelligent tutoring system where the tutor scaffolds student learning step by step; there, T-LSTM is compared with LSTM, LR, and BKT on the early prediction of student learning gains. Our results show that T-LSTM significantly outperforms the other methods in the self-paced, open-ended programming environment, while on the tutor-driven ITS it ties with LSTM and outperforms both LR and BKT. In other words, while time irregularity exists in both datasets, T-LSTM works significantly better than other student models when the pace is driven by students. When such irregularity results from the tutor, T-LSTM was not superior to the other models, but its performance was not hurt either.
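The abstract above hinges on the T-LSTM cell, which discounts part of the memory by the elapsed time between events. A minimal PyTorch sketch of that idea follows, using the common decomposition from the T-LSTM literature (split the previous cell state into short- and long-term parts and decay only the short-term part by g(dt) = 1/log(e + dt)); this is an illustrative reconstruction, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class TLSTMCell(nn.Module):
    """Time-aware LSTM cell sketch: decay short-term memory by elapsed time."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.gates = nn.Linear(input_size + hidden_size, 4 * hidden_size)
        self.decomp = nn.Linear(hidden_size, hidden_size)  # short-term extractor

    def forward(self, x, h, c, dt):
        # Split memory and discount only the short-term part by elapsed time.
        c_short = torch.tanh(self.decomp(c))
        c_long = c - c_short
        decay = 1.0 / torch.log(math.e + dt)  # dt: (batch, 1) elapsed time
        c_adj = c_long + c_short * decay
        # Standard LSTM gating applied to the time-adjusted cell state.
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c_new = torch.sigmoid(f) * c_adj + torch.sigmoid(i) * torch.tanh(g)
        h_new = torch.sigmoid(o) * torch.tanh(c_new)
        return h_new, c_new
```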
  2.
    Determining when and whether to provide personalized support is a well-known challenge called the assistance dilemma. A core problem in solving the assistance dilemma is the need to discover when students are unproductive so that the tutor can intervene. Such a task is particularly challenging for open-ended domains, even those that are well-structured with defined principles and goals. We present a set of data-driven methods to classify, predict, and prevent unproductive problem-solving steps in the well-structured open-ended domain of logic. This approach leverages and extends the Hint Factory, a set of methods that uses prior student solution attempts to build data-driven intelligent tutors. We present a HelpNeed classification that uses prior student data to determine when students are likely to be unproductive and need help learning optimal problem-solving strategies. We present a controlled study to determine the impact of an Adaptive pedagogical policy that provides proactive hints at the start of each step based on the outcomes of our HelpNeed predictor: productive vs. unproductive. Our results show that the students in the Adaptive condition exhibited better training behaviors, with lower help avoidance and higher help appropriateness (a higher chance of receiving help when it was likely to be needed), as measured using the HelpNeed classifier, when compared to the Control. Furthermore, the results show that the students who received Adaptive hints based on HelpNeed predictions during training significantly outperformed their Control peers on the posttest, with the former producing shorter, more optimal solutions in less time. We conclude with suggestions on how these HelpNeed methods could be applied in other well-structured open-ended domains.
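The Adaptive policy described above reduces to a simple decision at the start of each step: consult the HelpNeed classifier and show a proactive hint only when the step is predicted to be unproductive. The sketch below illustrates that shape; the featurizer, the model interface, and the hint_factory object are assumed stand-ins, not the authors' code.

```python
def extract_step_features(state):
    # Stand-in featurizer: two toy features drawn from the problem state.
    return [state["steps_taken"], state["seconds_on_step"]]

def adaptive_step_policy(state, helpneed_model, hint_factory):
    """Show a proactive hint only when the step is predicted unproductive."""
    features = extract_step_features(state)
    if helpneed_model.predict([features])[0] == "unproductive":
        return hint_factory.next_step_hint(state)  # data-driven next-step hint
    return None  # predicted productive: let the student proceed unaided
```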
  3. Computer-aided design (CAD) programs are essential to engineering as they allow for better designs through low-cost iterations. While CAD programs are typically taught to undergraduate students as a job skill, such software can also help students learn engineering concepts. A current limitation of CAD programs (even those that are specifically designed for educational purposes) is that they are not capable of providing automated real-time help to students. To encourage CAD programs to build in assistance to students, we used data generated from students using a free, open-source CAD program called Aladdin to demonstrate how student data combined with machine learning techniques can predict how well a particular student will perform in a design task. We challenged students to design a house that consumed zero net energy as part of an introductory engineering technology undergraduate course. Using data from 128 students, along with the scikit-learn Python machine learning library, we tested our models using both total counts of design actions and sequences of design actions as inputs. We found that our models using early design sequence actions are particularly valuable for prediction. Our logistic regression model predicted with greater than 60% accuracy whether a student would succeed in designing a zero net energy house. Our results suggest that it would be feasible for Aladdin to provide useful feedback to students when they are approximately halfway through their design. Further improvements to these models could lead to earlier predictions and thus provide students feedback sooner to enhance their learning.
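One simple way to reproduce the sequence-based variant described above is to treat each student's early action sequence as a string of action names and let scikit-learn extract n-gram features, as sketched below. The action names and the halfway cutoff are assumptions for illustration; the paper's exact feature encodings may differ.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Unigram/bigram counts over action names keep some ordering information
# that plain action totals discard.
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), token_pattern=r"\S+"),
    LogisticRegression(max_iter=1000),
)

# Hypothetical usage, given action_sequences (lists of action-name strings)
# and success_labels (one 0/1 label per student):
# docs = [" ".join(seq[: len(seq) // 2]) for seq in action_sequences]
# model.fit(docs, success_labels)
```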
  4. Background: Teachers often rely on the use of open-ended questions to assess students' conceptual understanding of assigned content. Particularly in the context of mathematics, teachers use these types of questions to gain insight into the processes and strategies adopted by students in solving mathematical problems, beyond what is possible through more closed-ended problem types. While these types of problems are valuable to teachers, the variation in student responses to these questions makes it difficult, and time-consuming, to evaluate and provide directed feedback. It is a well-studied concept that feedback, both in terms of a numeric score but more importantly in the form of teacher-authored comments, can help guide students on how to improve, leading to increased learning. It is for this reason that teachers need better support not only for assessing students' work but also in providing meaningful and directed feedback to students. Objectives: In this paper, we seek to develop, evaluate, and examine machine learning models that support automated open response assessment and feedback. Methods: We build upon prior research in the automatic assessment of student responses to open-ended problems and introduce a novel approach that leverages student log data combined with machine learning and natural language processing methods. Utilizing sentence-level semantic representations of student responses to open-ended questions, we propose a collaborative filtering-based approach to both predict student scores and recommend appropriate feedback messages for teachers to send to their students. Results and Conclusion: We find that our method outperforms previously published benchmarks across three different metrics for the task of predicting student performance. Through an error analysis, we identify several areas where future work may be able to improve upon our approach.
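A minimal sketch of the embedding-plus-collaborative-filtering idea above: embed each response, find the most similar previously graded responses, and reuse their scores (the same neighbors could supply candidate feedback messages). The encoder name and the choice of k are illustrative assumptions, not the paper's configuration.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder

def predict_score(response, graded_responses, scores, k=5):
    """Average the scores of the k most similar graded responses."""
    vecs = encoder.encode(graded_responses + [response])
    past, query = vecs[:-1], vecs[-1]
    # Cosine similarity between the new response and each graded response.
    sims = past @ query / (np.linalg.norm(past, axis=1) * np.linalg.norm(query))
    top = np.argsort(sims)[-k:]  # indices of the k nearest neighbors
    return float(np.mean([scores[i] for i in top]))
```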