

Title: Process-BERT: A Framework for Representation Learning on Educational Process Data
Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized interventions. In this paper, we propose a framework for learning representations of educational process data that is applicable across different learning scenarios. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 Nation's Report Card Data Mining Competition dataset, which consists of student problem-solving process data, and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative.
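To make the pre-training step concrete, here is a minimal sketch of a BERT-style masked-event objective over logged student action sequences, written in PyTorch. The action vocabulary, model dimensions, and masking rate are illustrative assumptions, not the specific models described in the paper.

```python
# Minimal sketch: a small Transformer encoder pre-trained with a masked-event
# objective over student action sequences. All names, dimensions, and the
# action vocabulary are illustrative assumptions, not the paper's models.
import torch
import torch.nn as nn

VOCAB_SIZE = 128   # assumed number of distinct logged action types
MASK_ID = 1        # assumed id reserved for the [MASK] token
EMBED_DIM = 64

class ProcessEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        layer = nn.TransformerEncoderLayer(EMBED_DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(EMBED_DIM, VOCAB_SIZE)  # predicts masked actions

    def forward(self, ids):
        return self.head(self.encoder(self.embed(ids)))

model = ProcessEncoder()
actions = torch.randint(2, VOCAB_SIZE, (8, 20))  # batch of action-id sequences
mask = torch.rand(actions.shape) < 0.15          # mask ~15% of positions
inputs = actions.masked_fill(mask, MASK_ID)
logits = model(inputs)
loss = nn.functional.cross_entropy(
    logits[mask], actions[mask])                 # reconstruct masked actions
loss.backward()
```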
Award ID(s):
2225091 1840771
NSF-PAR ID:
10374330
Author(s) / Creator(s):
Date Published:
Journal Name:
Educational Data Mining Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Educational process data, i.e., logs of detailed student activities in computerized or online learning platforms, has the potential to offer deep insights into how students learn. One can use process data for many downstream tasks such as learning outcome prediction and automatically delivering personalized interventions. However, analyzing process data is challenging since the specific format of process data varies considerably across learning/testing scenarios. In this paper, we propose a framework for learning representations of educational process data that is applicable across many different learning scenarios. Our framework consists of a pre-training step that uses BERT-type objectives to learn representations from sequential process data and a fine-tuning step that further adjusts these representations on downstream prediction tasks. We apply our framework to the 2019 Nation's Report Card Data Mining Competition dataset, which consists of student problem-solving process data, and detail the specific models we use in this scenario. We conduct both quantitative and qualitative experiments to show that our framework results in process data representations that are both predictive and informative.
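Continuing the hypothetical sketch above (this block reuses `model`, `actions`, and `EMBED_DIM` from it), the fine-tuning step can be pictured as attaching a small prediction head to the pre-trained encoder and training both on a downstream label, here an assumed binary learning outcome. The mean-pooling and single linear head are illustrative choices, not the paper's competition-specific models.

```python
# Hypothetical fine-tuning step: reuse the pre-trained encoder, pool its
# per-step representations, and train a small head on a downstream label.
class OutcomePredictor(nn.Module):
    def __init__(self, encoder):
        super().__init__()
        self.encoder = encoder              # pre-trained ProcessEncoder backbone
        self.classifier = nn.Linear(EMBED_DIM, 1)

    def forward(self, ids):
        h = self.encoder.encoder(self.encoder.embed(ids))  # (B, T, D)
        pooled = h.mean(dim=1)              # simple mean-pool over the sequence
        return self.classifier(pooled).squeeze(-1)

predictor = OutcomePredictor(model)         # warm-start from pre-training
labels = torch.randint(0, 2, (8,)).float()  # assumed binary learning outcomes
loss = nn.functional.binary_cross_entropy_with_logits(
    predictor(actions), labels)
loss.backward()                             # fine-tunes encoder and head jointly
```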
  2. The commonsense natural language inference (CNLI) tasks aim to select the most likely follow-up statement to a contextual description of ordinary, everyday events and facts. Current approaches to transfer learning of CNLI models across tasks require a large amount of labeled data from the new task. This paper presents a way to reduce this need for additional annotated training data from the new task by leveraging symbolic knowledge bases, such as ConceptNet. We formulate a teacher-student framework for mixed symbolic-neural reasoning, with the large-scale symbolic knowledge base serving as the teacher and a trained CNLI model as the student. This hybrid distillation process involves two steps. The first step is a symbolic reasoning process. Given a collection of unlabeled data, we use an abductive reasoning framework based on Grenander's pattern theory to create weakly labeled data. Pattern theory is an energy-based graphical probabilistic framework for reasoning among random variables with varying dependency structures. In the second step, the weakly labeled data, along with a fraction of the labeled data, is used to transfer-learn the CNLI model into the new task. The goal is to reduce the fraction of labeled data required. We demonstrate the efficacy of our approach by using three publicly available datasets (OpenBookQA, SWAG, and HellaSWAG) and evaluating three CNLI models (BERT, LSTM, and ESIM) that represent different tasks. We show that, on average, we achieve 63% of the top performance of a fully supervised BERT model with no labeled data. With only 1000 labeled samples, we can improve this performance to 72%. Interestingly, without training, the teacher mechanism itself has significant inference power. The pattern theory framework achieves 32.7% accuracy on OpenBookQA, outperforming transformer-based models such as GPT (26.6%), GPT-2 (30.2%), and BERT (27.1%) by a significant margin. We demonstrate that the framework can be generalized to successfully train neural CNLI models using knowledge distillation under unsupervised and semi-supervised learning settings. Our results show that it outperforms all unsupervised and weakly supervised baselines and some early supervised approaches, while offering competitive performance with fully supervised baselines. Additionally, we show that the abductive learning framework can be adapted for other downstream tasks, such as unsupervised semantic textual similarity, unsupervised sentiment classification, and zero-shot text classification, without significant modification to the framework. Finally, user studies show that the generated interpretations enhance its explainability by providing key insights into its reasoning mechanism.
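The two-step recipe in this abstract can be sketched schematically. In the stub below, the symbolic teacher is replaced with a crude token-overlap scorer and the student trainer is left as a placeholder; the paper's actual teacher is an abductive reasoner over ConceptNet built on Grenander's pattern theory, and its student is a trained CNLI model such as BERT. All names here are illustrative.

```python
# Schematic of the two-step distillation: (1) a symbolic "teacher" weakly
# labels unlabeled examples, (2) the neural "student" is trained on weak
# labels plus a small labeled fraction. The teacher below is a stand-in.

def teacher_weak_label(context, choices):
    """Pick the choice sharing the most tokens with the context (a crude
    stand-in for abductive reasoning over a knowledge base)."""
    ctx = set(context.lower().split())
    overlaps = [len(ctx & set(c.lower().split())) for c in choices]
    return max(range(len(choices)), key=lambda i: overlaps[i])

def train_student(data):
    """Placeholder for fine-tuning a CNLI model such as BERT on
    (context, choices, label) triples."""
    ...

unlabeled = [("She dropped the glass.", ["The glass shattered.", "It sang."])]
labeled = [("He flipped the switch.", ["The light came on.", "It rained."], 0)]

# Step 1: the teacher turns unlabeled data into weakly labeled data.
weak = [(c, ch, teacher_weak_label(c, ch)) for c, ch in unlabeled]

# Step 2: the student is transfer-learned on weak labels plus the small
# labeled fraction, reducing the labeled data required for the new task.
train_student(weak + labeled)
```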
  3. The development and application of deep learning methodologies has grown within educational contexts in recent years. Perhaps attributable, in part, to the large amount of data that is made available through the adoption of computer-based learning systems in classrooms and larger-scale MOOC platforms, many educational researchers are leveraging a wide range of emerging deep learning approaches to study learning and student behavior in various capacities. Variations of recurrent neural networks, for example, have been used not only to predict learning outcomes but also to study sequential and temporal trends in student data; it is commonly believed that they are able to learn high-dimensional representations of learning and behavioral constructs over time, such as the evolution of a student's knowledge state while working through assigned content. Recent works, however, have started to dispute this belief, instead finding that it may be the model's complexity that leads to improved performance in many prediction tasks and that these methods may not inherently learn these temporal representations through model training. In this work, we explore these claims further in the context of detectors of student affect as well as expanding on existing work that explored benchmarks in knowledge tracing. Specifically, we observe how fully trained models perform compared to deep learning networks where training is applied only to the output layer. While the highest results of prior works utilizing trained recurrent models are found to be superior, our untrained versions perform comparably well, outperforming even previous non-deep learning approaches.
     Keywords: Deep Learning · LSTM · Echo State Network · Affect · Knowledge Tracing
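The comparison this abstract describes, i.e., a recurrent network whose recurrent weights stay at their random initialization while only the output layer is trained (in the spirit of echo state networks), can be sketched as follows. Dimensions, data, and the choice of a binary affect/outcome label are illustrative assumptions.

```python
# Minimal sketch: an LSTM with frozen (untrained) recurrent weights, where
# gradient updates reach only the output layer, in contrast to training the
# recurrent model end to end. All shapes and data are illustrative.
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
for p in lstm.parameters():
    p.requires_grad = False          # untrained recurrent representation

readout = nn.Linear(32, 1)           # the only layer that receives gradients

x = torch.randn(8, 50, 16)           # batch of student interaction features
y = torch.randint(0, 2, (8,)).float()  # assumed binary labels

hidden, _ = lstm(x)                  # (B, T, 32), fixed random dynamics
logits = readout(hidden[:, -1]).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
loss.backward()                      # gradients flow only to the readout

optimizer = torch.optim.Adam(readout.parameters(), lr=1e-3)
optimizer.step()                     # updates the output layer alone
```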