

Title: Identify Critical Pedagogical Decisions through Adversarial Deep Reinforcement Learning
For many forms of e-learning environments, the system's behaviors can be viewed as a sequential decision process wherein, at each discrete step, the system is responsible for deciding the next system action when there are multiple ones available. Each of these system decisions affects the user's successive actions and performance, and some of them are more important than others. This raises an open question: how can we identify the critical system interactive decisions that are linked to student learning from a long trajectory of decisions? In this work, we proposed and evaluated Critical-Reinforcement Learning (Critical-RL), an adversarial deep reinforcement learning (ADRL) based framework to identify critical decisions and induce compact yet effective policies. Specifically, it induces a pair of adversarial policies based upon Deep Q-Network (DQN) with opposite goals: one to improve student learning, the other to hinder it; critical decisions are identified by comparing the two adversarial policies and using their corresponding Q-value differences; finally, a Critical policy is induced by taking the optimal action on critical decisions but random yet reasonable decisions on others. We evaluated the effectiveness of the Critical policy against a random yet reasonable (Random) policy. While no significant difference was found between the two conditions, this is probably due to small sample sizes. Much to our surprise, we found that students often experience so-called Critical phases: consecutive sequences of critical decisions with the same action. Students were further divided into High vs. Low groups based on the number of Critical phases they experienced, and our results showed that while no significant difference was found between the two Low groups, the High Critical group learned significantly more than the High Random group.
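As a rough illustration of the selection rule above, the sketch below flags a decision as critical when both adversarial DQNs show a large spread between their best and worst action values, and otherwise falls back to a random choice. This is a minimal reading of the abstract rather than the paper's exact criterion; the function names, the peak-to-peak gap test, and the threshold are all assumptions.

    import numpy as np

    def is_critical(q_improve, q_hinder, state, threshold):
        # Illustrative criterion (an assumption, not the paper's exact rule):
        # a decision point is critical when both adversarial DQNs report a
        # large gap between their best and worst actions, i.e. the choice
        # of action matters under both policies.
        gap_improve = np.ptp(q_improve(state))  # max Q minus min Q
        gap_hinder = np.ptp(q_hinder(state))
        return min(gap_improve, gap_hinder) >= threshold

    def critical_policy(q_improve, q_hinder, state, n_actions, threshold, rng):
        # Optimal action on critical decisions, random yet reasonable
        # decisions on the others, mirroring the Critical policy above.
        if is_critical(q_improve, q_hinder, state, threshold):
            return int(np.argmax(q_improve(state)))
        return int(rng.integers(n_actions))

Here q_improve and q_hinder stand for the trained adversarial Q-networks, treated as callables that map a state to a vector of per-action Q-values.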
Award ID(s):
1651909
NSF-PAR ID:
10136496
Author(s) / Creator(s):
Date Published:
Journal Name:
In: Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019)
Page Range / eLocation ID:
595 – 598
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, Reinforcement Learning (RL), especially Deep RL (DRL), has shown outstanding performance in video games from Atari and Mario to StarCraft. However, little evidence has shown that DRL can be successfully applied to real-life human-centric tasks such as education or healthcare. Different from classic game-playing, where the RL goal is to make an agent smart, in human-centric tasks the ultimate RL goal is to make the human-agent interactions productive and fruitful. Additionally, in many real-life human-centric tasks, data can be noisy and limited. As a sub-field of RL, batch RL is designed for handling situations where data is limited yet noisy and building simulations is challenging. In two consecutive classroom studies, we investigated applying batch DRL to the task of pedagogical policy induction for an Intelligent Tutoring System (ITS), and empirically evaluated the effectiveness of the induced pedagogical policies. In Fall 2018 (F18), the DRL policy was compared against an expert-designed baseline policy, and in Spring 2019 (S19), we examined the impact of explaining the batch DRL-induced policy with student decisions and the expert baseline policy. Our results showed that 1) while no significant difference was found between the batch RL-induced policy and the expert policy in F18, the batch RL-induced policy with simple explanations significantly improved students' learning performance more than the expert policy alone in S19; and 2) no significant differences were found between the student decision making and the expert policy. Overall, our results suggest that pairing simple explanations with induced RL policies can be an important and effective technique for applying RL to real-life human-centric tasks.
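    A minimal sketch of the batch (offline) DQN update underlying such a pipeline is given below, assuming PyTorch; the network sizes, learning rate, and discount factor are illustrative placeholders, and the training data is a fixed log of past tutor-student interactions rather than newly collected experience.

        import torch
        import torch.nn as nn

        # Batch (offline) DQN: learn only from a fixed log of
        # (state, action, reward, next_state, done) tuples; no new
        # interaction with students is required during training.
        q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))
        target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))
        target_net.load_state_dict(q_net.state_dict())
        optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
        gamma = 0.95  # discount factor (placeholder value)

        def batch_update(s, a, r, s2, done):
            # s: (B, 8) states, a: (B,) actions, r: (B,) rewards,
            # s2: (B, 8) next states, done: (B,) terminal flags as 0/1 floats.
            q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
            loss = nn.functional.smooth_l1_loss(q_sa, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()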
  2. Constrained action-based decision-making is one of the most challenging decision-making problems. It refers to a scenario where an agent takes action in an environment not only to maximize the expected cumulative reward but where it is subject to certain action-based constraints, for example, an upper limit on the total number of certain actions being carried out. In this work, we construct a general data-driven framework called Constrained Action-based Partially Observable Markov Decision Process (CAPOMDP) to induce effective pedagogical policies. Specifically, we induce two types of policies: CAPOMDP-LG, using learning gain as reward with the goal of improving students' learning performance, and CAPOMDP-Time, using time as reward for reducing students' time on task. The effectiveness of CAPOMDP-LG is compared against a random yet reasonable policy, and the effectiveness of CAPOMDP-Time is compared against both a Deep Reinforcement Learning induced policy and a random policy. Empirical results show that there is an Aptitude-Treatment Interaction effect: students are split into High vs. Low based on their incoming competence; while no significant difference is found among the High incoming competence groups, for the Low groups, students following CAPOMDP-Time indeed spent significantly less time than those using the two baseline policies, and students following CAPOMDP-LG significantly outperformed their peers on both learning gain and learning efficiency.
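    In generic terms, the constrained objective can be written as the math block below; the budget notation is an assumption, and the paper's exact constraint form may differ.

        \max_{\pi} \; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{T} r(s_t, a_t)\Big]
        \quad \text{subject to} \quad
        \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{T} \mathbf{1}\{a_t \in \mathcal{A}_c\}\Big] \le B

    Here \mathcal{A}_c is the constrained action set and B is the budget on how many such actions the policy may take.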
  3. In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. Recent years have seen growing interest in data-driven techniques for such pedagogical decision making, which can dynamically tailor students' learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline, off-policy, Gaussian Process-based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. In an empirical classroom study with 180 students, our results show that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.
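    To make the two levels of granularity concrete, the toy sketch below shows how a problem-level decision gates the subsequent step-level decisions; the action names are common ITS choices used here purely for illustration, not necessarily the paper's action space.

        # Two-level pedagogical decision loop (illustrative action names).
        PROBLEM_ACTIONS = ["worked_example", "problem_solving"]
        STEP_ACTIONS = ["tell", "elicit"]

        def run_problem(problem_policy, step_policy, problem_state, step_states):
            # The problem-level policy decides how to present the problem;
            # step-level decisions only arise when the student solves it.
            decisions = [PROBLEM_ACTIONS[problem_policy(problem_state)]]
            if decisions[0] == "problem_solving":
                for step_state in step_states:
                    decisions.append(STEP_ACTIONS[step_policy(step_state)])
            return decisions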
  4. The development and measurable improvements in performance of large language models on natural language tasks [12] opens up the opportunity to utilize large language models in an educational setting to replicate human tutoring, which is often costly and inaccessible. We are particularly interested in large language models from the GPT series, created by OpenAI [7]. In a prior study we found that the quality of explanations generated with GPT-3.5 was poor, where two different approaches to generating explanations resulted in success rates of 43% and 10%. In this replication study, we were interested in whether the measurable improvements in GPT-4 performance [6] led to a higher rate of success for generating valid explanations compared to GPT-3.5. A replication of the original study was conducted by using GPT-4 to generate explanations for the same problems given to GPT-3.5. Using GPT-4, explanation correctness dramatically improved to a success rate of 94%. We were further interested in evaluating whether GPT-4 explanations were positively perceived compared to human-written explanations. A preregistered, single-blinded study was implemented where 10 evaluators were asked to rate the quality of randomized GPT-4 and teacher-created explanations. Even with 4% of problems containing some amount of incorrect content, GPT-4 explanations were preferred over human explanations. The implications of our significant results at Learning @ Scale are that digital platforms can start A/B testing the effects of GPT-4 generated explanations on student learning, implementing explanations at scale, and also prompt programming to test different education theories, e.g., social emotional learning factors [5].
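    For context, generating such an explanation with the OpenAI Python client might look like the sketch below; the prompt wording and parameters are assumptions, not the study's actual setup.

        from openai import OpenAI

        client = OpenAI()  # expects OPENAI_API_KEY in the environment

        def generate_explanation(problem_text, correct_answer):
            # Ask the model to explain a known-correct answer step by step;
            # this prompt is a placeholder, not the prompt used in the study.
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{
                    "role": "user",
                    "content": (
                        f"Explain, step by step, why the answer to the "
                        f"following problem is {correct_answer}:\n\n{problem_text}"
                    ),
                }],
            )
            return response.choices[0].message.content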
  5. Early prediction of student difficulty during long-duration learning activities allows a tutoring system to intervene by providing needed support, such as a hint, or by alerting an instructor. To be effective, these predictions must come early and be highly accurate, but such predictions are difficult for open-ended programming problems. In this work, Recent Temporal Patterns (RTPs) are used in conjunction with Support Vector Machine and Logistic Regression to build robust yet interpretable models for early predictions. We performed two tasks: predicting student success and difficulty during one open-ended novice programming task of drawing a square-shaped spiral. We compared RTP against several machine learning models, ranging from the classic to more recent deep learning models such as Long Short-Term Memory, to predict whether students would be able to complete the programming task. Our results show that RTP-based models outperformed all others, and could successfully classify students after just one minute of a 20-minute exercise (students can spend more than 1 hour on it). To determine when a system might intervene to prevent incompleteness or eventual dropout, we applied RTP at regular intervals to predict whether a student would make progress within the next five minutes, reflecting that they may be having difficulty. RTP successfully classified these students needing interventions over 85% of the time, with increased accuracy using data-driven program features. These results contribute significantly to the potential to build a fully data-driven tutoring system for novice programming.
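    The interval-based prediction setup can be pictured with the small sketch below, assuming scikit-learn; the binary pattern-presence features stand in for the mined Recent Temporal Patterns, and the synthetic data is purely illustrative.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        # Placeholder features: one row per student snapshot, one binary
        # column per mined temporal pattern (real features come from RTP).
        X = rng.integers(0, 2, size=(200, 15))
        y = rng.integers(0, 2, size=200)  # 1 = will progress in the next 5 min

        model = LogisticRegression().fit(X, y)
        # At each interval, score the current snapshot; a low probability of
        # progress signals that the tutor should consider intervening.
        print(model.predict_proba(X[:1])[0, 1])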