In recent years, reinforcement learning (RL), especially deep RL (DRL), has shown outstanding performance in video games from Atari and Mario to StarCraft. However, little evidence has shown that DRL can be successfully applied to real-life human-centric tasks such as education or healthcare. Different from classic game-playing, where the RL goal is to make an agent smart, in human-centric tasks the ultimate RL goal is to make the human-agent interactions productive and fruitful. Additionally, in many real-life human-centric tasks, data can be noisy and limited. As a sub-field of RL, batch RL is designed for handling situations where data is limited yet noisy and building simulations is challenging. In two consecutive classroom studies, we investigated applying batch DRL to the task of pedagogical policy induction for an Intelligent Tutoring System (ITS), and empirically evaluated the effectiveness of the induced pedagogical policies. In Fall 2018 (F18), the DRL policy was compared against an expert-designed baseline policy, and in Spring 2019 (S19), we examined the impact of explaining the batch DRL-induced policy with student decisions and the expert baseline policy. Our results showed that 1) while no significant difference was found between the batch RL-induced policy and the expert policy in F18, the batch …
Identify Critical Pedagogical Decisions through Adversarial Deep Reinforcement Learning
For many forms of e-learning environments, the system's behaviors can be viewed as a sequential decision process wherein, at each discrete step, the system is responsible for deciding the next system action when there are multiple ones available. Each of these system decisions affects the user's successive actions and performance, and some of them are more important than others. Thus, this raises an open question: how can we identify the critical system interactive decisions that are linked to student learning from a long trajectory of decisions? In this work, we proposed and evaluated Critical-Reinforcement Learning (Critical-RL), an adversarial deep reinforcement learning (ADRL) based framework to identify critical decisions and induce compact yet effective policies. Specifically, it induces a pair of adversarial policies based upon Deep Q-Network (DQN) with opposite goals: one is to improve student learning while the other is to hinder it; critical decisions are identified by comparing the two adversarial policies and using their corresponding Q-value differences; finally, a Critical policy is induced by giving the optimal action on critical decisions but random yet reasonable decisions on others. We evaluated the effectiveness of the Critical policy against a random yet reasonable (Random) policy. No significant difference was found between the two conditions, probably because of small sample sizes. Much to our surprise, we found that …
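The selection logic described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact criticality rule, so the rule below is an assumption: a decision is flagged as critical when the greedy actions of the two adversarial policies disagree and the Q-value gap under the learning-improving policy reaches a threshold. The function names, the `threshold` parameter, and the plain-list Q-values (standing in for DQN outputs) are all hypothetical.

```python
import random
from typing import Sequence


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the highest Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def is_critical(q_help: Sequence[float], q_hinder: Sequence[float],
                threshold: float) -> bool:
    """Assumed rule (not taken from the paper): a decision is critical when
    the two adversarial policies prefer different actions and the Q-value
    gap under the learning-improving policy is at least `threshold`."""
    if greedy_action(q_help) == greedy_action(q_hinder):
        return False
    return max(q_help) - min(q_help) >= threshold


def critical_policy_action(q_help: Sequence[float], q_hinder: Sequence[float],
                           threshold: float, rng: random.Random) -> int:
    """Optimal action on critical decisions, a random action otherwise."""
    if is_critical(q_help, q_hinder, threshold):
        return greedy_action(q_help)
    return rng.randrange(len(q_help))


# Hypothetical Q-values for one state with two available tutor actions.
rng = random.Random(0)
action = critical_policy_action([1.0, 3.0], [2.5, 0.5], threshold=1.0, rng=rng)
```

In this sketch the threshold controls how compact the resulting policy is: a higher value flags fewer decisions as critical and leaves more of them to the random choice.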
- Award ID(s): 1651909
- Publication Date:
- NSF-PAR ID: 10136496
- Journal Name: In: Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019)
- Page Range or eLocation-ID: 595 – 598
- Sponsoring Org: National Science Foundation
More Like this
-
Constrained action-based decision-making is one of the most challenging decision-making problems. It refers to a scenario where an agent takes actions in an environment not only to maximize the expected cumulative reward but also subject to certain action-based constraints; for example, an upper limit on the total number of certain actions being carried out. In this work, we construct a general data-driven framework called Constrained Action-based Partially Observable Markov Decision Process (CAPOMDP) to induce effective pedagogical policies. Specifically, we induce two types of policies: CAPOMDP-LG, using learning gain as the reward with the goal of improving students’ learning performance, and CAPOMDP-Time, using time as the reward for reducing students’ time on task. The effectiveness of CAPOMDP-LG is compared against a random yet reasonable policy, and the effectiveness of CAPOMDP-Time is compared against both a Deep Reinforcement Learning induced policy and a random policy. Empirical results show that there is an Aptitude Treatment Interaction effect: students are split into High vs. Low based on their incoming competence; while no significant difference is found among the High incoming competence groups, for the Low groups, students following CAPOMDP-Time indeed spent significantly less time than those using the two baseline policies and students following CAPOMDP-LG …
-
In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. Recent years have seen growing interest in data-driven techniques for such pedagogical decision making, which can dynamically tailor students’ learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline, off-policy Gaussian Processes based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. In an empirical classroom study with 180 students, our results show that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.
-
Early prediction of student difficulty during long-duration learning activities allows a tutoring system to intervene by providing needed support, such as a hint, or by alerting an instructor. To be effective, these predictions must come early and be highly accurate, but such predictions are difficult for open-ended programming problems. In this work, Recent Temporal Patterns (RTPs) are used in conjunction with Support Vector Machine and Logistic Regression to build robust yet interpretable models for early predictions. We performed two tasks: to predict student success and difficulty during one open-ended novice programming task of drawing a square-shaped spiral. We compared RTP against several machine learning models, ranging from the classic to the more recent deep learning models such as Long Short Term Memory, to predict whether students would be able to complete the programming task. Our results show that RTP-based models outperformed all others, and could successfully classify students after just one minute of a 20-minute exercise (students can spend more than 1 hour on it). To determine when a system might intervene to prevent incompleteness or eventual dropout, we applied RTP at regular intervals to predict whether a student would make progress within the next five minutes, …
-
The development and measurable improvements in performance of large language models on natural language tasks [12] open up the opportunity to utilize large language models in an educational setting to replicate human tutoring, which is often costly and inaccessible. We are particularly interested in large language models from the GPT series, created by OpenAI [7]. In a prior study we found that the quality of explanations generated with GPT-3.5 was poor, where two different approaches to generating explanations resulted in a 43% and 10% success rate. In this replication study, we were interested in whether the measurable improvements in GPT-4 performance [6] led to a higher rate of success for generating valid explanations compared to GPT-3.5. A replication of the original study was conducted by using GPT-4 to generate explanations for the same problems given to GPT-3.5. Using GPT-4, explanation correctness dramatically improved to a success rate of 94%. We were further interested in evaluating if GPT-4 explanations were positively perceived compared to human-written explanations. A preregistered, single-blinded study was implemented where 10 evaluators were asked to rate the quality of randomized GPT-4 and teacher-created explanations. Even with 4% of problems containing some amount of incorrect content, GPT-4 explanations were preferred …