Learning optimal policies in real-world domains with delayed rewards is a major challenge in Reinforcement Learning. We address the credit assignment problem by proposing a Gaussian Process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in 4 contexts where rewards can be delayed for long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, our results showed that on 7 out 9 games, the proposed GP inferred reward policy performed at least as well as the immediate reward policy and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts.
more »
« less
Leveraging Deep Reinforcement Learning for Pedagogical Policy Induction in an Intelligent Tutoring System
Deep Reinforcement Learning (DRL) has been shown to be a very powerful technique in recent years on a wide range of applications. Much of the prior DRL work took the online learning approach. However, given the challenges of building accurate simulations for modeling student learning, we investigated applying DRL to induce a pedagogical policy through an offiine approach. In this work, we explored the effectiveness of offiine DRL for pedagogical policy induction in an Intelligent Tutoring System. Generally speaking, when applying offiine DRL, we face two major challenges: one is limited training data and the other is the credit assignment problem caused by delayed rewards. In this work, we used Gaussian Processes to solve the credit assignment problem by estimating the inferred immediate rewards from the final delayed rewards. We then applied the DQN and Double-DQN algorithms to induce adaptive pedagogical strategies tailored to individual students. Our empirical results show that without solving the credit assignment problem, the DQN policy, although better than Double-DQN, was no better than a random policy. However, when combining DQN with the inferred rewards, our best DQN policy can outperform the random yet reasonable policy, especially for students with high pre-test scores.
more »
« less
- Award ID(s):
- 1651909
- PAR ID:
- 10136494
- Date Published:
- Journal Name:
- In: Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019),
- Page Range / eLocation ID:
- 168 – 177
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Intelligent Tutoring Systems (ITSs) leverage AI to adapt to individual students, and many ITSs employ \emph{pedagogical policies} to decide what instructional action to take next in the face of alternatives. A number of researchers applied Reinforcement Learning (RL) and Deep RL (DRL) to induce effective pedagogical policies. Much of prior work, however, has been developed \emph{independently} for a specific ITS and \emph{cannot directly be applied to another}. In this work, we propose a \textbf{M}ulti-\textbf{T}ask \textbf{L}earning framework that combines Deep \textbf{BI}simulation \textbf{M}etrics and DRL, named \textbf{MTL-BIM}, to induce a unified pedagogical policies for two different ITSs across different domains: logic and probability. Based on empirical classroom results, our unified RL policy performed significantly better than the expert-crafted policies and independently induced DQN policies on both ITSs.more » « less
-
null (Ed.)Abstract: In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. In recent years, there is growing interest in applying datadriven techniques for adaptive decision making that can dynamically tailor students’ learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline Gaussian Processes based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. An empirical classroom study shows that the HRL policy is significantly more effective than a Deep QNetwork (DQN) induced policy and a random yet reasonable baseline policy.more » « less
-
In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. Recent years have seen growing interest in data-driven techniques for such pedagogical decision making, which can dynamically tailor students’ learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline, off-policy Gaussian Processes based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. In an empirical classroom study with 180 students, our results show that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.more » « less
-
Abstract.In interactive e-learning environments such as Intelligent Tutor-ing Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. Recent years have seen grow-ing interest in data-driven techniques for such pedagogical decision making, which can dynamically tailor students’ learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline, off-policy Gaussian Processes based Hierarchical ReinforcementLearning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. In an empirical classroom study with 180 students, our results show that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.more » « less
An official website of the United States government

