

Title: Identify Critical Pedagogical Decisions through Adversarial Deep Reinforcement Learning
For many forms of e-learning environments, the system's behaviors can be viewed as a sequential decision process wherein, at each discrete step, the system is responsible for deciding the next system action when there are multiple ones available. Each of these system decisions affects the user's successive actions and performance, and some of them are more important than others. This raises an open question: how can we identify the critical system interactive decisions that are linked to student learning from a long trajectory of decisions? In this work, we proposed and evaluated Critical-Reinforcement Learning (Critical-RL), an adversarial deep reinforcement learning (ADRL) based framework to identify critical decisions and induce compact yet effective policies. Specifically, it induces a pair of adversarial policies based upon Deep Q-Network (DQN) with opposite goals: one to improve student learning, the other to hinder it; critical decisions are identified by comparing the two adversarial policies and using their corresponding Q-value differences; finally, a Critical policy is induced by taking the optimal action on critical decisions but random yet reasonable decisions on others. We evaluated the effectiveness of the Critical policy against a random yet reasonable (Random) policy. While no significant difference was found between the two conditions, this is probably due to small sample sizes. Much to our surprise, we found that students often experience so-called Critical phases: consecutive sequences of critical decisions with the same action. Students were further divided into High vs. Low groups based on the number of Critical phases they experienced, and our results showed that while no significant difference was found between the two Low groups, the High Critical group learned significantly more than the High Random group.
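As a rough illustration of the selection rule above, the sketch below flags a decision as critical when both adversarial DQNs show a large spread between their best and worst action values, and otherwise falls back to a random choice. This is a minimal reading of the abstract rather than the paper's exact criterion; the function names, the peak-to-peak gap test, and the threshold are all assumptions.

    import numpy as np

    def is_critical(q_improve, q_hinder, state, threshold):
        # Illustrative criterion (an assumption, not the paper's exact rule):
        # a decision point is critical when both adversarial DQNs report a
        # large gap between their best and worst actions, i.e. the choice
        # of action matters under both policies.
        gap_improve = np.ptp(q_improve(state))  # max Q minus min Q
        gap_hinder = np.ptp(q_hinder(state))
        return min(gap_improve, gap_hinder) >= threshold

    def critical_policy(q_improve, q_hinder, state, n_actions, threshold, rng):
        # Optimal action on critical decisions, random yet reasonable
        # decisions on the others, mirroring the Critical policy above.
        if is_critical(q_improve, q_hinder, state, threshold):
            return int(np.argmax(q_improve(state)))
        return int(rng.integers(n_actions))

Here q_improve and q_hinder stand for the trained adversarial Q-networks, treated as callables that map a state to a vector of per-action Q-values.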
Award ID(s):
1651909
NSF-PAR ID:
10136496
Author(s) / Creator(s):
Date Published:
Journal Name:
In: Proceedings of the 12th International Conference on Educational Data Mining (EDM 2019)
Page Range / eLocation ID:
595 – 598
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In recent years, Reinforcement Learning (RL), especially Deep RL (DRL), has shown outstanding performance in video games from Atari and Mario to StarCraft. However, little evidence has shown that DRL can be successfully applied to real-life human-centric tasks such as education or healthcare. Different from classic game-playing, where the RL goal is to make an agent smart, in human-centric tasks the ultimate RL goal is to make the human-agent interactions productive and fruitful. Additionally, in many real-life human-centric tasks, data can be noisy and limited. As a sub-field of RL, batch RL is designed for handling situations where data is limited yet noisy and building simulations is challenging. In two consecutive classroom studies, we investigated applying batch DRL to the task of pedagogical policy induction for an Intelligent Tutoring System (ITS), and empirically evaluated the effectiveness of the induced pedagogical policies. In Fall 2018 (F18), the DRL policy was compared against an expert-designed baseline policy, and in Spring 2019 (S19), we examined the impact of explaining the batch DRL-induced policy with student decisions and the expert baseline policy. Our results showed that 1) while no significant difference was found between the batch RL-induced policy and the expert policy in F18, the batch RL-induced policy with simple explanations significantly improved students' learning performance more than the expert policy alone in S19; and 2) no significant differences were found between the student decision making and the expert policy. Overall, our results suggest that pairing simple explanations with induced RL policies can be an important and effective technique for applying RL to real-life human-centric tasks.
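    A minimal sketch of the batch (offline) DQN update underlying such a pipeline is given below, assuming PyTorch; the network sizes, learning rate, and discount factor are illustrative placeholders, and the training data is a fixed log of past tutor-student interactions rather than newly collected experience.

        import torch
        import torch.nn as nn

        # Batch (offline) DQN: learn only from a fixed log of
        # (state, action, reward, next_state, done) tuples; no new
        # interaction with students is required during training.
        q_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))
        target_net = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 3))
        target_net.load_state_dict(q_net.state_dict())
        optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
        gamma = 0.95  # discount factor (placeholder value)

        def batch_update(s, a, r, s2, done):
            # s: (B, 8) states, a: (B,) actions, r: (B,) rewards,
            # s2: (B, 8) next states, done: (B,) terminal flags as 0/1 floats.
            q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
            with torch.no_grad():
                target = r + gamma * (1 - done) * target_net(s2).max(dim=1).values
            loss = nn.functional.smooth_l1_loss(q_sa, target)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()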
  2. Constrained action-based decision-making is one of the most challenging decision-making problems. It refers to a scenario where an agent takes action in an environment not only to maximize the expected cumulative reward but where it is subject to certain action-based constraints, for example, an upper limit on the total number of certain actions being carried out. In this work, we construct a general data-driven framework called Constrained Action-based Partially Observable Markov Decision Process (CAPOMDP) to induce effective pedagogical policies. Specifically, we induce two types of policies: CAPOMDP-LG, using learning gain as reward with the goal of improving students' learning performance, and CAPOMDP-Time, using time as reward for reducing students' time on task. The effectiveness of CAPOMDP-LG is compared against a random yet reasonable policy, and the effectiveness of CAPOMDP-Time is compared against both a Deep Reinforcement Learning induced policy and a random policy. Empirical results show that there is an Aptitude-Treatment Interaction effect: students are split into High vs. Low based on their incoming competence; while no significant difference is found among the High incoming competence groups, for the Low groups, students following CAPOMDP-Time indeed spent significantly less time than those using the two baseline policies, and students following CAPOMDP-LG significantly outperformed their peers on both learning gain and learning efficiency.
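    In generic terms, the constrained objective can be written as the math block below; the budget notation is an assumption, and the paper's exact constraint form may differ.

        \max_{\pi} \; \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{T} r(s_t, a_t)\Big]
        \quad \text{subject to} \quad
        \mathbb{E}_{\pi}\!\Big[\sum_{t=0}^{T} \mathbf{1}\{a_t \in \mathcal{A}_c\}\Big] \le B

    Here \mathcal{A}_c is the constrained action set and B is the budget on how many such actions the policy may take.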
  3. In interactive e-learning environments such as Intelligent Tutoring Systems, there are pedagogical decisions to make at two main levels of granularity: whole problems and single steps. Recent years have seen growing interest in data-driven techniques for such pedagogical decision making, which can dynamically tailor students' learning experiences. Most existing data-driven approaches, however, treat these pedagogical decisions equally, or independently, disregarding the long-term impact that tutor decisions may have across these two levels of granularity. In this paper, we propose and apply an offline, off-policy, Gaussian Process-based Hierarchical Reinforcement Learning (HRL) framework to induce a hierarchical pedagogical policy that makes decisions at both problem and step levels. In an empirical classroom study with 180 students, our results show that the HRL policy is significantly more effective than a Deep Q-Network (DQN) induced policy and a random yet reasonable baseline policy.
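    To make the two levels of granularity concrete, the toy sketch below shows how a problem-level decision gates the subsequent step-level decisions; the action names are common ITS choices used here purely for illustration, not necessarily the paper's action space.

        # Two-level pedagogical decision loop (illustrative action names).
        PROBLEM_ACTIONS = ["worked_example", "problem_solving"]
        STEP_ACTIONS = ["tell", "elicit"]

        def run_problem(problem_policy, step_policy, problem_state, step_states):
            # The problem-level policy decides how to present the problem;
            # step-level decisions only arise when the student solves it.
            decisions = [PROBLEM_ACTIONS[problem_policy(problem_state)]]
            if decisions[0] == "problem_solving":
                for step_state in step_states:
                    decisions.append(STEP_ACTIONS[step_policy(step_state)])
            return decisions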
  4. The development and measurable improvements in performance of large language models on natural language tasks [12] opens up the opportunity to utilize large language models in an educational setting to replicate human tutoring, which is often costly and inaccessible. We are particularly interested in large language models from the GPT series, created by OpenAI [7]. In a prior study we found that the quality of explanations generated with GPT-3.5 was poor, where two different approaches to generating explanations resulted in success rates of 43% and 10%. In this replication study, we were interested in whether the measurable improvements in GPT-4 performance [6] led to a higher rate of success for generating valid explanations compared to GPT-3.5. A replication of the original study was conducted by using GPT-4 to generate explanations for the same problems given to GPT-3.5. Using GPT-4, explanation correctness dramatically improved to a success rate of 94%. We were further interested in evaluating whether GPT-4 explanations were positively perceived compared to human-written explanations. A preregistered, single-blinded study was implemented where 10 evaluators were asked to rate the quality of randomized GPT-4 and teacher-created explanations. Even with 4% of problems containing some amount of incorrect content, GPT-4 explanations were preferred over human explanations. The implications of our significant results at Learning @ Scale are that digital platforms can start A/B testing the effects of GPT-4 generated explanations on student learning, implementing explanations at scale, and also prompt programming to test different education theories, e.g., social emotional learning factors [5].
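    For context, generating such an explanation with the OpenAI Python client might look like the sketch below; the prompt wording and parameters are assumptions, not the study's actual setup.

        from openai import OpenAI

        client = OpenAI()  # expects OPENAI_API_KEY in the environment

        def generate_explanation(problem_text, correct_answer):
            # Ask the model to explain a known-correct answer step by step;
            # this prompt is a placeholder, not the prompt used in the study.
            response = client.chat.completions.create(
                model="gpt-4",
                messages=[{
                    "role": "user",
                    "content": (
                        f"Explain, step by step, why the answer to the "
                        f"following problem is {correct_answer}:\n\n{problem_text}"
                    ),
                }],
            )
            return response.choices[0].message.content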
  5. Early prediction of student difficulty during long-duration learning activities allows a tutoring system to intervene by providing needed support, such as a hint, or by alerting an instructor. To be effective, these predictions must come early and be highly accurate, but such predictions are difficult for open-ended programming problems. In this work, Recent Temporal Patterns (RTPs) are used in conjunction with Support Vector Machine and Logistic Regression to build robust yet interpretable models for early predictions. We performed two tasks: predicting student success and difficulty during one open-ended novice programming task of drawing a square-shaped spiral. We compared RTP against several machine learning models, ranging from the classic to more recent deep learning models such as Long Short-Term Memory, to predict whether students would be able to complete the programming task. Our results show that RTP-based models outperformed all others, and could successfully classify students after just one minute of a 20-minute exercise (students can spend more than 1 hour on it). To determine when a system might intervene to prevent incompleteness or eventual dropout, we applied RTP at regular intervals to predict whether a student would make progress within the next five minutes, reflecting that they may be having difficulty. RTP successfully classified these students needing interventions over 85% of the time, with increased accuracy using data-driven program features. These results contribute significantly to the potential to build a fully data-driven tutoring system for novice programming.
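    The interval-based prediction setup can be pictured with the small sketch below, assuming scikit-learn; the binary pattern-presence features stand in for the mined Recent Temporal Patterns, and the synthetic data is purely illustrative.

        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)
        # Placeholder features: one row per student snapshot, one binary
        # column per mined temporal pattern (real features come from RTP).
        X = rng.integers(0, 2, size=(200, 15))
        y = rng.integers(0, 2, size=200)  # 1 = will progress in the next 5 min

        model = LogisticRegression().fit(X, y)
        # At each interval, score the current snapshot; a low probability of
        # progress signals that the tutor should consider intervening.
        print(model.predict_proba(X[:1])[0, 1])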