

Title: Building a reinforcement learning environment from limited data to optimize teachable robot interventions.
Working collaboratively in groups can positively impact performance and student engagement. Intelligent social agents can provide a source of personalized support for students, and their benefits likely extend to collaborative settings, but it is difficult to determine how these agents should interact with students. Reinforcement learning (RL) offers an opportunity to adapt the interactions between the social agent and the students to better support collaboration and learning. However, applying RL to social agents in education typically requires training with real students. In this work, we train an RL agent in a high-quality simulated environment to learn how to improve students' collaboration. Data were collected during a pilot study with dyads of students who worked together to tutor an intelligent teachable robot. We describe the process of building an environment from these data and training a policy, and we examine the policy's impact on different students compared to several baselines.
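To make the environment-building step concrete, here is a minimal, hypothetical sketch of one way to turn logged interaction tuples into a Gym-style simulator and train a tabular Q-learning policy against it. The class name, the tuple format, and the "terminal" marker are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch: build a simulated environment from logged student-dyad
# data, then train a tabular Q-learning policy against it.
import random
from collections import defaultdict

class LoggedDataEnv:
    """Gym-style environment whose dynamics are estimated from pilot-study logs.

    `logs` is a list of (state, action, reward, next_state) tuples extracted
    from recorded student-robot interactions (hypothetical format).
    """
    def __init__(self, logs, initial_states):
        self.model = defaultdict(list)      # (s, a) -> observed (r, s') outcomes
        for s, a, r, s2 in logs:
            self.model[(s, a)].append((r, s2))
        self.initial_states = initial_states
        self.state = None

    def reset(self):
        self.state = random.choice(self.initial_states)
        return self.state

    def step(self, action):
        outcomes = self.model.get((self.state, action))
        if not outcomes:                    # unseen (state, action): end episode
            return self.state, 0.0, True, {}
        reward, next_state = random.choice(outcomes)  # sample empirical dynamics
        self.state = next_state
        done = next_state == "terminal"     # hypothetical end-of-session marker
        return next_state, reward, done, {}

def q_learning(env, actions, episodes=5000, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning against the simulated environment."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda a: Q[(s, a)]))
            s2, r, done, _ = env.step(a)
            target = r if done else r + gamma * max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
    return Q
```

Sampling uniformly among the outcomes observed for a (state, action) pair gives an empirical model of the dynamics; ending the episode on unseen pairs is one conservative way to handle limited data.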
Award ID(s):
2024645
NSF-PAR ID:
10373794
Author(s) / Creator(s):
Editor(s):
Mitrovic, A., & Bosch, N.
Date Published:
Journal Name:
Proceedings of the 15th International Conference on Educational Data Mining
Page Range / eLocation ID:
62-74
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Pedagogical planners can provide adaptive support to students in narrative-centered learning environments by dynamically scaffolding student learning and tailoring problem scenarios. Reinforcement learning (RL) is frequently used for pedagogical planning in narrative-centered learning environments. However, RL-based pedagogical planning raises significant challenges due to the scarcity of data for training RL policies. Most prior work has relied on limited-size datasets and offline RL techniques for policy learning. Unfortunately, offline RL techniques do not support on-demand exploration and evaluation, which can adversely impact the quality of induced policies. To address the limitations of data scarcity and offline RL, we propose INSIGHT, an online RL framework for training data-driven pedagogical policies that optimize student learning in narrative-centered learning environments. The INSIGHT framework consists of three components: a narrative-centered learning environment simulator, a simulated student agent, and an RL-based pedagogical planner agent that uses a reward metric associated with effective student learning processes. The framework enables the generation of synthetic data for on-demand exploration and evaluation of RL-based pedagogical planning. We have implemented INSIGHT with OpenAI Gym for a narrative-centered learning environment testbed with rule-based simulated student agents and a deep Q-learning-based pedagogical planner. Our results show that online deep RL algorithms can induce near-optimal pedagogical policies in the INSIGHT framework, while offline deep RL algorithms find only suboptimal policies even with large amounts of data.
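As a rough sketch of what such a setup can look like (not the paper's actual testbed), the following hypothetical Gym environment wraps a rule-based simulated student whose knowledge grows most when the chosen scaffold matches their level; the state variables, the student rule, and the reward are all illustrative assumptions.

```python
# Hypothetical sketch of an INSIGHT-style setup: a Gym environment wrapping a
# rule-based simulated student, suitable for training a DQN pedagogical planner.
import gym
import numpy as np
from gym import spaces

class NarrativeLearningEnv(gym.Env):
    """Simulated narrative-centered learning environment with a rule-based student."""
    N_SCAFFOLDS = 4        # planner actions: which scaffold/problem variant to give
    MAX_STEPS = 20

    def __init__(self):
        self.action_space = spaces.Discrete(self.N_SCAFFOLDS)
        # observation: [knowledge estimate, engagement, fraction of session elapsed]
        self.observation_space = spaces.Box(0.0, 1.0, shape=(3,), dtype=np.float32)

    def reset(self):
        self.knowledge, self.engagement, self.t = 0.2, 0.8, 0
        return self._obs()

    def step(self, action):
        self.t += 1
        # Rule-based student: scaffolds near the student's current level help most
        difficulty = action / (self.N_SCAFFOLDS - 1)
        gain = max(0.0, 0.2 - abs(difficulty - self.knowledge))
        self.knowledge = min(1.0, self.knowledge + gain)
        self.engagement = np.clip(self.engagement + (0.05 if gain > 0 else -0.1), 0, 1)
        reward = gain + 0.1 * self.engagement   # reward tied to the learning process
        done = self.t >= self.MAX_STEPS or self.knowledge >= 0.95
        return self._obs(), float(reward), done, {}

    def _obs(self):
        return np.array([self.knowledge, self.engagement, self.t / self.MAX_STEPS],
                        dtype=np.float32)
```

Because such a simulator is cheap to query, an online planner (e.g., an off-the-shelf deep Q-learning implementation) can explore and be evaluated on demand, which is exactly what offline RL on a fixed dataset cannot do.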

     
  2.
    In recent years, reinforcement learning (RL), especially deep RL (DRL), has shown outstanding performance in video games from Atari and Mario to StarCraft. However, little evidence has shown that DRL can be successfully applied to real-life human-centric tasks such as education or healthcare. Unlike classic game playing, where the RL goal is to make an agent smart, in human-centric tasks the ultimate RL goal is to make the human-agent interactions productive and fruitful. Additionally, in many real-life human-centric tasks, data can be noisy and limited. As a sub-field of RL, batch RL is designed for situations where data is limited yet noisy and building simulations is challenging. In two consecutive classroom studies, we investigated applying batch DRL to the task of pedagogical policy induction for an Intelligent Tutoring System (ITS) and empirically evaluated the effectiveness of the induced pedagogical policies. In Fall 2018 (F18), the DRL policy was compared against an expert-designed baseline policy, and in Spring 2019 (S19), we examined the impact of pairing the batch DRL-induced policy with simple explanations, compared against student decision making and the expert baseline policy. Our results showed that 1) while no significant difference was found between the batch RL-induced policy and the expert policy in F18, the batch RL-induced policy with simple explanations improved students' learning performance significantly more than the expert policy alone in S19; and 2) no significant differences were found between student decision making and the expert policy. Overall, our results suggest that pairing simple explanations with induced RL policies can be an important and effective technique for applying RL to real-life human-centric tasks.
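To make the batch RL setting concrete, here is a minimal, hypothetical illustration of learning a policy from a fixed log of tutoring interactions, with no further interaction with students. The paper uses deep batch RL; this tabular version keeps only the core idea, and the dataset format is assumed.

```python
# Hypothetical illustration of batch RL: learn a Q-function from a fixed log of
# tutoring interactions collected under some behavior policy (e.g., an
# expert-designed tutor), then read off a greedy pedagogical policy.
from collections import defaultdict

def batch_q_learning(dataset, actions, sweeps=100, alpha=0.05, gamma=0.9):
    """dataset: list of (state, action, reward, next_state, done) tuples."""
    Q = defaultdict(float)
    for _ in range(sweeps):                 # repeatedly sweep the fixed batch
        for s, a, r, s2, done in dataset:
            target = r if done else r + gamma * max(Q[(s2, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
    # induced pedagogical policy: greedy action for each observed state
    return {s: max(actions, key=lambda a: Q[(s, a)])
            for s, a, *_ in dataset}
```

Note that the agent never explores beyond what the log contains, which is precisely the limitation the F18/S19 studies had to work within.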
  3.
    Interactive reinforcement learning (IRL) agents use human feedback or instruction to help them learn in complex environments. Often, this feedback comes in the form of a discrete signal that is either positive or negative. While informative, such a signal can be difficult to generalize from on its own. In this work, we explore how natural language advice can provide a richer feedback signal to a reinforcement learning agent by extending policy shaping, a well-known IRL technique. Policy shaping usually employs a human feedback policy to help an agent learn more about how to achieve its goal. In our case, we replace this human feedback policy with a policy generated from natural language advice. We investigate whether the generated natural language reasoning helps a deep RL agent decide its actions successfully in a given environment. Our model consists of three networks: an experience-driven network, an advice generator, and an advice-driven network. While the experience-driven RL agent chooses its actions based on the environmental reward, the advice-driven network uses the feedback generated by the advice generator for each new state to select actions that assist the RL agent through better policy shaping.
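The combination step that policy shaping builds on can be illustrated with a short sketch. In common formulations of policy shaping, the agent's own action distribution is fused with a feedback-derived distribution by multiplying the two and renormalizing; here a hypothetical advice-driven distribution stands in for the output of the paper's advice networks, and the function name and numbers are illustrative.

```python
# Hypothetical sketch of the policy-shaping combination step: fuse the
# experience-driven policy with the advice-driven policy by elementwise
# multiplication and renormalization.
import numpy as np

def shape_policy(agent_probs, advice_probs, eps=1e-8):
    """Combine two action distributions over the same discrete action set."""
    combined = (np.asarray(agent_probs) + eps) * (np.asarray(advice_probs) + eps)
    return combined / combined.sum()

# e.g., advice like "go left" raises the advice network's probability for LEFT
agent_probs  = [0.4, 0.3, 0.2, 0.1]    # from the experience-driven RL agent
advice_probs = [0.7, 0.1, 0.1, 0.1]    # from the advice-driven network
print(shape_policy(agent_probs, advice_probs))
```

Multiplying the distributions means an action is favored only when both sources agree it is promising, which is what lets advice steer the agent without overriding its learned experience.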
  4.
    Today's classrooms are remarkably different from those of yesteryear. In place of individual students responding to the teacher from neat rows of desks, one more typically finds students working in groups on projects, with a teacher circulating among groups. AI applications in learning have been slow to catch up, with most available technologies focusing on personalizing or adapting instruction to learners as isolated individuals. Meanwhile, an established science of Computer Supported Collaborative Learning has come to prominence, with clear implications for how collaborative learning could best be supported. In this contribution, I will consider how intelligence augmentation could evolve to support collaborative learning, as well as three signature challenges of this work that could drive AI forward.

    In conceptualizing collaborative learning, Kirschner and Erkens (2013) provide a useful 3x3 framework in which there are three aspects of learning (cognitive, social, and motivational), three levels (community, group/team, and individual), and three kinds of pedagogical supports (discourse-oriented, representation-oriented, and process-oriented). As they engage in this multiply complex space, teachers and learners are both learning to collaborate and collaborating to learn. Further, questions of equity arise as we consider who is able to participate and in which ways. Overall, this analysis helps us see the complexity of today's classrooms and, within this complexity, the opportunities for augmentation or "assistance" to become important and even essential.

    An overarching design concept has emerged in the past five years in response to this complexity: the idea of intelligence augmentation for "orchestrating" classrooms (Dillenbourg et al., 2013). As a metaphor, orchestration suggests the need for a coordinated performance among many agents who are each playing different roles or voicing different ideas. Practically speaking, orchestration suggests that intelligence augmentation could help many smaller things go well, and in doing so, could enable the overall intention of the learning experience to succeed. Those smaller things could include helping the teacher stay aware of students or groups who need attention, supporting the formation of groups or transitions from one activity to the next, facilitating productive social interactions in groups, suggesting learning resources that would support teamwork, and more. A recent panel of AI experts identified orchestration as an overarching concept that is an important focus for near-term research and development in intelligence augmentation (Roschelle, Lester & Fusco, 2020).

    Tackling this challenging area of collaborative learning could also be beneficial for advancing AI technologies overall. Building AI agents that better understand the social context of human activities has broad importance, as does designing AI agents that can appropriately interact within teamwork. Collaborative learning has a trajectory over time, and designing AI systems that support teams not just with a short-term recommendation or suggestion but through long-term developmental processes is important. Further, classrooms engaged in collaborative learning could become very interesting hybrid environments, with multiple human and AI agents present at once and addressing the dual outcome goals of learning to collaborate and collaborating to learn; addressing a hybrid environment like this could lead to AI systems that more robustly support many types of realistic human activity. In conclusion, the opportunity to make a societal impact by attending to collaborative learning, the availability of a growing science of computer-supported collaborative learning, and the need to push new boundaries in AI together suggest collaborative learning as a challenge worth tackling in the coming years.
  5. This paper describes how the domain knowledge of power system operators can be integrated into reinforcement learning (RL) frameworks to effectively learn agents that control the grid's topology to prevent thermal cascading. Typical RL-based topology controllers fail to perform well due to the large search/optimization space. Here, we propose an actor-critic-based agent to address the problem's combinatorial nature and train the agent using the RL environment developed by RTE, the French TSO. To address the challenge of the large optimization space, a curriculum-based approach with reward tuning is incorporated into the training procedure, modifying the environment using network physics to enhance agent learning. Further, a parallel training approach over multiple scenarios is employed to avoid biasing the agent toward a few scenarios and to make it robust to the natural variability in grid operations. Without these modifications to the training procedure, the RL agent failed on most test scenarios, illustrating the importance of properly integrating the domain knowledge of physical systems into real-world RL. The agent was tested by RTE in the 2019 Learning to Run a Power Network challenge and was awarded 2nd place in accuracy and 1st place in speed. The developed code is open-sourced for public use. Analysis of a simple system demonstrates the benefit of curriculum-based training for RL agents.
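The curriculum-with-reward-tuning idea can be sketched as follows. This is a hypothetical illustration, not the released code: `make_env`, the scenario names, the `overload_penalty` knob, and the `agent.act`/`agent.update` interface are all assumed for the example.

```python
# Hypothetical sketch of curriculum training with reward tuning: train on
# progressively harder scenario sets, sampling scenarios at random so the
# agent is not biased toward a few of them.
import random

CURRICULUM = [
    {"scenarios": ["easy_1", "easy_2"], "overload_penalty": 0.1},
    {"scenarios": ["mid_1", "mid_2"],   "overload_penalty": 0.5},
    {"scenarios": ["hard_1", "hard_2"], "overload_penalty": 1.0},
]

def train_with_curriculum(agent, make_env, episodes_per_stage=500):
    for stage in CURRICULUM:
        for _ in range(episodes_per_stage):
            # sample scenarios uniformly to avoid overfitting to any single one
            scenario = random.choice(stage["scenarios"])
            env = make_env(scenario, overload_penalty=stage["overload_penalty"])
            obs, done = env.reset(), False
            while not done:
                action = agent.act(obs)
                next_obs, reward, done, _ = env.step(action)
                agent.update(obs, action, reward, next_obs, done)
                obs = next_obs
    return agent
```

Starting with a gentle penalty and a few easy scenarios, then ratcheting both up, is one way to encode the operator's knowledge that an agent must first learn basic topology control before it can handle severe overload conditions.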