
This content will become publicly available on March 25, 2025

Title: Online Reinforcement Learning-Based Pedagogical Planning for Narrative-Centered Learning Environments

Pedagogical planners can provide adaptive support to students in narrative-centered learning environments by dynamically scaffolding student learning and tailoring problem scenarios. Reinforcement learning (RL) is frequently used for pedagogical planning in narrative-centered learning environments. However, RL-based pedagogical planning raises significant challenges due to the scarcity of data for training RL policies. Most prior work has relied on limited-size datasets and offline RL techniques for policy learning. Unfortunately, offline RL techniques do not support on-demand exploration and evaluation, which can adversely impact the quality of induced policies. To address the limitations of data scarcity and offline RL, we propose INSIGHT, an online RL framework for training data-driven pedagogical policies that optimize student learning in narrative-centered learning environments. The INSIGHT framework consists of three components: a narrative-centered learning environment simulator, a simulated student agent, and an RL-based pedagogical planner agent that uses a reward metric associated with effective student learning processes. The framework enables the generation of synthetic data for on-demand exploration and evaluation of RL-based pedagogical planning. We have implemented INSIGHT with OpenAI Gym for a narrative-centered learning environment testbed with rule-based simulated student agents and a deep Q-learning-based pedagogical planner. Our results show that online deep RL algorithms can induce near-optimal pedagogical policies in the INSIGHT framework, while offline deep RL algorithms find only suboptimal policies even with large amounts of data.
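The three-component structure described in the abstract (environment simulator, simulated student, RL planner) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the INSIGHT implementation: the `HINT`/`CHALLENGE` actions, the student rules, the knowledge-gain reward, and all class and function names are hypothetical. To stay dependency-free, the sketch uses a plain-Python Gym-style `reset`/`step` interface instead of OpenAI Gym, and tabular Q-learning in place of the deep Q-learning planner named in the abstract.

```python
import random

HINT, CHALLENGE = 0, 1   # hypothetical planner actions
MASTERY = 4              # hypothetical knowledge level that ends an episode
MAX_STEPS = 20           # step cap per episode


class SimulatedStudent:
    """Rule-based student (illustrative rules, not from the paper)."""

    def __init__(self):
        self.knowledge = 0

    def respond(self, action):
        # Novices (knowledge < 2) learn only from hints; others only from challenges.
        helpful = HINT if self.knowledge < 2 else CHALLENGE
        gain = 1 if action == helpful else 0
        self.knowledge += gain
        return gain


class LearningEnvSimulator:
    """Gym-style reset/step interface wrapped around the simulated student."""

    def reset(self):
        self.student = SimulatedStudent()
        self.steps = 0
        return self.student.knowledge

    def step(self, action):
        reward = self.student.respond(action)  # reward = knowledge gain
        self.steps += 1
        mastered = self.student.knowledge >= MASTERY
        truncated = self.steps >= MAX_STEPS
        return self.student.knowledge, reward, mastered, truncated


def greedy(q, state):
    """Return the action with the highest Q-value (ties go to HINT)."""
    return max((HINT, CHALLENGE), key=lambda a: q[state][a])


def train_planner(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Online tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    env = LearningEnvSimulator()
    q = [[0.0, 0.0] for _ in range(MASTERY + 1)]
    for _ in range(episodes):
        state = env.reset()
        mastered = truncated = False
        while not (mastered or truncated):
            if rng.random() < epsilon:
                action = rng.choice((HINT, CHALLENGE))
            else:
                action = greedy(q, state)
            next_state, reward, mastered, truncated = env.step(action)
            # Bootstrap from the next state unless the episode ended in mastery.
            target = reward if mastered else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_planner()
policy = [greedy(q_table, s) for s in range(MASTERY)]
print(policy)
```

In this toy setting the learned greedy policy should be `[0, 0, 1, 1]`: hints for the two lowest knowledge levels, then challenges. Because the planner interacts with the simulator directly, it can try the `CHALLENGE` action on demand, which is the kind of exploration a fixed offline dataset would not allow if that action were never logged.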

Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Page Range / eLocation ID:
23191 to 23199
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Recent successes of Reinforcement Learning (RL) allow an agent to learn policies that surpass human experts, but RL suffers from being time-hungry and data-hungry. By contrast, human learning is significantly faster because prior and general knowledge and multiple information resources are utilized. In this paper, we propose a Planner-Actor-Critic architecture for huMAN-centered planning and learning (PACMAN), where an agent uses its prior, high-level, deterministic symbolic knowledge to plan for goal-directed actions, and also integrates the Actor-Critic algorithm of RL to fine-tune its behavior towards both environmental rewards and human feedback. This work is the first unified framework where knowledge-based planning, RL, and human teaching jointly contribute to the policy learning of an agent. Our experiments demonstrate that PACMAN leads to a significant jump-start at the early stage of learning, converges rapidly and with small variance, and is robust to inconsistent, infrequent, and misleading feedback.
  2. Reinforcement learning (RL) in low-data and risk-sensitive domains requires performant and flexible deployment policies that can readily incorporate constraints during deployment. One such class of policies is the semi-parametric H-step lookahead policies, which select actions using trajectory optimization over a dynamics model for a fixed horizon with a terminal value function. In this work, we investigate a novel instantiation of H-step lookahead with a learned model and a terminal value function learned by a model-free off-policy algorithm, named Learning Off-Policy with Online Planning (LOOP). We provide a theoretical analysis of this method, suggesting a tradeoff between model errors and value function errors, and empirically demonstrate this tradeoff to be beneficial in deep reinforcement learning. Furthermore, we identify the "Actor Divergence" issue in this framework and propose Actor Regularized Control (ARC), a modified trajectory optimization procedure. We evaluate our method on a set of robotic tasks for Offline and Online RL and demonstrate improved performance. We also show the flexibility of LOOP to incorporate safety constraints during deployment with a set of navigation environments. We demonstrate that LOOP is a desirable framework for robotics applications based on its strong performance in various important RL settings.
  3. Previous studies have convincingly shown that traditional, content-centered, and didactic teaching methods are not effective for developing deep understanding and knowledge transfer. Nor do they adequately address the development of critical problem-solving skills. Active and collaborative instruction, coupled with effective means to encourage student engagement, invariably leads to better student learning outcomes irrespective of academic discipline. Despite these findings, the existing construction engineering programs, for the most part, consist of a series of fragmented courses that mainly focus on procedural skills rather than on the fundamental and conceptual knowledge that helps students become innovative problem-solvers. In addition, these courses are heavily dependent on traditional lecture-based teaching methods focused on well-structured and closed-ended problems that prepare students to plug variables into equations to get the answer. Existing programs rarely offer a systematic approach to allow students to develop a deep understanding of the engineering core concepts and discover systematic solutions for fundamental problems. Without properly understanding these core concepts, contextualized in domain-specific settings, students are not able to develop a holistic view that will help them to recognize the big picture and think outside the box to come up with creative solutions for arising problems. The long history of empirical learning in the field of construction engineering shows the significant potential of cognitive development through direct experience and reflection on what works in particular situations. Of course, the complex nature of the construction industry in the twenty-first century cannot afford an education through trial and error in the real environment.
However, recent advances in computer science can help educators develop virtual environments and gamification platforms that allow students to explore various scenarios and learn from their experiences. This study aims to address this need by assessing the effectiveness of guided active exploration in a digital game environment on students’ ability to discover systematic solutions for fundamental problems in construction engineering. To address this objective, through a research project funded by the NSF Division of Engineering Education and Centers (EEC), we designed and developed a scenario-based interactive digital game, called Zebel, to guide students in solving fundamental problems in construction scheduling. The proposed gamified pedagogical approach was designed based on the Constructivism learning theory and a framework that consists of six essential elements: (1) modeling; (2) reflection; (3) strategy formation; (4) scaffolded exploration; (5) debriefing; and (6) articulation. We also designed a series of pre- and post-assessment instruments for empirical data collection to assess the effectiveness of the proposed approach. The proposed gamified method was implemented in a graduate-level construction planning and scheduling course. The outcomes indicated that students with no prior knowledge of construction scheduling methods were able to discover systematic solutions for fundamental scheduling problems through their experience with the proposed gamified learning method.
  4. We consider the problem of time-limited robotic exploration in previously unseen environments, where exploration is limited by a predefined amount of time. We propose a novel exploration approach using learning-augmented model-based planning. We generate a set of subgoals associated with frontiers on the current map and derive a Bellman equation for exploration with these subgoals. Visual sensing and advances in semantic mapping of indoor scenes are exploited for training a deep convolutional neural network to estimate properties associated with each frontier: the expected unobserved area beyond the frontier and the expected time steps (discretized actions) required to explore it. The proposed model-based planner is guaranteed to explore the whole scene if time permits. We thoroughly evaluate our approach on a large-scale pseudo-realistic indoor dataset (Matterport3D) with the Habitat simulator. We compare our approach with classical and more recent RL-based exploration methods. Our approach surpasses the greedy strategies by 2.1% and the RL-based exploration methods by 8.4% in terms of coverage.
  5. This work provides a framework for a workspace-aware online grasp planner. This framework greatly improves the performance of standard online grasp planning algorithms by incorporating a notion of reachability into the online grasp planning process. Offline, a database of hundreds of thousands of unique end-effector poses was queried for feasibility. At runtime, our grasp planner uses this database to bias the hand towards reachable end-effector configurations. The bias keeps the grasp planner in accessible regions of the planning scene so that the resulting grasps are tailored to the situation at hand. This results in a higher percentage of reachable grasps, a higher percentage of successful grasp executions, and a reduced planning time. We also present experimental results using simulated and real environments.