Title: Supporting End Users in Defining Reinforcement-Learning Problems for Human-Robot Interactions (Extended Abstract)
Abstract: Reinforcement learning (RL) can help agents learn complex tasks that would be hard to specify using standard imperative programming. However, end users may have trouble personalizing their technology with RL due to a lack of technical expertise. Prior work has explored means of supporting end users after a problem for the RL agent to solve has been defined. Little work, however, has explored how to support end users while defining this problem. We propose a tool that provides structured support for end users defining problems for RL agents. Through this tool, users can (i) directly and indirectly specify the problem as a Markov decision process (MDP); (ii) receive automatic suggestions for MDP changes that would reduce training time and improve accuracy; and (iii) revise the MDP after training the agent to solve it. We believe this work will help reduce barriers to using RL and contribute to the existing literature on designing human-in-the-loop systems.
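As a rough, hypothetical illustration of what "directly specifying the problem as an MDP" might look like (this is not the proposed tool; every state, action, and reward name below is invented), a user-authored MDP for a toy human-robot greeting task could be a small structure of states, actions, transition probabilities, and rewards:

    # Hypothetical sketch only: a directly specified MDP for a toy
    # human-robot greeting task, not the interface from the abstract.
    from dataclasses import dataclass, field

    @dataclass
    class MDP:
        states: list
        actions: list
        transitions: dict = field(default_factory=dict)  # (state, action) -> {next_state: prob}
        rewards: dict = field(default_factory=dict)       # (state, action, next_state) -> reward
        gamma: float = 0.95                               # discount factor

    toy_mdp = MDP(
        states=["person_far", "person_near", "greeted"],
        actions=["wait", "approach", "greet"],
        transitions={
            ("person_far", "wait"): {"person_far": 0.8, "person_near": 0.2},
            ("person_far", "approach"): {"person_near": 1.0},
            ("person_near", "wait"): {"person_near": 0.7, "person_far": 0.3},
            ("person_near", "greet"): {"greeted": 1.0},
        },
        rewards={
            ("person_near", "greet", "greeted"): 1.0,        # desired outcome
            ("person_far", "approach", "person_near"): 0.1,  # small shaping reward
        },
    )

One could imagine the proposed suggestions operating over a structure like this, for example flagging states that are unreachable under the specified transitions before any training is run.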
Award ID(s):
1837120
PAR ID:
10387468
Author(s) / Creator(s):
Date Published:
Journal Name:
The 5th Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    In this paper we explore what role humans might play in designing tools for reinforcement learning (RL) agents to interact with the world. Recent work has explored RL methods that optimize a robot's morphology while learning to control it, effectively dividing an RL agent's environment into the external world and the agent's interface with that world. Taking a user-centered design (UCD) approach, we explore the potential of a human, instead of an algorithm, redesigning the agent's tool. Using UCD to design for a machine learning agent raises several research questions, including what it means to understand an RL agent's experience, beliefs, tendencies, and goals. After discussing these questions, we present a system we developed to study humans designing a 2D racecar for an RL autonomous driver. We conclude with findings and insights from exploratory pilot studies in which twelve users worked with this system.
  2. Mitrovic, A. & Bosch, N. (Eds.)
    Working collaboratively in groups can positively impact performance and student engagement. Intelligent social agents can provide a source of personalized support for students, and their benefits likely extend to collaborative settings, but it is difficult to determine how these agents should interact with students. Reinforcement learning (RL) offers an opportunity to adapt the interactions between the social agent and the students to better support collaboration and learning. However, using RL in education with social agents typically involves training with real students. In this work, we train an RL agent in a high-quality simulated environment to learn how to improve students' collaboration. Data were collected during a pilot study with dyads of students who worked together to tutor an intelligent teachable robot. We describe the process of building an environment from these data and training a policy, and we examine the policy's impact on different students relative to several baselines.
  3. Pedagogical planners can provide adaptive support to students in narrative-centered learning environments by dynamically scaffolding student learning and tailoring problem scenarios. Reinforcement learning (RL) is frequently used for pedagogical planning in narrative-centered learning environments. However, RL-based pedagogical planning raises significant challenges due to the scarcity of data for training RL policies. Most prior work has relied on limited-size datasets and offline RL techniques for policy learning. Unfortunately, offline RL techniques do not support on-demand exploration and evaluation, which can adversely impact the quality of induced policies. To address data scarcity and the limitations of offline RL, we propose INSIGHT, an online RL framework for training data-driven pedagogical policies that optimize student learning in narrative-centered learning environments. The INSIGHT framework consists of three components: a narrative-centered learning environment simulator, a simulated student agent, and an RL-based pedagogical planner agent, which uses a reward metric associated with effective student learning processes. The framework enables the generation of synthetic data for on-demand exploration and evaluation of RL-based pedagogical planning. We have implemented INSIGHT with OpenAI Gym for a narrative-centered learning environment testbed with rule-based simulated student agents and a deep Q-learning-based pedagogical planner. Our results show that online deep RL algorithms can induce near-optimal pedagogical policies in the INSIGHT framework, while offline deep RL algorithms only find suboptimal policies even with large amounts of data. (A heavily simplified sketch of this simulator-plus-planner arrangement appears after this list.)
  4. Reinforcement learning (RL) has been employed to devise the best course of actions for defending critical infrastructure, such as power networks, against cyberattacks. Nonetheless, even for the smallest power grids, the RL action space grows exponentially, making efficient exploration by the RL agent practically unattainable. Current RL algorithms tailored to power grids are generally not suited to settings where the state-action space becomes large. We address the large action-space problem for power grid security by exploiting temporal graph convolutional neural networks (TGCNs) to develop a parallel but heterogeneous RL framework. In particular, we divide the action space into smaller subspaces, each explored by its own RL agent. Efficiently organizing the resulting spatiotemporal action sequences then becomes a major challenge. We invoke a TGCN to meet this challenge by accurately predicting the performance of each individual RL agent in the event of an attack; the top-performing agent is selected, yielding the optimal sequence of actions. First, we compare action-space sizes for the IEEE 5-bus and 14-bus systems. We then use the IEEE 14-bus and IEEE 118-bus systems, coupled with the Grid2Op platform, to illustrate performance and the influence of action-space division on training times and grid survival rates, using agents trained with deep Q-learning and Soft Actor-Critic as well as Grid2Op's default greedy agents. Our TGCN framework provides a computationally reasonable approach for generating the best course of actions to defend cyber-physical systems against attacks. (A toy sketch of the action-space division and agent-selection idea appears after this list.)
  5. Despite the potential of reinforcement learning (RL) for building general-purpose robotic systems, training RL agents to solve robotics tasks remains challenging due to the difficulty of exploration in purely continuous action spaces. Addressing this problem is an active area of research, with most of the focus on improving RL methods via better optimization or more efficient exploration. An alternative but important component to improve is the interface between the RL algorithm and the robot. In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy. These parameterized primitives are expressive, simple to implement, enable efficient exploration, and can be transferred across robots, tasks, and environments. We perform a thorough empirical study across challenging tasks in three distinct domains with image input and a sparse terminal reward. We find that our simple change to the action interface substantially improves both learning efficiency and task performance irrespective of the underlying RL algorithm, significantly outperforming prior methods that learn skills from offline expert data. Code and videos are available at https://mihdalal.github.io/raps/. (A hypothetical sketch of such a parameterized-primitive action interface appears after this list.)
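Following up on item 3 (INSIGHT): that abstract describes a narrative-centered learning-environment simulator exposed through OpenAI Gym, a rule-based simulated student, and a deep Q-learning pedagogical planner. The sketch below is a heavily simplified, hypothetical stand-in for that arrangement; the observation features, actions, student dynamics, and reward are all invented for illustration and are not taken from the paper.

    import gym
    import numpy as np

    class SimulatedStudentEnv(gym.Env):
        """Hypothetical narrative-centered learning simulator: a crude rule-based
        student model, with one scaffolding decision by the planner per step."""

        def __init__(self):
            super().__init__()
            # Observation: [student_knowledge, frustration, narrative_progress]
            self.observation_space = gym.spaces.Box(0.0, 1.0, shape=(3,), dtype=np.float32)
            # Actions: 0 = no support, 1 = give hint, 2 = easier scenario, 3 = harder scenario
            self.action_space = gym.spaces.Discrete(4)

        def reset(self):
            self.state = np.array([0.2, 0.1, 0.0], dtype=np.float32)
            return self.state

        def step(self, action):
            knowledge, frustration, progress = self.state
            if action == 1:      # hint: small knowledge gain, calms the student a little
                knowledge, frustration = knowledge + 0.05, max(0.0, frustration - 0.05)
            elif action == 2:    # easier scenario: mainly reduces frustration
                frustration = max(0.0, frustration - 0.1)
            elif action == 3:    # harder scenario: larger knowledge gain, more frustration
                knowledge, frustration = knowledge + 0.1, frustration + 0.1
            progress += 0.1
            self.state = np.clip([knowledge, frustration, progress], 0.0, 1.0).astype(np.float32)
            reward = float(knowledge - frustration)  # crude proxy for effective learning
            done = bool(progress >= 1.0)
            return self.state, reward, done, {}

An off-the-shelf deep Q-learning implementation (for example, DQN from Stable-Baselines3) could then be trained against an environment like this, which is what makes on-demand exploration and evaluation possible without involving real students.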
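Following up on item 4: the core idea there is to partition a very large action space into subspaces, train one RL agent per subspace, and let a TGCN-based predictor pick the agent expected to perform best when an attack occurs. The toy code below shows only that division-and-selection skeleton; the agents are placeholders rather than trained deep Q-learning or Soft Actor-Critic policies, and the predictor is a stub where TGCN inference would go.

    import random

    def split_action_space(actions, n_agents):
        """Partition a flat action list into roughly equal subspaces, one per agent."""
        return [actions[i::n_agents] for i in range(n_agents)]

    class SubspaceAgent:
        """Placeholder for an RL agent trained to act within a single subspace."""
        def __init__(self, subspace):
            self.subspace = subspace

        def act(self, observation):
            return random.choice(self.subspace)  # a trained policy would choose here

    def predict_agent_scores(observation, agents):
        """Stub for the TGCN performance predictor: one predicted score per agent."""
        return [random.random() for _ in agents]

    def defend(observation, agents):
        # Select the agent the predictor expects to perform best, then act with it.
        scores = predict_agent_scores(observation, agents)
        best = agents[max(range(len(agents)), key=scores.__getitem__)]
        return best.act(observation)

    grid_actions = list(range(1000))  # stand-in for a large topology/redispatch action space
    agents = [SubspaceAgent(s) for s in split_action_space(grid_actions, n_agents=4)]
    chosen_action = defend(observation={"grid_state": "..."}, agents=agents)

In the setting described by the abstract, the placeholder agents would be Grid2Op policies trained with deep Q-learning or Soft Actor-Critic, and the score stub would be replaced by TGCN predictions of each agent's performance under the observed attack.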
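Following up on item 5 (RAPS): the action interface consists of hand-specified primitives whose arguments the RL policy outputs. The snippet below is a minimal, hypothetical version of such an interface; the primitive names, argument layout, and decoding scheme are invented for illustration and are not the paper's actual library.

    import numpy as np

    # Hypothetical primitive library: each primitive turns a short argument
    # vector (produced by the policy) into a low-level end-effector command.
    def reach(args):   # args = (dx, dy, dz)
        return {"delta_pos": np.asarray(args[:3], dtype=float)}

    def grasp(args):   # args = (gripper_width,)
        return {"gripper": float(args[0])}

    def lift(args):    # args = (height,)
        return {"delta_pos": np.array([0.0, 0.0, float(args[0])])}

    PRIMITIVES = [("reach", reach, 3), ("grasp", grasp, 1), ("lift", lift, 1)]

    def decode_action(policy_output):
        """Split a flat policy output into (primitive choice, primitive command).

        The first len(PRIMITIVES) entries act as selection logits; the rest is a
        shared argument vector from which the chosen primitive reads its slice."""
        n = len(PRIMITIVES)
        logits, args = policy_output[:n], policy_output[n:]
        idx = int(np.argmax(logits))
        name, fn, arity = PRIMITIVES[idx]
        offset = sum(a for _, _, a in PRIMITIVES[:idx])
        return name, fn(args[offset:offset + arity])

    # Example: a policy output that selects 'reach' with a small displacement.
    print(decode_action(np.array([2.0, 0.1, 0.3, 0.05, 0.0, -0.02, 0.7, 0.1])))

Because exploration then happens over primitive choices and their arguments rather than raw continuous commands, an interface like this is what the abstract credits with improved exploration efficiency and transfer across robots, tasks, and environments.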