skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on April 12, 2026

Title: Experiential Explanations for Reinforcement Learning
Abstract Reinforcement learning (RL) systems can be complex and non-interpretable, making it challenging for non-AI experts to understand or intervene in their decisions. This is due in part to the sequential nature of RL in which actions are chosen because of their likelihood of obtaining future rewards. However, RL agents discard the qualitative features of their training, making it difficult to recover user-understandable information for “why” an action is chosen. We propose a techniqueExperiential Explanationsto generate counterfactual explanations by traininginfluence predictorsalong with the RL policy. Influence predictors are models that learn how different sources of reward affect the agent in different states, thus restoring information about how the policy reflects the environment. Two human evaluation studies revealed that participants presented with Experiential Explanations were better able to correctly guess what an agent would do than those presented with other standard types of explanation. Participants also found that Experiential Explanations are more understandable, satisfying, complete, useful, and accurate. Qualitative analysis provides information on the factors of Experiential Explanations that are most useful and the desired characteristics that participants seek from the explanations.  more » « less
Award ID(s):
1928586
PAR ID:
10613089
Author(s) / Creator(s):
; ; ; ; ;
Publisher / Repository:
Springer
Date Published:
Journal Name:
Neural Computing and Applications
ISSN:
0941-0643
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null (Ed.)
    In recent years, Reinforcement learning (RL), especially Deep RL (DRL), has shown outstanding performance in video games from Atari, Mario, to StarCraft. However, little evidence has shown that DRL can be successfully applied to real-life human-centric tasks such as education or healthcare. Different from classic game-playing where the RL goal is to make an agent smart, in human-centric tasks the ultimate RL goal is to make the human-agent interactions productive and fruitful. Additionally, in many real-life human-centric tasks, data can be noisy and limited. As a sub-field of RL, batch RL is designed for handling situations where data is limited yet noisy, and building simulations is challenging. In two consecutive classroom studies, we investigated applying batch DRL to the task of pedagogical policy induction for an Intelligent Tutoring System (ITS), and empirically evaluated the effectiveness of induced pedagogical policies. In Fall 2018 (F18), the DRL policy is compared against an expert-designed baseline policy and in Spring 2019 (S19), we examined the impact of explaining the batch DRL-induced policy with student decisions and the expert baseline policy. Our results showed that 1) while no significant difference was found between the batch RL-induced policy and the expert policy in F18, the batch RL-induced policy with simple explanations significantly improved students’ learning performance more than the expert policy alone in S19; and 2) no significant differences were found between the student decision making and the expert policy. Overall, our results suggest that pairing simple explanations with induced RL policies can be an important and effective technique for applying RL to real-life human-centric tasks. 
    more » « less
  2. While Reinforcement learning (RL), especially Deep RL (DRL), has shown outstanding performance in video games, little evidence has shown that DRL can be successfully applied to human-centric tasks where the ultimate RL goal is to make the \textit{human-agent interactions} productive and fruitful. In real-life, complex, human-centric tasks, such as education and healthcare, data can be noisy and limited. Batch RL is designed for handling such situations where data is \textit{limited yet noisy}, and where \textit{building simulations is challenging}. In two consecutive empirical studies, we investigated Batch DRL for pedagogical policy induction, to choose student learning activities in an Intelligent Tutoring System. In Fall 2018 (F18), we compared the Batch DRL policy to an Expert policy, but found no significant difference between the DRL and Expert policies. In Spring 2019 (S19), we augmented the Batch DRL-induced policy with \textit{a simple act of explanation} by showing a message such as \textit{"The AI agent thinks you should view this problem as a Worked Example to learn how some new rules work."}. We compared this policy against two conditions, the Expert policy, and a student decision making policy. Our results show that 1) the Batch DRL policy with explanations significantly improved student learning performance more than the Expert policy; and 2) no significant differences were found between the Expert policy and student decision making. Overall, our results suggest that \textit{pairing simple explanations with the Batch DRL policy} can be an important and effective technique for applying RL to real-life, human-centric tasks. 
    more » « less
  3. We introduce RLang, a domain-specific language (DSL) for communicating domain knowledge to an RL agent. Unlike existing RL DSLs that ground to single elements of a decision-making formalism (e.g., the reward function or policy), RLang can specify information about every element of a Markov decision process. We define precise syntax and grounding semantics for RLang, and provide a parser that grounds RLang programs to an algorithm-agnostic partial world model and policy that can be exploited by an RL agent. We provide a series of example RLang programs demonstrating how different RL methods can exploit the resulting knowledge, encompassing model-free and model-based tabular algorithms, policy gradient and value-based methods, hierarchical approaches, and deep methods. 
    more » « less
  4. Nooripour, Roghieh (Ed.)
    School choice initiatives–which empower parents to choose which schools their children attend–are built on the assumptions that parents know what features of a school are most important to their family and that they are capable of focusing on the most important features when they make their decisions. However, decades of psychological research suggest that decision makers lack metacognitive knowledge of the factors that influence their decisions. We sought to reconcile this discrepancy between the policy assumptions and the psychological research. To do so, we asked participants to complete Choice-Based Conjoint surveys in which they made series of choices between different hypothetical schools. We then asked participants to self-report the weight they placed on each attribute when making their choices. Across four studies, we found that participants did not know how much weight they had placed on various school attributes. Average correlations between stated and revealed weights ranged fromr= .34–.54. Stated weights predicted different choices than revealed weights in 16.41–20.63% of decisions. These metacognitive limitations persisted regardless of whether the participants were parents or non-parents (Study 1a/1b), the nature of the attributes that participants used to evaluate alternatives (Study 2), and whether or not decision makers had access to school ratings that could be used as metacognitive aids (Study 3). In line with prior psychological research–and in contract to policy assumptions–these findings demonstrate that decision makers do not have particularly strong metacognitive knowledge of the factors that influence their school choice decisions. As a result, parents making school choice decisions are likely to seek out and use the wrong information, thus leading to suboptimal school choices. Future research should replicate these results in more ecologically valid samples and test new approaches to school choice that account for these metacognitive limitations. 
    more » « less
  5. Mitrovic, A.; & Bosch, N. (Ed.)
    Working collaboratively in groups can positively impact performance and student engagement. Intelligent social agents can provide a source of personalized support for students, and their benefits likely extend to collaborative settings, but it is difficult to determine how these agents should interact with students. Reinforcement learning (RL) offers an opportunity for adapting the interactions between the social agent and the students to better support collaboration and learning. However, using RL in education with social agents typically involves training using real students. In this work, we train an RL agent in a high-quality simulated environment to learn how to improve students’ collaboration. Data was collected during a pilot study with dyads of students who worked together to tutor an intelligent teachable robot. We explore the process of building an environment from the data, training a policy, and the impact of the policy on different students, compared to various baselines. 
    more » « less