Achieving efficient interpretability of reinforcement learning via policy distillation and selective input gradient regularization

Xing, Jinwei; Nagata, Takashi; Zou, Xinyun; Neftci, Emre; Krichmar, Jeffrey L.

doi:10.1016/j.neunet.2023.01.025

Although deep Reinforcement Learning (RL) has proven successful in a wide range of tasks, interpretability remains a challenge when it is applied to real-world problems. Saliency maps are frequently used to provide interpretability for deep neural networks. However, in the RL domain, existing saliency map approaches are either too computationally expensive to satisfy the real-time requirements of real-world scenarios, or they cannot produce interpretable saliency maps for RL policies. In this work, we propose Distillation with selective Input Gradient Regularization (DIGR), an approach that uses policy distillation and input gradient regularization to produce new policies that achieve both high interpretability and computational efficiency in generating saliency maps. Our approach is also found to improve the robustness of RL policies to multiple adversarial attacks. We conduct experiments on three tasks, MiniGrid (Fetch Object), Atari (Breakout), and CARLA Autonomous Driving, to demonstrate the importance and effectiveness of our approach.

Award ID(s): 1813785

PAR ID: 10479648

Author(s) / Creator(s): Xing, Jinwei; Nagata, Takashi; Zou, Xinyun; Neftci, Emre; Krichmar, Jeffrey L.

Publisher / Repository: Elsevier

Date Published: 2023-04-01

Journal Name: Neural Networks

Volume: 161

Issue: C

ISSN: 0893-6080

Page Range / eLocation ID: 228 to 241

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1016/j.neunet.2023.01.025
