Asymmetric DQN for Partially Observable Reinforcement Learning

Baisero, Andrea; Daley, Brett; Amato, Christopher


Offline training in simulated partially observable environments allows reinforcement learning methods to exploit privileged state information through a mechanism known as asymmetry. Such privileged information has the potential to greatly improve the optimal convergence properties, if used appropriately. However, current research in asymmetric reinforcement learning is often heuristic in nature, with few connections to underlying theory or theoretical guarantees, and is primarily tested through empirical evaluation. In this work, we develop the theory of asymmetric policy iteration, an exact model-based dynamic programming solution method, and then apply relaxations that eventually result in asymmetric DQN, a model-free deep reinforcement learning algorithm. Our theoretical findings are complemented and validated by empirical experimentation performed in environments that exhibit significant amounts of partial observability and require both information-gathering strategies and memorization.

Award ID(s): 2024790; 1816382

PAR ID: 10429490

Author(s) / Creator(s): Baisero, Andrea; Daley, Brett; Amato, Christopher

Date Published: 2022-01-01

Journal Name: Conference on Uncertainty in Artificial Intelligence

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0 (Conference Paper)
The DOI is not currently available.
