Title: BADDr: Bayes-Adaptive Deep Dropout RL for POMDPs
While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but struggles to scale. To tackle this challenge, BRL frameworks with various prior assumptions have been proposed, with varied success. This work presents a representation-agnostic formulation of BRL under partial observability, unifying the previous models under one theoretical umbrella. To demonstrate its practical significance we also propose a novel derivation, Bayes-Adaptive Deep Dropout rl (BADDr), based on dropout networks. Under this parameterization, in contrast to previous work, the belief over the state and dynamics is a more scalable inference problem. We choose actions through Monte-Carlo tree search and empirically show that our method is competitive with state-of-the-art BRL methods on small domains while being able to solve much larger ones.  more » « less
Award ID(s):
2024790 1734497
PAR ID:
10339109
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the International Conference on Autonomous Agents
ISSN:
1534-4797
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Reinforcement learning in partially observable domains is challenging due to the lack of observable state information. Thankfully, learning offline in a simulator with such state information is often possible. In particular, we propose a method for partially observable reinforcement learning that uses a fully observable policy (which we call a "state expert") during training to improve performance. Based on Soft Actor-Critic (SAC), our agent balances performing actions similar to the state expert and getting high returns under partial observability. Our approach can leverage the fully observable policy for exploration and for parts of the domain that are fully observable, while still being able to learn under partial observability. On six robotics domains, our method outperforms pure imitation, pure reinforcement learning, the sequential or parallel combination of both types, and a recent state-of-the-art method in the same setting. A successful policy transfer to a physical robot in a manipulation task from pixels shows our approach's practicality in learning interesting policies under partial observability.
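A minimal sketch of the kind of combined objective this abstract describes, assuming PyTorch-style actor, critic, and expert callables; the weighting scheme and names are assumptions rather than the authors' implementation.

```python
# Minimal sketch of a combined actor objective, assuming PyTorch-style actor,
# critic, and expert callables; the weighting and names are assumptions,
# not the authors' implementation.
import torch

def combined_actor_loss(actor, critic, expert, obs, state, alpha=0.2, beta=0.5):
    dist = actor(obs)                 # policy conditioned on the partial observation/history
    action = dist.rsample()           # reparameterized sample, as in SAC
    sac_term = (alpha * dist.log_prob(action).sum(-1) - critic(obs, action)).mean()

    with torch.no_grad():
        expert_action = expert(state)  # state expert sees the full state (training only)
    imitation_term = -dist.log_prob(expert_action).sum(-1).mean()

    # beta trades off imitating the state expert against the entropy-regularized return
    return sac_term + beta * imitation_term
```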
  2. Generalization is a central challenge for the deployment of reinforcement learning (RL) systems in the real world. In this paper, we show that the sequential structure of the RL problem necessitates new approaches to generalization beyond the well-studied techniques used in supervised learning. While supervised learning methods can generalize effectively without explicitly accounting for epistemic uncertainty, we describe why appropriate uncertainty handling can actually be essential in RL. We show that generalization to unseen test conditions from a limited number of training conditions induces a kind of implicit partial observability, effectively turning even fully-observed MDPs into POMDPs. Informed by this observation, we recast the problem of generalization in RL as solving the induced partially observed Markov decision process, which we call the epistemic POMDP. We demonstrate the failure modes of algorithms that do not appropriately handle this partial observability, and suggest a simple ensemble-based technique for approximately solving the partially observed problem. Empirically, we demonstrate that our simple algorithm derived from the epistemic POMDP achieves significant gains in generalization over current methods on the Procgen benchmark suite.
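The ensemble-based idea can be illustrated with a small, assumed sketch: each ensemble member is trained on a different subset of training conditions, and at test time their action distributions are averaged. This is only a schematic approximation of solving the epistemic POMDP, not the paper's algorithm.

```python
# Schematic approximation (not the paper's algorithm): average the action
# distributions of ensemble members trained on different training conditions.
import numpy as np

def ensemble_action(members, obs, rng):
    # members: callables mapping an observation to an action-probability vector
    probs = np.mean([m(obs) for m in members], axis=0)
    return rng.choice(len(probs), p=probs)

rng = np.random.default_rng(0)
# three toy "members" that disagree slightly about a two-action problem
members = [lambda obs, w=w: np.array([0.5 + w, 0.5 - w]) for w in (0.1, -0.1, 0.0)]
print(ensemble_action(members, obs=None, rng=rng))  # samples from the averaged distribution
```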
  3. This paper introduces a simple, efficient learning algorithm for general sequential decision making. The algorithm combines Optimism for exploration with Maximum Likelihood Estimation for model estimation, and is thus named OMLE. We prove that OMLE learns the near-optimal policies of an enormously rich class of sequential decision-making problems in a polynomial number of samples. This rich class includes not only a majority of known tractable model-based Reinforcement Learning (RL) problems (such as tabular MDPs, factored MDPs, low-witness-rank problems, tabular weakly-revealing/observable POMDPs, and multi-step decodable POMDPs), but also many new challenging RL problems, especially in the partially observable setting, that were not previously known to be tractable. Notably, the new problems addressed by this paper include (1) observable POMDPs with continuous observations and function approximation, where we achieve the first sample complexity that is completely independent of the size of the observation space; (2) well-conditioned low-rank sequential decision-making problems (also known as Predictive State Representations (PSRs)), which include and generalize all known tractable POMDP examples under a more intrinsic representation; (3) general sequential decision-making problems under the SAIL condition, which unifies our existing understanding of model-based RL in both fully observable and partially observable settings. The SAIL condition is identified by this paper and can be viewed as a natural generalization of Bellman/witness rank to address partial observability. This paper also presents a reward-free variant of the OMLE algorithm, which learns approximate dynamic models that enable the computation of near-optimal policies for all reward functions simultaneously.
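A schematic Python template of the optimism-plus-MLE loop this abstract describes; `value`, `loglik`, `collect`, and the per-model `policies` attribute are stand-ins, not the paper's actual interfaces or guarantees.

```python
# Schematic template of the optimism-plus-MLE loop; `value`, `loglik`,
# `collect`, and the per-model `policies` attribute are stand-ins,
# not the paper's actual interfaces or guarantees.
def omle(models, value, loglik, collect, rounds=10, beta=1.0):
    data, policy = [], None
    for _ in range(rounds):
        if data:
            best = max(loglik(m, data) for m in models)
            confidence_set = [m for m in models if loglik(m, data) >= best - beta]
        else:
            confidence_set = list(models)
        # optimism: pick the (model, policy) pair promising the largest value
        model, policy = max(
            ((m, p) for m in confidence_set for p in m.policies),
            key=lambda mp: value(*mp),
        )
        data.extend(collect(policy))  # roll out the chosen policy and grow the dataset
    return policy
```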
  4. Intelligent robots frequently need to explore the objects in their working environments. Modern sensors have enabled robots to learn object properties via perception of multiple modalities. However, object exploration in the real world poses a challenging trade-off between information gains and exploration action costs. A mixed observability Markov decision process (MOMDP) is a framework for planning under uncertainty that accounts for both fully and partially observable components of the state; robot perception frequently has to face such mixed observability. This work enables a robot equipped with an arm to dynamically construct query-oriented MOMDPs for multi-modal predicate identification (MPI) of objects. The robot's behavioral policy is learned from two datasets collected using real robots. Our approach enables a robot to explore object properties in a way that is significantly faster while improving accuracy in comparison to existing methods that rely on hand-coded exploration strategies.
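A toy sketch of the mixed-observability structure: the state factors into a fully observable part x and a hidden part y, so the belief update only ranges over y. The names and domain details are invented for illustration.

```python
# Toy illustration of the mixed-observability structure (names invented):
# the state factors into a fully observable part x and a hidden part y,
# so the belief update only ranges over y.
from dataclasses import dataclass

@dataclass
class MOMDPBelief:
    x: str        # fully observable component, e.g. the arm's configuration
    b_y: dict     # belief over the hidden component, e.g. object properties

    def update(self, likelihoods):
        # Bayes update of the hidden part only; x is read directly from sensors.
        post = {y: p * likelihoods.get(y, 0.0) for y, p in self.b_y.items()}
        z = sum(post.values()) or 1.0
        self.b_y = {y: p / z for y, p in post.items()}

belief = MOMDPBelief(x="gripper_open", b_y={"heavy": 0.5, "light": 0.5})
belief.update({"heavy": 0.8, "light": 0.2})  # e.g. haptic feedback after a lift action
print(belief.b_y)  # belief shifts toward "heavy"
```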