We propose a new method for count-based exploration in high-dimensional state spaces. Unlike previous work, which relies on density models, we show that counts can be derived by averaging samples from the Rademacher distribution (or coin flips). This insight is used to set up a simple supervised learning objective which, when optimized, yields a state’s visitation count. We show that our method is significantly more effective at deducing ground-truth visitation counts than previous work; when used as an exploration bonus for a model-free reinforcement learning algorithm, it outperforms existing approaches on most of 9 challenging exploration tasks, including the Atari game MONTEZUMA’S REVENGE.
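The statistical identity behind this abstract can be checked numerically: the squared average of n independent Rademacher samples has expectation 1/n, so inverting the squared average recovers the visitation count. The snippet below is a minimal Monte-Carlo sketch of that identity only, not the paper's supervised-learning network; all names are illustrative.

```python
import numpy as np

def count_from_coin_flips(n_visits, n_histories=100_000, seed=0):
    """Monte-Carlo check of the coin-flip counting identity:
    if a state has been visited n times and each visit contributes an
    independent Rademacher sample (+1 or -1), then E[(mean of the n
    samples)^2] = 1/n, so the count can be recovered as 1 / E[avg^2]."""
    rng = np.random.default_rng(seed)
    flips = rng.choice([-1.0, 1.0], size=(n_histories, n_visits))
    avg = flips.mean(axis=1)          # average coin flip per simulated visit history
    return 1.0 / np.mean(avg ** 2)    # invert the identity to estimate the count

if __name__ == "__main__":
    for n in (1, 5, 25, 100):
        print(f"true count {n:>3d}  estimated {count_from_coin_flips(n):6.1f}")
```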
Continual Optimistic Initialization for Value-Based Reinforcement Learning
Comprehensive state-action exploration is essential for reinforcement learning (RL) algorithms: it enables them to find optimal solutions and avoid premature convergence. In value-based RL, optimistic initialization of the value function ensures sufficient exploration for finding the optimal solution. Optimistic values lead to curiosity-driven exploration, enabling visitation of under-explored regions. However, optimistic initialization has limitations in stochastic and non-stationary environments due to its inability to explore "infinitely often". To address this limitation, we propose a novel exploration strategy for value-based RL, denoted COIN, based on recurring optimistic initialization. By injecting a continual exploration bonus, we overcome the shortcoming of optimistic initialization (its sensitivity to environment noise). We provide a rigorous theoretical comparison of COIN against existing popular exploration strategies and prove that it provides a unique set of attributes (coverage, infinitely-often exploration, no visitation tracking, and curiosity). We demonstrate the superiority of COIN over popular existing strategies on a designed toy domain and present results on common benchmark tasks. We observe that COIN outperforms existing exploration strategies on four of the six benchmark tasks while performing on par with the best baseline on the other two.
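To make the contrast with one-time optimistic initialization concrete, here is a minimal tabular sketch in which an optimism bonus is re-injected at every value update rather than only at initialization. The constant-bonus form and all names are assumptions made for illustration, not COIN's exact bonus definition.

```python
import numpy as np

def coin_style_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, beta=0.05):
    """One tabular Q-learning update with a continually injected optimism
    bonus `beta`. Unlike one-time optimistic initialization, the bonus is
    added at every update, so optimism keeps being replenished instead of
    being washed out by environment noise. (Illustrative sketch; the
    constant-bonus form is an assumption, not COIN's exact formulation.)"""
    target = (r + beta) + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
    return Q
```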
- Award ID(s): 2238979
- PAR ID: 10577190
- Publisher / Repository: Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems
- Date Published:
- ISBN: 9798400704864
- Page Range / eLocation ID: 453–462
- Subject(s) / Keyword(s): exploration strategies; optimistic initialization; reinforcement learning
- Format(s): Medium: X
- Location: Auckland, New Zealand
- Sponsoring Org: National Science Foundation
More Like this
Optimistic initialization underpins many theoretically sound exploration schemes in tabular domains; however, in the deep function approximation setting, optimism can quickly disappear if initialized naively. We propose a framework for more effectively incorporating optimistic initialization into reinforcement learning for continuous control. Our approach uses metric information about the state-action space to estimate which transitions are still unexplored, and explicitly maintains the initial Q-value optimism for the corresponding state-action pairs. We also develop methods for efficiently approximating these training objectives, and for incorporating domain knowledge into the optimistic envelope to improve sample efficiency. We empirically evaluate these approaches on a variety of hard exploration problems in continuous control, where our method outperforms existing exploration techniques.
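A minimal sketch of the underlying idea, keeping the optimistic initial value alive for state-action pairs that are far (in some metric) from visited data, is shown below. The threshold rule, `q0`, and `radius` are illustrative assumptions, not the paper's training objective.

```python
import numpy as np

def optimistic_q(q_learned, sa, visited_sa, q0=100.0, radius=0.5):
    """Sketch of preserving initial optimism under function approximation:
    a state-action vector `sa` whose distance to every visited pair exceeds
    `radius` is treated as unexplored and keeps the optimistic initial value
    q0; otherwise the learned critic value is used. (Illustrative only.)"""
    if len(visited_sa) == 0:
        return q0
    dists = np.linalg.norm(np.asarray(visited_sa) - np.asarray(sa), axis=1)
    return q0 if float(dists.min()) > radius else q_learned
```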
The problem of two-player zero-sum Markov games has recently attracted increasing interest in theoretical studies of multi-agent reinforcement learning (RL). In particular, for finite-horizon episodic Markov decision processes (MDPs), it has been shown that model-based algorithms can find an ϵ-optimal Nash Equilibrium (NE) with a sample complexity of $$O(H^3SAB/\epsilon^2)$$, which is optimal in its dependence on the horizon H and the number of states S (where A and B denote the number of actions of the two players, respectively). However, none of the existing model-free algorithms can achieve such optimality. In this work, we propose a model-free stage-based Q-learning algorithm and show that it achieves the same sample complexity as the best model-based algorithm, and hence demonstrate for the first time that model-free algorithms can enjoy the same optimality in the H dependence as model-based algorithms. The main improvement in the dependency on H arises from leveraging the popular variance reduction technique based on the reference-advantage decomposition, previously used only for single-agent RL. However, such a technique relies on a critical monotonicity property of the value function, which does not hold in Markov games due to the update of the policy via the coarse correlated equilibrium (CCE) oracle. Thus, to extend this technique to Markov games, our algorithm features a key novel design: the reference value functions are updated as the pair of optimistic and pessimistic value functions whose value difference is the smallest in the history, in order to achieve the desired improvement in sample efficiency.
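The reference-pair selection described at the end of this abstract can be stated compactly. The sketch below assumes scalar per-state optimistic/pessimistic value estimates stored as a list of pairs; the data layout and names are illustrative.

```python
def smallest_gap_reference(history):
    """Keep as the reference the (optimistic, pessimistic) value pair whose
    gap V_up - V_low is the smallest among all pairs recorded so far.
    `history` is assumed to be a non-empty list of (V_up, V_low) scalar
    pairs for a fixed state (illustrative layout)."""
    return min(history, key=lambda pair: pair[0] - pair[1])
```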
Off-policy deep reinforcement learning (RL) has been successful in a range of challenging domains. However, standard off-policy RL algorithms can suffer from several issues, such as instability in Q-learning and balancing exploration and exploitation. To mitigate these issues, we present SUNRISE, a simple unified ensemble method, which is compatible with various off-policy RL algorithms. SUNRISE integrates two key ingredients: (a) ensemble-based weighted Bellman backups, which re-weight target Q-values based on uncertainty estimates from a Q-ensemble, and (b) an inference method that selects actions using the highest upper-confidence bounds for efficient exploration. By enforcing diversity between agents using Bootstrap with random initialization, we show that these different ideas are largely orthogonal and can be fruitfully integrated, together further improving the performance of existing off-policy RL algorithms, such as Soft Actor-Critic and Rainbow DQN, for both continuous and discrete control tasks on both low-dimensional and high-dimensional environments.
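The two ingredients, ensemble-uncertainty-weighted Bellman targets and upper-confidence-bound action selection, are easy to sketch. The code below shows the general shape of each idea under assumed array layouts and a sigmoid-style weighting; it is an illustration, not SUNRISE's exact implementation.

```python
import numpy as np

def ucb_action(q_values, lam=1.0):
    """UCB-style action selection over a Q-ensemble: pick the action whose
    ensemble mean plus lam * ensemble standard deviation is largest.
    `q_values` is assumed to have shape (num_ensemble_members, num_actions)."""
    mean, std = q_values.mean(axis=0), q_values.std(axis=0)
    return int(np.argmax(mean + lam * std))

def bellman_weight(target_std, temperature=10.0):
    """Down-weight Bellman targets whose ensemble disagreement (std of the
    target Q-value) is high; a sigmoid-shaped weight in (0.5, 1.0], shown as
    a sketch of the weighted-backup idea rather than the paper's exact form."""
    return 1.0 / (1.0 + np.exp(temperature * target_std)) + 0.5
```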
Recent studies in reinforcement learning (RL) have made significant progress by leveraging function approximation to alleviate the sample complexity hurdle for better performance. Despite the success, existing provably efficient algorithms typically rely on the accessibility of immediate feedback upon taking actions. The failure to account for the impact of delay in observations can significantly degrade the performance of real-world systems due to the regret blow-up. In this work, we tackle the challenge of delayed feedback in RL with linear function approximation by employing posterior sampling, which has been shown to empirically outperform the popular UCB algorithms in a wide range of regimes. We first introduce Delayed-PSVI, an optimistic value-based algorithm that effectively explores the value function space via noise perturbation with posterior sampling. We provide the first analysis for posterior sampling algorithms with delayed feedback in RL and show our algorithm achieves $$\widetilde{O}(\sqrt{d^3H^3 T} + d^2H^2 E[\tau])$$ worst-case regret in the presence of unknown stochastic delays. Here $$E[\tau]$$ is the expected delay. To further improve its computational efficiency and to expand its applicability in high-dimensional RL problems, we incorporate a gradient-based approximate sampling scheme via Langevin dynamics for Delayed-LPSVI, which maintains the same order-optimal regret guarantee with $$\widetilde{O}(dHK)$$ computational cost. Empirical evaluations are performed to demonstrate the statistical and computational efficacy of our algorithms.more » « less
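The "noise perturbation with posterior sampling" mechanism can be illustrated with linear features: fit the value parameters by ridge regression, then sample them from a Gaussian centred at the estimate with covariance proportional to the inverse regularized design matrix (RLSVI-style). This is a sketch of the mechanism only, not Delayed-PSVI itself; matrix shapes and names are assumptions.

```python
import numpy as np

def perturbed_value_weights(Phi, targets, sigma=1.0, lam=1.0, seed=0):
    """Posterior-sampling-style value parameters via noise perturbation:
    ridge-regression estimate plus Gaussian noise shaped by the inverse
    regularized design matrix. `Phi` is an (n, d) feature matrix and
    `targets` the length-n regression targets (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    d = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(d)              # regularized design matrix
    w_hat = np.linalg.solve(A, Phi.T @ targets)    # ridge-regression estimate
    cov = (sigma ** 2) * np.linalg.inv(A)          # posterior-style covariance
    return rng.multivariate_normal(w_hat, cov)     # sampled (perturbed) weights
```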