Effectively Learning Initiation Sets in Hierarchical Reinforcement Learning

Bagaria, Akhil; Abbatematteo, Ben; Gottesman, Omer; Corsaro, Matt; Rammohan, Sreehari; Konidaris, George


An agent learning an option in hierarchical reinforcement learning must solve three problems: identify the option’s subgoal (termination condition), learn a policy, and learn where that policy will succeed (initiation set). The termination condition is typically identified first, but the option policy and initiation set must be learned simultaneously, which is challenging because the initiation set depends on the option policy, which changes as the agent learns. Consequently, data obtained from option execution becomes invalid over time, leading to an inaccurate initiation set that subsequently harms downstream task performance. We highlight three issues specific to learning initiation sets (data non-stationarity, temporal credit assignment, and pessimism) and propose to address them using tools from off-policy value estimation and classification. We show that our method learns higher-quality initiation sets faster than existing methods (in MINIGRID and MONTEZUMA’S REVENGE), can automatically discover promising grasps for robot manipulation (in ROBOSUITE), and improves the performance of a state-of-the-art option discovery method in a challenging maze navigation task in MuJoCo.

Award ID(s): 1844960, 1955361

PAR ID: 10497999

Author(s) / Creator(s): Bagaria, Akhil; Abbatematteo, Ben; Gottesman, Omer; Corsaro, Matt; Rammohan, Sreehari; Konidaris, George

Publisher / Repository: 37th Conference on Neural Information Processing Systems

Date Published: 2023-12-01

Journal Name: 37th Conference on Neural Information Processing Systems


Sponsoring Org: National Science Foundation
