
Title: Enhanced Meta Reinforcement Learning using Demonstrations in Sparse Reward Environments
Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small number of steps (or even a single step), is able to perform near-optimally on a new, related task. However, a major challenge in adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a suboptimal agent, is available for each task. We then develop a class of algorithms entitled Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information—even if sub-optimal—to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm-started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse reward environments, including that of a mobile robot.
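For intuition on the joint RL-and-supervised-learning idea described above, the following is a minimal sketch of a single task-adaptation objective that combines a policy-gradient term on fresh rollouts with a behavior-cloning term on that task's offline demonstrations. The toy network, the random stand-in data, and names such as bc_weight are illustrative assumptions; the actual EMRLD algorithm and its meta-update differ in detail.

import torch
import torch.nn as nn

class Policy(nn.Module):
    # Small categorical policy used only for this illustration.
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return torch.distributions.Categorical(logits=self.net(obs))

def adaptation_loss(policy, rollout, demos, bc_weight=1.0):
    # RL term: REINFORCE-style policy gradient on online data from the
    # sparse-reward task. BC term: supervised imitation of the (possibly
    # sub-optimal) offline demonstrations, which supplies guidance even when
    # the rollouts earn no reward at all.
    obs, acts, returns = rollout
    demo_obs, demo_acts = demos
    rl_loss = -(policy(obs).log_prob(acts) * returns).mean()
    bc_loss = -policy(demo_obs).log_prob(demo_acts).mean()
    return rl_loss + bc_weight * bc_loss

# Toy usage with random tensors standing in for one task's rollout and demos.
policy = Policy(obs_dim=4, n_actions=2)
rollout = (torch.randn(32, 4), torch.randint(0, 2, (32,)), torch.randn(32))
demos = (torch.randn(16, 4), torch.randint(0, 2, (16,)))
adaptation_loss(policy, rollout, demos).backward()

In a meta-RL setting, such a per-task adaptation loss would sit inside an inner adaptation loop (for example, MAML-style), with an outer loop updating the shared meta-policy initialization across tasks.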
Publisher / Repository: Curran Associates
Journal Name: Advances in Neural Information Processing Systems
Location: New Orleans, LA
Sponsoring Org: National Science Foundation
More Like this
  1. A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback to learn from. In this work, we address this challenging problem by developing an algorithm that exploits offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy improvement step with an additional policy guidance step that uses the offline demonstration data. The key idea is that by obtaining guidance from, rather than imitating, the offline data, LOGO orients its policy toward the sub-optimal policy while still being able to learn beyond it and approach optimality. We provide a theoretical analysis of our algorithm and establish a lower bound on the performance improvement in each learning episode. We also extend our algorithm to the even more challenging incomplete observation setting, where the demonstration data contains only a censored version of the true state observation. We demonstrate the superior performance of our algorithm over state-of-the-art approaches on a number of benchmark environments with sparse rewards and censored states. Further, we demonstrate the value of our approach by implementing LOGO on a mobile robot for trajectory tracking and obstacle avoidance, where it shows excellent performance.
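A minimal sketch of the guidance idea, under assumed names and hyperparameters (this is not the paper's exact update): alternate a standard policy-gradient improvement term with a KL term that pulls the learner toward a behavior policy estimated from the demonstrations, and decay the guidance so the learner can eventually go beyond the sub-optimal demonstrator.

import torch
import torch.nn as nn

def make_policy(obs_dim, n_actions):
    # Returns the network (for the optimizer) and a function mapping
    # observations to a categorical action distribution.
    net = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(),
                        nn.Linear(64, n_actions))
    return net, lambda obs: torch.distributions.Categorical(logits=net(obs))

net, policy = make_policy(4, 2)
_, behavior_policy = make_policy(4, 2)   # in practice: cloned from the demos
optimizer = torch.optim.Adam(net.parameters(), lr=3e-4)
guidance_weight = 1.0

# One combined update on a batch of rollout data (random stand-ins here).
obs = torch.randn(32, 4)
acts = torch.randint(0, 2, (32,))
advantages = torch.randn(32)

# (1) Policy improvement: ordinary policy-gradient step on the sparse reward.
pg_loss = -(policy(obs).log_prob(acts) * advantages).mean()
# (2) Policy guidance: pull the policy toward the demonstration-derived
#     behavior policy via a KL term, rather than imitating it outright.
kl = torch.distributions.kl_divergence(behavior_policy(obs), policy(obs)).mean()
loss = pg_loss + guidance_weight * kl
optimizer.zero_grad()
loss.backward()
optimizer.step()
guidance_weight *= 0.99   # decay guidance so learning can surpass the demos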
  2. Omega-regular properties—specified using linear time temporal logic or various forms of omega-automata—find increasing use in specifying the objectives of reinforcement learning (RL). The key problem that arises is that of faithful and effective translation of the objective into a scalar reward for model-free RL. A recent approach exploits Büchi automata with restricted nondeterminism to reduce the search for an optimal policy for an omega-regular property to that for a simple reachability objective. A possible drawback of this translation is that reachability rewards are sparse, being reaped only at the end of each episode. Another approach reduces the search for an optimal policy to an optimization problem with two interdependent discount parameters. While this approach provides denser rewards than the reduction to reachability, it is not easily mapped to off-the-shelf RL algorithms. We propose a reward scheme that reduces the search for an optimal policy to an optimization problem with a single discount parameter that produces dense rewards and is compatible with off-the-shelf RL algorithms. Finally, we report an experimental comparison of these and other reward schemes for model-free RL with omega-regular objectives.
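As background for the reward schemes compared above, here is a generic sketch (not any particular scheme from the paper) of the common construction they build on: run an automaton for the omega-regular objective in lockstep with the environment and pay reward when an accepting automaton transition is taken. The environment and automaton interfaces shown are hypothetical; the schemes discussed in the paper differ precisely in how this reward and its discounting are chosen.

class ProductRewardWrapper:
    # Tracks an automaton state alongside the environment state and replaces
    # the environment's reward with one derived from the objective.
    def __init__(self, env, automaton, accept_reward=1.0):
        self.env = env
        self.aut = automaton
        self.accept_reward = accept_reward
        self.q = None

    def reset(self):
        self.q = self.aut.initial_state
        return self.env.reset(), self.q

    def step(self, action):
        obs, _, done, info = self.env.step(action)   # task reward is ignored
        label = self.aut.label(obs)                  # atomic propositions
        self.q, accepting = self.aut.step(self.q, label)
        reward = self.accept_reward if accepting else 0.0
        return (obs, self.q), reward, done, info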
  3. In reinforcement learning (RL), the ability to utilize prior knowledge from previously solved tasks can allow agents to quickly solve new problems. In some cases, these new problems may be approximately solved by composing the solutions of previously solved primitive tasks (task composition). Otherwise, prior knowledge can be used to adjust the reward function for a new problem, in a way that leaves the optimal policy unchanged but enables quicker learning (reward shaping). In this work, we develop a general framework for reward shaping and task composition in entropy-regularized RL. To do so, we derive an exact relation connecting the optimal soft value functions for two entropy-regularized RL problems with different reward functions and dynamics. We show how the derived relation leads to a general result for reward shaping in entropy-regularized RL. We then generalize this approach to derive an exact relation connecting optimal value functions for the composition of multiple tasks in entropy-regularized RL. We validate these theoretical contributions with experiments showing that reward shaping and task composition lead to faster learning in various settings.
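For reference, the optimal soft value functions mentioned above are defined, in one standard formulation of entropy-regularized RL, by the soft Bellman optimality equations below; the temperature $\tau$ and discount $\gamma$ notation is assumed here rather than taken from the paper.

$$V^*(s) = \tau \log \sum_{a} \exp\!\left(\frac{Q^*(s,a)}{\tau}\right), \qquad Q^*(s,a) = r(s,a) + \gamma\, \mathbb{E}_{s' \sim p(\cdot \mid s,a)}\big[V^*(s')\big].$$

The paper's shaping and composition results relate the solutions of two such problems whose reward functions (and, more generally, dynamics) differ.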

  4. Applying reinforcement learning (RL) to sparse reward domains is notoriously challenging due to insufficient guiding signals. Common RL techniques for addressing such domains include (1) learning from demonstrations and (2) curriculum learning. While these two approaches have been studied in detail, they have rarely been considered together. This paper aims to do so by introducing a principled task-phasing approach that uses demonstrations to automatically generate a curriculum sequence. Using inverse RL from (suboptimal) demonstrations, we define a simple initial task. Our task-phasing approach then provides a framework to gradually increase the complexity of the task all the way to the target task, while retuning the RL agent in each phasing iteration. Two approaches for phasing are considered: (1) gradually increasing the proportion of time steps an RL agent is in control, and (2) phasing out a guiding informative reward function. We present conditions that guarantee the convergence of these approaches to an optimal policy. Experimental results on three sparse reward domains demonstrate that our task-phasing approaches outperform state-of-the-art approaches with respect to asymptotic performance.
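The two phasing mechanisms described above can be sketched as follows; the names, interfaces, and schedule are hypothetical placeholders rather than the paper's implementation.

import random

def choose_action(rl_agent, guide_policy, obs, alpha_k):
    # Approach (1): the RL agent controls a growing fraction alpha_k of time
    # steps, while the guide policy (recovered from demonstrations via
    # inverse RL) controls the remainder.
    if random.random() < alpha_k:
        return rl_agent.act(obs)
    return guide_policy.act(obs)

def phased_reward(sparse_reward, guide_reward, beta_k):
    # Approach (2): blend a dense guiding reward with the sparse target
    # reward, phasing the guidance out as beta_k -> 0.
    return sparse_reward + beta_k * guide_reward

# A simple schedule over K phasing iterations: alpha_k ramps from 0 to 1
# while beta_k ramps from 1 to 0, retuning the agent at each phase.
K = 10
schedule = [(k / (K - 1), 1.0 - k / (K - 1)) for k in range(K)]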

  5. We study model-free reinforcement learning (RL) algorithms for infinite-horizon average-reward Markov decision processes (MDPs), which are more appropriate for applications that involve continuing operations not divided into episodes. In contrast to episodic/discounted MDPs, theoretical understanding of model-free RL algorithms is relatively inadequate for the average-reward setting. In this paper, we consider both the online setting and the setting with access to a simulator. We develop computationally efficient model-free algorithms that achieve sharper guarantees on regret/sample complexity compared with existing results. In the online setting, we design an algorithm, UCB-AVG, based on an optimistic variant of variance-reduced Q-learning. We show that UCB-AVG achieves a regret bound $\widetilde{O}(S^5A^2sp(h^*)\sqrt{T})$ after $T$ steps, where $S\times A$ is the size of the state-action space, and $sp(h^*)$ is the span of the optimal bias function. Our result provides the first computationally efficient model-free algorithm that achieves the optimal dependence in $T$ (up to log factors) for weakly communicating MDPs, which is necessary for low regret. In contrast, prior results either are suboptimal in $T$ or require strong assumptions of ergodicity or uniform mixing of MDPs. In the simulator setting, we adapt the idea of UCB-AVG to develop a model-free algorithm that finds an $\epsilon$-optimal policy with sample complexity $\widetilde{O}(SA\,sp^2(h^*)\epsilon^{-2} + S^2A\,sp(h^*)\epsilon^{-1})$. This sample complexity is near-optimal for weakly communicating MDPs, in view of the minimax lower bound $\Omega(SA\,sp(h^*)\epsilon^{-2})$. Existing work mainly focuses on ergodic MDPs, and the results typically depend on $t_{mix}$, the worst-case mixing time induced by a policy. We remark that the diameter $D$ and mixing time $t_{mix}$ are both lower bounded by $sp(h^*)$, and $t_{mix}$ can be arbitrarily large for certain MDPs. On the technical side, our approach integrates two key ideas: learning a $\gamma$-discounted MDP as an approximation, and leveraging a reference-advantage decomposition for variance reduction in optimistic Q-learning. As recognized in prior work, a naive approximation by discounted MDPs results in suboptimal guarantees. A distinguishing feature of our method is maintaining estimates of the value differences between state pairs to provide a sharper bound on the variance of the reference advantage. We also crucially use a careful choice of the discount factor $\gamma$ to balance the approximation error due to discounting and the statistical learning error, and we are able to maintain a good-quality reference value function with $O(SA)$ space complexity.
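As a generic illustration only (this is not UCB-AVG, which additionally uses variance reduction through a reference-advantage decomposition), the sketch below shows two of the ingredients described above: tabular Q-learning with an optimism bonus, run on a $\gamma$-discounted approximation of the average-reward problem. All constants and the bonus form are placeholders.

import math
import numpy as np

S, A = 10, 3
gamma = 0.99                      # discount of the approximating MDP
H = 1.0 / (1.0 - gamma)           # effective horizon of that approximation
Q = np.full((S, A), H)            # optimistic initialization
visits = np.zeros((S, A), dtype=int)

def optimistic_update(s, a, r, s_next, c_bonus=1.0):
    # One optimistic Q-learning step: a decaying step size plus an
    # exploration bonus that shrinks with the visit count of (s, a).
    visits[s, a] += 1
    n = visits[s, a]
    alpha = (H + 1.0) / (H + n)
    bonus = c_bonus * math.sqrt(math.log(n + 1) / n)
    target = r + bonus + gamma * Q[s_next].max()
    Q[s, a] = (1.0 - alpha) * Q[s, a] + alpha * target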