Title: Self-Adaptive Imitation Learning: Learning Tasks with Delayed Rewards from Sub-optimal Demonstrations
Reinforcement learning (RL) has demonstrated its strength in solving sequential decision-making problems. However, its heavy dependence on immediate reward feedback impedes the wide application of RL. Imitation learning (IL), on the other hand, tackles such tasks without relying on environmental supervision by leveraging external demonstrations. In practice, however, collecting sufficient expert demonstrations can be prohibitively expensive, while the quality of the demonstrations typically limits the performance of the learned policy. To address this practical scenario, we propose Self-Adaptive Imitation Learning (SAIL), which, provided with a few demonstrations from a sub-optimal teacher, performs well in RL tasks with extremely delayed rewards, where the only reward feedback is trajectory-wise ranking. SAIL bridges the advantages of IL and RL by interactively exploiting the demonstrations to catch up with the teacher and exploring the environment to yield demonstrations that surpass the teacher. Extensive empirical results show that SAIL not only significantly improves sample efficiency but also achieves higher asymptotic performance across different continuous control tasks, compared with the state of the art.
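As a rough illustration of the mechanism the abstract describes, the sketch below keeps a small buffer of demonstrations, imitates it, and swaps in the agent's own trajectories whenever the trajectory-wise ranking prefers them, so the effective teacher improves over time. The buffer class, function names, and loop structure are illustrative assumptions rather than the paper's actual implementation.

```python
# Minimal sketch of the self-adaptive demonstration buffer idea described in the
# abstract: imitate the current demonstrations, and whenever an explored trajectory
# is ranked above the worst demonstration, swap it in so the effective "teacher"
# improves over time. All names (DemoBuffer, ranks_higher, imitation_update,
# explore) are illustrative assumptions, not the paper's actual interface.
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class DemoBuffer:
    """Holds the current set of demonstration trajectories."""
    trajectories: List[Sequence] = field(default_factory=list)

    def try_replace_worst(self, candidate: Sequence,
                          ranks_higher: Callable[[Sequence, Sequence], bool]) -> bool:
        """Replace the lowest-ranked demonstration if `candidate` out-ranks it."""
        if not self.trajectories:
            self.trajectories.append(candidate)
            return True
        # Find the lowest-ranked demonstration via pairwise ranking comparisons.
        worst_idx = 0
        for i in range(1, len(self.trajectories)):
            if ranks_higher(self.trajectories[worst_idx], self.trajectories[i]):
                worst_idx = i
        if ranks_higher(candidate, self.trajectories[worst_idx]):
            self.trajectories[worst_idx] = candidate
            return True
        return False


def sail_style_loop(teacher_demos, explore, imitation_update, ranks_higher, iterations=100):
    """Alternate imitation (catch up with the buffer) and exploration (surpass it)."""
    buffer = DemoBuffer(list(teacher_demos))
    for _ in range(iterations):
        imitation_update(buffer.trajectories)               # IL step against current demos
        candidate = explore()                               # roll out the current policy
        buffer.try_replace_worst(candidate, ranks_higher)   # trajectory-wise ranking feedback
    return buffer
```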
Award ID(s):
1749940
NSF-PAR ID:
10342092
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
36
Issue:
8
ISSN:
2159-5399
Page Range / eLocation ID:
9269 to 9277
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. We develop a framework to learn bio-inspired foraging policies using human data. We conduct an experiment in which humans are virtually immersed in an open-field foraging environment and are trained to collect as many rewards as possible. A Markov Decision Process (MDP) framework is introduced to model the human decision dynamics. Then, Imitation Learning (IL) based on maximum likelihood estimation is used to train Neural Networks (NNs) that map observed states to human decisions. The results show that passive imitation substantially underperforms humans. We further refine the human-inspired policies via Reinforcement Learning (RL) using the on-policy Proximal Policy Optimization (PPO) algorithm, which shows better stability than other algorithms and can steadily improve the policies pre-trained with IL. We show that the combination of IL and RL matches human performance and that the artificial agents trained with our approach can quickly adapt to reward distribution shift. Finally, we show that good performance and robustness to reward distribution shift strongly depend on combining allocentric information with an egocentric representation of the environment.

     
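The imitation stage described in item 1 (behavior cloning by maximum likelihood, later refined with PPO) can be sketched roughly as follows; the network sizes, the discrete action space, and the random stand-in for the human data are assumptions for illustration only.

```python
# A minimal sketch of the imitation-learning step described above: behavior cloning
# by maximum likelihood, i.e. training a network to put high probability on the
# human's chosen action in each observed state. The PPO refinement stage is only
# noted in a comment; sizes and the random "human data" below are placeholders.
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 16, 4             # illustrative sizes, not from the paper

policy = nn.Sequential(
    nn.Linear(STATE_DIM, 64), nn.Tanh(),
    nn.Linear(64, N_ACTIONS),            # logits over discrete foraging actions
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
nll = nn.CrossEntropyLoss()              # cross-entropy == negative log-likelihood

# Placeholder for (state, action) pairs extracted from the human experiment.
states = torch.randn(512, STATE_DIM)
actions = torch.randint(0, N_ACTIONS, (512,))

for epoch in range(20):                  # maximum-likelihood imitation (IL) phase
    optimizer.zero_grad()
    loss = nll(policy(states), actions)
    loss.backward()
    optimizer.step()

# The resulting `policy` would then initialize the actor of an on-policy PPO learner
# (e.g. from a standard RL library) and be refined with environment reward.
```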
  2. One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy. A known problem with this “off-policy” approach is that the robot’s errors compound when drifting away from the supervisor’s demonstrations. On-policy techniques alleviate this by iteratively collecting corrective actions for the current robot policy. However, these techniques can be tedious for human supervisors, add a significant computational burden, and may visit dangerous states during training. We propose an off-policy approach that injects noise into the supervisor’s policy while demonstrating. This forces the supervisor to demonstrate how to recover from errors. We propose a new algorithm, DART (Disturbances for Augmenting Robot Trajectories), that collects demonstrations with injected noise and optimizes the noise level to approximate the error of the robot’s trained policy during data collection. We compare DART with DAgger and Behavior Cloning in two domains: in simulation with an algorithmic supervisor on MuJoCo tasks (Walker, Humanoid, Hopper, Half-Cheetah) and in physical experiments with human supervisors training a Toyota HSR robot to perform grasping in clutter. For high-dimensional tasks like Humanoid, DART can be up to 3x faster in computation time and only decreases the supervisor’s cumulative reward by 5% during training, whereas DAgger executes policies that have 80% less cumulative reward than the supervisor. On the grasping-in-clutter task, DART obtains on average a 62% performance increase over Behavior Cloning.
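A rough sketch of the noise-injection idea behind DART as summarized above: perturb the supervisor's actions with Gaussian noise while recording the intended actions as labels, and set the noise scale from the current robot policy's deviation on demonstration states. The classic Gym-style environment interface, diagonal Gaussian noise, and function names are simplifying assumptions, not the authors' exact formulation.

```python
# Sketch only: noise injection during demonstration plus a crude re-estimation of
# the noise scale from the robot's current error. Assumes array-valued actions and
# an environment with the classic Gym 4-tuple step API.
import numpy as np


def collect_noisy_demo(env, supervisor, noise_std, horizon=200):
    """Roll out the supervisor with injected noise so it must recover from errors."""
    states, actions = [], []
    obs = env.reset()
    for _ in range(horizon):
        clean_action = supervisor(obs)
        noisy_action = clean_action + np.random.normal(0.0, noise_std, size=clean_action.shape)
        states.append(obs)
        actions.append(clean_action)        # label with the *intended* action
        obs, _, done, _ = env.step(noisy_action)
        if done:
            break
    return np.array(states), np.array(actions)


def estimate_noise_std(robot_policy, supervisor, states):
    """Set the next noise level from the robot's current deviation on demo states."""
    errors = np.array([robot_policy(s) - supervisor(s) for s in states])
    return errors.std(axis=0)               # per-dimension scale of the robot's error
```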
  3. The difficulty in specifying rewards for many real-world problems has led to an increased focus on learning rewards from human feedback, such as demonstrations. However, there are often many different reward functions that explain the human feedback, leaving agents with uncertainty over what the true reward function is. While most policy optimization approaches handle this uncertainty by optimizing for expected performance, many applications demand risk-averse behavior. We derive a novel policy-gradient-style robust optimization approach, PG-BROIL, that optimizes a soft-robust objective balancing expected performance and risk. To the best of our knowledge, PG-BROIL is the first policy optimization algorithm robust to a distribution of reward hypotheses that can scale to continuous MDPs. Results suggest that PG-BROIL can produce a family of behaviors ranging from risk-neutral to risk-averse and outperforms state-of-the-art imitation learning algorithms when learning from ambiguous demonstrations by hedging against uncertainty rather than seeking to uniquely identify the demonstrator’s reward function.
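The abstract above only states that PG-BROIL balances expected performance and risk over a distribution of reward hypotheses. One common way to write such a soft-robust objective, assumed here purely for illustration, is a convex combination of the mean return and the conditional value at risk (CVaR) across sampled reward functions:

```python
# Assumed form of a soft-robust objective, not taken from the paper: blend the
# expected return over reward hypotheses with the CVaR (the average of the worst
# outcomes), trading off risk-neutral vs. risk-averse behavior via `lam`.
import numpy as np


def soft_robust_objective(returns_per_hypothesis, lam=0.5, alpha=0.95):
    """returns_per_hypothesis: the policy's expected return under each sampled reward."""
    r = np.asarray(returns_per_hypothesis, dtype=float)
    expected = r.mean()
    # CVaR: average of the worst (1 - alpha) fraction of outcomes.
    k = max(1, int(np.ceil((1.0 - alpha) * len(r))))
    cvar = np.sort(r)[:k].mean()
    return lam * expected + (1.0 - lam) * cvar


# Example: three reward hypotheses that disagree about how good the policy is.
print(soft_robust_objective([10.0, 9.0, -2.0], lam=0.5, alpha=0.9))
```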
  4. Meta reinforcement learning (Meta-RL) is an approach wherein the experience gained from solving a variety of tasks is distilled into a meta-policy. The meta-policy, when adapted over only a small number of steps (or even a single step), is able to perform near-optimally on a new, related task. However, a major challenge in adopting this approach to solve real-world problems is that they are often associated with sparse reward functions that only indicate whether a task is completed partially or fully. We consider the situation where some data, possibly generated by a sub-optimal agent, is available for each task. We then develop a class of algorithms entitled Enhanced Meta-RL using Demonstrations (EMRLD) that exploit this information, even if sub-optimal, to obtain guidance during training. We show how EMRLD jointly utilizes RL and supervised learning over the offline data to generate a meta-policy that demonstrates monotone performance improvements. We also develop a warm-started variant called EMRLD-WS that is particularly efficient for sub-optimal demonstration data. Finally, we show that our EMRLD algorithms significantly outperform existing approaches in a variety of sparse-reward environments, including that of a mobile robot.
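EMRLD is described above as jointly using RL and supervised learning over offline demonstration data. A generic single-task version of that combination, with the meta-learning outer loop and the exact weighting omitted, is a weighted sum of a policy-gradient term and a behavior-cloning term; the sketch below is an assumption about the general shape of such a loss, not the paper's algorithm.

```python
# Assumed combination of an RL loss and a supervised (behavior-cloning) loss on
# offline demonstrations; names, shapes, and the weighting are illustrative only.
import torch


def combined_loss(log_probs, advantages, demo_log_probs, bc_weight=0.1):
    """Policy-gradient term on collected rollouts plus imitation term on demos.

    log_probs:      log pi(a_t | s_t) for actions taken in the environment
    advantages:     advantage estimates for those actions (treated as constants)
    demo_log_probs: log pi(a_t | s_t) for state-action pairs from the offline demos
    """
    pg_loss = -(log_probs * advantages.detach()).mean()   # REINFORCE-style surrogate
    bc_loss = -demo_log_probs.mean()                      # maximize demo likelihood
    return pg_loss + bc_weight * bc_loss


# Shapes only; in practice these come from the policy network and an advantage estimator.
loss = combined_loss(torch.randn(32, requires_grad=True),
                     torch.randn(32),
                     torch.randn(64, requires_grad=True))
loss.backward()
```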
  5. Imitation learning (IL) aims to learn a policy from expert demonstrations that minimizes the discrepancy between the learner and expert behaviors. Various imitation learning algorithms have been proposed with different pre-determined divergences to quantify the discrepancy. This naturally gives rise to the following question: given a set of expert demonstrations, which divergence can recover the expert policy more accurately and with higher data efficiency? In this work, we propose f-GAIL, a new generative adversarial imitation learning model that automatically learns a discrepancy measure from the f-divergence family as well as a policy capable of producing expert-like behaviors. Compared with IL baselines using various predefined divergence measures, f-GAIL learns better policies with higher data efficiency in six physics-based control tasks.
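For orientation on item 5: adversarial IL methods estimate a divergence between expert and learner state-action distributions with a discriminator, and f-GAIL's contribution is to learn which f-divergence to use rather than fixing one. The sketch below shows only the fixed-divergence discriminator step that such methods build on (a standard GAIL-style binary cross-entropy objective); the learnable-f part is not reproduced, and all shapes and names are placeholders.

```python
# Fixed-divergence discriminator step that adversarial IL builds on; in f-GAIL the
# divergence f (equivalently its convex conjugate) would itself be learned.
import torch
import torch.nn as nn

OBS_ACT_DIM = 8                                     # illustrative (state, action) size
discriminator = nn.Sequential(nn.Linear(OBS_ACT_DIM, 64), nn.Tanh(),
                              nn.Linear(64, 1))     # scores (s, a) pairs
bce = nn.BCEWithLogitsLoss()

expert_sa = torch.randn(128, OBS_ACT_DIM)           # expert (s, a) pairs (placeholder)
policy_sa = torch.randn(128, OBS_ACT_DIM)           # learner (s, a) pairs (placeholder)

# Discriminator step: distinguish expert from learner samples; its optimum yields a
# lower bound on the chosen divergence between the two occupancy measures.
d_loss = (bce(discriminator(expert_sa), torch.ones(128, 1))
          + bce(discriminator(policy_sa), torch.zeros(128, 1)))
d_loss.backward()

# The policy would then be updated with an on-policy RL step to decrease the
# estimated divergence, i.e. to make its samples look expert-like.
```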