Policy Learning Using Weak Supervision

Wang, Jingkang; Guo, Hongyi; Zhu, Zhaowei; Liu, Yang

Citation Details

Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). These quality supervisions are either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the available cheap weak supervisions to perform policy learning efficiently. To handle this problem, we treat the weak supervision'' as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a correlated agreement'' with the peer agent's policy (instead of simple agreements). Our approach explicitly punishes a policy for overfitting to the weak supervision. In addition to theoretical guarantees, extensive evaluations on tasks including RL with noisy reward, BC with weak demonstrations, and standard policy co-training (RL + BC) show that our method leads to substantial performance improvements, especially when the complexity or the noise of the learning environments is high. more »

Award ID(s):: 2007951

PAR ID:: 10391568

Author(s) / Creator(s):: Wang, Jingkang; Guo, Hongyi; Zhu, Zhaowei; Liu, Yang

Date Published:: 2021-01-01

Journal Name:: Advances in neural information processing systems

ISSN:: 1049-5258

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this