
Title: Reward Learning With Intractable Normalizing Functions
Abstract: Robots can learn to imitate humans by inferring what the human is optimizing for. One common framework for this is Bayesian reward learning, where the robot treats the human's demonstrations and corrections as observations of their underlying reward function. Unfortunately, this inference is doubly intractable: the robot must reason over all the trajectories the person could have provided and all the rewards the person could have in mind. Prior work uses existing robotic tools to approximate this normalizer. In this letter, we group previous approaches into three fundamental classes and analyze the theoretical pros and cons of each class. We then leverage recent research from the statistics community to introduce Double MH reward learning, a Monte Carlo method for asymptotically learning the human's reward in continuous spaces. We extend Double MH to conditionally independent settings (where each human correction is viewed as completely separate) and conditionally dependent environments (where the human's current correction may build on previous inputs). Across simulations and user studies, our proposed approach infers the human's reward parameters more accurately than the alternative approximations when learning from either demonstrations or corrections.
Award ID(s): 2222468
NSF-PAR ID: 10494587
Author(s) / Creator(s):
Publisher / Repository: IEEE
Date Published:
Journal Name: IEEE Robotics and Automation Letters
Volume: 8
Issue: 11
ISSN: 2377-3774
Page Range / eLocation ID: 7511 to 7518
Format(s): Medium: X
Sponsoring Org: National Science Foundation
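The core technical idea in this record is the double Metropolis-Hastings (double MH) construction: when the human is modeled as Boltzmann-rational, the likelihood of a demonstration contains a normalizer over all trajectories, and double MH sidesteps it by drawing an auxiliary trajectory at each proposal so that the normalizers cancel in the acceptance ratio. Below is a minimal sketch of that construction on a toy two-dimensional trajectory space, assuming a linear reward with a quadratic effort cost; the feature map, rationality coefficient, proposal widths, and prior are illustrative choices, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(theta, xi):
    """Hypothetical reward: linear preference theta . xi minus a quadratic effort cost."""
    xi = np.asarray(xi, dtype=float)
    return float(np.dot(theta, xi) - 0.5 * np.dot(xi, xi))

beta = 5.0         # assumed Boltzmann rationality coefficient
prop_theta = 0.2   # random-walk proposal width for reward parameters
prop_xi = 0.5      # random-walk proposal width for auxiliary trajectories

def sample_trajectory(theta, xi_init, n_inner=50):
    """Inner MH chain targeting p(xi | theta) proportional to exp(beta * R_theta(xi))."""
    xi = np.array(xi_init, dtype=float)
    for _ in range(n_inner):
        xi_new = xi + prop_xi * rng.standard_normal(xi.shape)
        if np.log(rng.random()) < beta * (reward(theta, xi_new) - reward(theta, xi)):
            xi = xi_new
    return xi

def double_mh(xi_human, n_samples=2000):
    """Outer chain over theta; the intractable normalizer Z(theta) cancels in the ratio."""
    theta = np.zeros(2)
    samples = []
    for _ in range(n_samples):
        theta_new = theta + prop_theta * rng.standard_normal(theta.shape)
        # Auxiliary trajectory drawn (approximately) from p(. | theta_new) by an
        # inner chain started at the human's observed trajectory.
        xi_aux = sample_trajectory(theta_new, xi_human)
        # Exchange-style log acceptance ratio: the normalizers cancel out.
        log_a = beta * (reward(theta_new, xi_human) - reward(theta, xi_human)
                        + reward(theta, xi_aux) - reward(theta_new, xi_aux))
        log_a += -0.5 * (np.sum(theta_new**2) - np.sum(theta**2))  # N(0, I) prior on theta
        if np.log(rng.random()) < log_a:
            theta = theta_new
        samples.append(theta.copy())
    return np.array(samples)

# Usage: treat one human demonstration as evidence and inspect the posterior mean.
xi_demo = np.array([1.0, 0.3])
posterior = double_mh(xi_demo)
print("posterior mean of theta:", posterior[len(posterior) // 2:].mean(axis=0))
```

Starting the inner chain from the human's observed trajectory is the usual double MH shortcut for obtaining an approximate auxiliary sample when no exact sampler for p(xi | theta) is available.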
More Like this
  1. Reward learning as a method for inferring human intent and preferences has been studied extensively. Prior approaches make an implicit assumption that the human maintains a correct belief about the robot's domain dynamics. However, this may not always hold: the human's belief may be biased, which can ultimately lead to a misguided estimate of the human's intent and preferences, since these are often derived from human feedback on the robot's behaviors. In this paper, we remove this restrictive assumption by considering that the human may have an inaccurate understanding of the robot. We propose a method called Generalized Reward Learning with biased beliefs about domain dynamics (GeReL) to infer both the reward function and the human's belief about the robot in a Bayesian setting based on human ratings. Because the posteriors take complex forms, we formulate the problem as variational inference, simultaneously inferring the posteriors over the parameters that govern the reward function and the human's belief about the robot. We evaluate our method in a simulated domain and with a user study where the user has a bias based on the robot's appearance. The results show that our method can recover the true human preferences even under such biased beliefs, whereas prior approaches could have misinterpreted them completely.
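The distinctive step in this record is inferring the reward and the human's belief about the robot's dynamics jointly from scalar ratings. The sketch below illustrates that joint variational inference on a toy one-dimensional problem: the human's believed dynamics are a single unknown gain on the robot's actions, ratings are modeled as noisy believed returns, and a mean-field Gaussian posterior over both unknowns is fit by maximizing a reparameterized ELBO. The toy dynamics, likelihood, priors, and hyperparameters are all assumptions for illustration, not GeReL's actual model.

```python
import torch

torch.manual_seed(0)

# Toy 1-D world: the robot applies a sequence of actions starting from state 0.
# True dynamics: s' = s + a.  The human believes s' = s + b * a, with unknown gain b.
def believed_reward(actions, g, b):
    """Return the human believes a behavior earns: stay close to an unknown goal g."""
    return -torch.abs(b * actions.sum() - g)

# Simulated data: behaviors the robot showed and ratings from a (biased) human.
behaviors = [torch.tensor([0.5, 0.5]), torch.tensor([1.0, 1.0]), torch.tensor([2.0, 1.0])]
true_g, true_b, noise = 1.5, 0.7, 0.1
ratings = [believed_reward(a, true_g, true_b) + noise * torch.randn(()) for a in behaviors]

# Mean-field Gaussian variational posterior q(g, b) with learnable parameters.
var_params = torch.zeros(4, requires_grad=True)   # [mu_g, log_sig_g, mu_b, log_sig_b]
opt = torch.optim.Adam([var_params], lr=0.02)

prior_g = torch.distributions.Normal(0.0, 2.0)    # prior over the goal
prior_b = torch.distributions.Normal(1.0, 0.5)    # prior centered on an unbiased belief

for _ in range(3000):
    mu_g, log_sig_g, mu_b, log_sig_b = var_params
    q_g = torch.distributions.Normal(mu_g, log_sig_g.exp())
    q_b = torch.distributions.Normal(mu_b, log_sig_b.exp())
    g, b = q_g.rsample(), q_b.rsample()           # reparameterized samples

    log_lik = sum(torch.distributions.Normal(believed_reward(a, g, b), noise).log_prob(r)
                  for a, r in zip(behaviors, ratings))
    elbo = (log_lik + prior_g.log_prob(g) + prior_b.log_prob(b)
            - q_g.log_prob(g) - q_b.log_prob(b))

    opt.zero_grad()
    (-elbo).backward()                            # maximize the ELBO
    opt.step()

print("inferred goal g ~", round(var_params[0].item(), 2),
      "| inferred belief gain b ~", round(var_params[2].item(), 2))
```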
  2. Designing reward functions is a difficult task in AI and robotics. Directly specifying all the desirable behaviors a robot needs to optimize often proves challenging for humans. A popular solution is to learn reward functions from expert demonstrations. This approach, however, is fraught with many challenges. Some methods require heavily structured models, for example, reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that may require tremendous amounts of data. Moreover, it is difficult for humans to provide demonstrations on robots with high degrees of freedom, or even to quantify reward values for given trajectories. To address these challenges, we present a preference-based learning approach, where human feedback is in the form of comparisons between trajectories. We do not assume a highly constrained structure on the reward function. Instead, we model the reward function with a Gaussian process and propose a mathematical formulation to actively fit the model using only human preferences. Our approach enables us to tackle both the inflexibility and the data-inefficiency problems within a preference-based learning framework. We further analyze our algorithm against several baselines on reward optimization, where the goal is to find the optimal robot trajectory in a data-efficient way instead of learning the reward function for every possible trajectory. Our results in three different simulation experiments and a user study show that our approach can efficiently learn expressive reward functions for robotic tasks and outperforms the baselines in both reward learning and reward optimization.
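To make the setup in this record concrete, the sketch below places a Gaussian process prior (via an RBF kernel on trajectory features) over the latent reward values of a fixed candidate set and fits them to pairwise preferences through a logistic likelihood on reward differences, using a simple MAP estimate by gradient ascent. The kernel, the features, the simulated preference data, and the MAP-instead-of-full-Bayes shortcut are illustrative assumptions rather than the paper's exact formulation, and the active-querying component is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# Candidate trajectories summarized by 2-D feature vectors (hypothetical).
X = rng.uniform(-1, 1, size=(20, 2))

def rbf_kernel(A, B, lengthscale=0.5, variance=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

# GP prior over latent rewards at the candidates; the 0.1 term models preference
# noise and keeps the kernel matrix well conditioned.
K = rbf_kernel(X, X) + 0.1 * np.eye(len(X))
K_inv = np.linalg.inv(K)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Simulated preferences: (i, j) means "trajectory i preferred over trajectory j",
# generated from a hidden ground-truth reward so the fit can be sanity-checked.
true_reward = X @ np.array([1.0, -0.5])
pairs = []
for _ in range(60):
    i, j = rng.choice(len(X), size=2, replace=False)
    if rng.random() < sigmoid(true_reward[i] - true_reward[j]):
        pairs.append((i, j))
    else:
        pairs.append((j, i))

# MAP estimate of the latent rewards f:
#   maximize  sum log sigmoid(f_i - f_j)  -  0.5 * f^T K^{-1} f
f = np.zeros(len(X))
for _ in range(5000):
    grad = -K_inv @ f
    for i, j in pairs:
        g = 1.0 - sigmoid(f[i] - f[j])   # d/df_i of log sigmoid(f_i - f_j)
        grad[i] += g
        grad[j] -= g
    f += 0.02 * grad

print("best trajectory under the learned reward:", int(np.argmax(f)),
      "| under the ground-truth reward:", int(np.argmax(true_reward)))
```

With enough comparisons the learned latent values typically rank the candidates in the same order as the hidden reward, even though no reward magnitudes were ever observed.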
  3. This paper presents a framework to learn the reward function underlying high-level sequential tasks from demonstrations. The purpose of reward learning, in the context of learning from demonstration (LfD), is to generate policies that mimic the demonstrator's policies, thereby enabling imitation learning. We focus on a human-robot interaction (HRI) domain where the goal is to learn and model structured interactions between a human and a robot. Such interactions can be modeled as a partially observable Markov decision process (POMDP), where the partial observability is caused by uncertainties associated with the ways humans respond to different stimuli. The key challenge in finding a good policy in such a POMDP is determining the reward function that was observed by the demonstrator. Existing inverse reinforcement learning (IRL) methods for POMDPs are computationally very expensive, and the problem is not well understood. In comparison, IRL algorithms for Markov decision processes (MDPs) are well defined and computationally efficient. We propose an approach to learning reward functions for high-level sequential tasks from human demonstrations, where the core idea is to reduce the underlying POMDP to an MDP and apply any efficient MDP-IRL algorithm. Our extensive experiments suggest that the reward function learned this way generates POMDP policies that mimic the policies of the demonstrator well.
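The recipe in this record is to reduce the interaction POMDP to an MDP and then apply any efficient MDP-IRL algorithm. The sketch below skips the reduction (it assumes the demonstrations have already been mapped onto a small tabular MDP) and runs maximum-entropy IRL, a common MDP-IRL choice that is not necessarily the one used in the paper, on a hypothetical five-state chain by matching the expert's expected feature counts.

```python
import numpy as np

n_states, n_actions, horizon = 5, 2, 10
# Deterministic chain: action 0 moves left, action 1 moves right (clipped at the ends).
next_state = np.array([[max(s - 1, 0), min(s + 1, n_states - 1)] for s in range(n_states)])
features = np.eye(n_states)                # one-hot state features
init_dist = np.zeros(n_states); init_dist[0] = 1.0

def soft_policies(reward):
    """Finite-horizon soft value iteration; returns one stochastic policy per timestep."""
    V = np.zeros(n_states)
    policies = []
    for _ in range(horizon):
        Q = reward[:, None] + V[next_state]          # Q[s, a] under soft optimality
        Qmax = Q.max(axis=1, keepdims=True)
        V = (Qmax + np.log(np.exp(Q - Qmax).sum(axis=1, keepdims=True))).ravel()
        policies.append(np.exp(Q - V[:, None]))
    policies.reverse()                               # index 0 = first timestep
    return policies

def expected_feature_counts(reward):
    """Expected feature counts of the soft-optimal policy over the horizon."""
    policies = soft_policies(reward)
    d = init_dist.copy()
    counts = np.zeros(n_states)
    for t in range(horizon):
        counts += d @ features
        d_next = np.zeros(n_states)
        for s in range(n_states):
            for a in range(n_actions):
                d_next[next_state[s, a]] += d[s] * policies[t][s, a]
        d = d_next
    return counts

# "Expert" feature counts, generated here from a hidden true reward for the toy demo;
# in practice they would come from the demonstrations mapped onto the reduced MDP.
true_reward = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
expert_counts = expected_feature_counts(true_reward)

# Max-ent IRL: gradient ascent on reward weights to match expected feature counts.
w = np.zeros(n_states)
for _ in range(2000):
    w += 0.01 * (expert_counts - expected_feature_counts(w))

print("learned reward weights:", np.round(w, 2))
```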
  4. This paper proposes a novel approach that enables a robot to learn an objective function incrementally from human directional corrections. Existing methods learn from human magnitude corrections; since a human needs to carefully choose the magnitude of each correction, those methods can easily lead to over-corrections and learning inefficiency. The proposed method only requires human directional corrections: corrections that indicate only the direction of an input change, without indicating its magnitude. We only assume that each correction, regardless of its magnitude, points in a direction that improves the robot's current motion relative to an unknown objective function. The allowable corrections satisfying this assumption account for half of the input space, as opposed to magnitude corrections, which have to lie in a shrinking level set. For each directional correction, the proposed method updates the estimate of the objective function based on a cutting plane method, which has a geometric interpretation. We establish theoretical results showing the convergence of the learning process. The proposed method has been tested in numerical examples, a user study on two human-robot games, and a real-world quadrotor experiment. The results confirm the convergence of the proposed method and further show that it is significantly more effective (higher success rate), more efficient and effortless (fewer human corrections needed), and potentially more accessible (fewer early wasted trials) than state-of-the-art robot learning frameworks.
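Geometrically, each directional correction in this record says that the human's objective increases along the corrected direction, which cuts away the half of parameter space inconsistent with that statement. The sketch below illustrates the idea with a linear combination of two distance-based features; for simplicity the feasible set is tracked with a finite grid of candidate parameter vectors rather than an analytic cutting-plane update, and the features, landmarks, and simulated human are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Objective over a 2-D robot input u: a weighted sum of closeness to two landmarks,
#   J(u; theta) = theta_1 * f_1(u) + theta_2 * f_2(u),  f_i(u) = -||u - g_i||^2.
g1, g2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])

def feature_grads(u):
    """Rows are the gradients of f_1 and f_2 with respect to the input u."""
    return np.stack([-2.0 * (u - g1), -2.0 * (u - g2)])

true_theta = np.array([0.8, 0.6]); true_theta /= np.linalg.norm(true_theta)

# The initial "uncut" feasible set, represented by candidate unit vectors.
angles = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
candidates = np.stack([np.cos(angles), np.sin(angles)], axis=1)

theta_hat = np.array([1.0, 0.0])
for _ in range(8):
    u_k = rng.uniform(-1, 1, size=2)                    # robot's current input
    # Simulated human: push along the true objective's gradient at u_k (any direction
    # with a positive inner product with that gradient would also be allowable).
    a_k = feature_grads(u_k).T @ true_theta
    a_k /= np.linalg.norm(a_k)
    # "The correction improves the objective" means a_k . grad_u J(u_k; theta) > 0,
    # i.e. the half-space h_k . theta > 0 with h_k as below: one cut per correction.
    h_k = feature_grads(u_k) @ a_k
    candidates = candidates[candidates @ h_k > 0]       # discard the ruled-out half
    theta_hat = candidates.mean(axis=0)
    theta_hat /= np.linalg.norm(theta_hat)

print("estimate:", np.round(theta_hat, 3), "| true:", np.round(true_theta, 3))
```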
  5. Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to learn reward functions directly from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information that are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user demonstrations to initialize a belief about the reward function, and then actively probes the user with preference queries to zero in on their true reward. This algorithm not only enables us to combine multiple data sources, but it also informs the robot when it should leverage each type of information. Further, our approach accounts for the human's ability to provide data, yielding user-friendly preference queries that are also theoretically optimal. Our extensive simulated experiments and user studies on a Fetch mobile manipulator demonstrate the superiority and usability of our integrated framework.
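The two-stage structure in this record, initialize a belief from passive demonstrations and then actively query preferences, can be sketched with a sample-based belief over linear reward weights: a demonstration reweights the samples through a Boltzmann likelihood, and each preference query is chosen to be maximally uncertain under the current belief, a simple stand-in for the paper's information-theoretic criterion. The features, rationality constants, and query heuristic below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

n_particles, beta = 5000, 5.0
W = rng.standard_normal((n_particles, 2))
W /= np.linalg.norm(W, axis=1, keepdims=True)         # prior: weights on the unit circle
logw = np.zeros(n_particles)                           # log importance weights

# Candidate trajectories summarized by 2-D features (hypothetical), plus a hidden
# ground-truth weight vector used only to simulate the human.
Phi = rng.uniform(-1, 1, size=(30, 2))
true_w = np.array([0.9, 0.45]); true_w /= np.linalg.norm(true_w)

# Stage 1: passive data. One demonstration = the trajectory the human likes best;
# a Boltzmann likelihood over the discrete candidate set reweights the belief.
demo = int(np.argmax(Phi @ true_w))
scores = beta * (W @ Phi.T)                            # (particles, trajectories)
logw += scores[:, demo] - np.log(np.exp(scores).sum(axis=1))

def prob_prefers(i, j):
    """Posterior probability that the human prefers trajectory i over trajectory j."""
    p = 1.0 / (1.0 + np.exp(-beta * (W @ (Phi[i] - Phi[j]))))
    weights = np.exp(logw - logw.max()); weights /= weights.sum()
    return float(weights @ p), p

# Stage 2: active preference queries. Ask about the pair whose answer is most
# uncertain under the current belief (a stand-in for an information-gain criterion).
all_pairs = [(i, j) for i in range(len(Phi)) for j in range(i + 1, len(Phi))]
for _ in range(10):
    i, j = max(all_pairs, key=lambda ij: -abs(prob_prefers(*ij)[0] - 0.5))
    # Simulated human answers with a Boltzmann preference model.
    answer_i = rng.random() < 1.0 / (1.0 + np.exp(-beta * (true_w @ (Phi[i] - Phi[j]))))
    _, p = prob_prefers(i, j)
    logw += np.log(p if answer_i else 1.0 - p)

weights = np.exp(logw - logw.max()); weights /= weights.sum()
w_hat = weights @ W; w_hat /= np.linalg.norm(w_hat)
print("estimated reward weights:", np.round(w_hat, 3), "| true:", np.round(true_w, 3))
```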