Reward Learning from Suboptimal Demonstrations with Applications in Surgical Electrocautery

Karimi, Zohre; Ho, Shing-Hei; Thach, Bao; Kuntz, Alan; Brown, Daniel S

Citation Details

Automating robotic surgery via learning from demonstration (LfD) techniques is extremely challenging. This is because surgical tasks often involve sequential decisionmaking processes with complex interactions of physical objects and have low tolerance for mistakes. Prior works assume that all demonstrations are fully observable and optimal, which might not be practical in the real world. This paper introduces a sample-efficient method that learns a robust reward function from a limited amount of ranked suboptimal demonstrations consisting of partial-view point cloud observations. The method then learns a policy by optimizing the learned reward function using reinforcement learning (RL). We show that using a learned reward function to obtain a policy is more robust than pure imitation learning. We apply our approach on a physical surgical electrocautery task and demonstrate that our method can perform well even when the provided demonstrations are suboptimal and the observations are highdimensional point clouds. more »

Award ID(s):: 2133027

PAR ID:: 10547468

Author(s) / Creator(s):: Karimi, Zohre; Ho, Shing-Hei; Thach, Bao; Kuntz, Alan; Brown, Daniel S

Publisher / Repository:: International Symposium on Medical Robotics (ISMR)

Date Published:: 2024-06-03

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this