Semantic Visual Navigation by Watching YouTube Videos

Chang, Matthew; Gupta, Arjun; Gupta, Saurabh

Semantic cues and statistical regularities in real-world environment layouts can improve efficiency for navigation in novel environments. This paper learns and leverages such semantic cues for navigating to objects of interest in novel environments, by simply watching YouTube videos. This is challenging because YouTube videos do not come with labels for actions or goals, and may not even showcase optimal behavior. Our method tackles these challenges through the use of Q-learning on pseudo-labeled transition quadruples (image, action, next image, reward). We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation. These cues, when used in a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal task in visually realistic simulations. We observe a relative improvement of 15-83% over end-to-end RL, behavior cloning, and classical methods, while using minimal direct interaction.
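
The Q-learning step named in the abstract can be illustrated with a minimal tabular sketch. This is not the authors' implementation: integer states stand in for images, the pseudo-labeled actions and rewards are assumed to be given already, and the function name `offline_q_learning`, the discount `gamma`, the step size `lr`, and the toy corridor data are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def offline_q_learning(transitions, num_actions, gamma=0.9, lr=0.5, epochs=200):
    """Tabular Q-learning over a fixed set of (state, action, next_state, reward)
    quadruples. No environment interaction happens here: the data is passive,
    and the max over next-state actions makes the update off-policy."""
    q = defaultdict(lambda: [0.0] * num_actions)
    for _ in range(epochs):
        for state, action, next_state, reward in transitions:
            # Bootstrapped target: observed reward plus discounted best next value.
            target = reward + gamma * max(q[next_state])
            q[state][action] += lr * (target - q[state][action])
    return q


# Hypothetical toy data: a corridor 0 -> 1 -> 2, where reaching state 2 is rewarded.
# Action 0 moves forward, action 1 stays in place. States are stand-ins for images.
transitions = [
    (0, 0, 1, 0.0),
    (1, 0, 2, 1.0),
    (0, 1, 0, 0.0),
    (1, 1, 1, 0.0),
]

q = offline_q_learning(transitions, num_actions=2)

# Greedy action per observed state, read off the learned Q-values.
greedy = {s: max(range(2), key=lambda a: q[s][a]) for s in (0, 1)}
```

Even though the logged data includes the non-progressing "stay" action, the learned values rank "forward" higher in both states, which mirrors the abstract's point that Q-learning can recover useful cues from videos that do not show optimal behavior.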

Award ID(s): 2007035

PAR ID: 10416322

Date Published: 2020-12-01

Journal Name: Neural Information Processing Systems (NeurIPS), 2020

Sponsoring Org: National Science Foundation
