Universal Planning Networks

Srinivas, A.; Jabri, A.; Abbeel, P.; Levine, S.; Finn, C.

A key challenge in complex visuomotor control is learning abstract representations that are effective for specifying goals, planning, and generalization. To this end, we introduce universal planning networks (UPN). UPNs embed differentiable planning within a goal-directed policy. This planning computation unrolls a forward model in a latent space and infers an optimal action plan through gradient descent trajectory optimization. The plan-by-gradient-descent process and its underlying representations are learned end-to-end to directly optimize a supervised imitation learning objective. We find that the representations learned are not only effective for goal-directed visual imitation via gradient-based trajectory optimization, but can also provide a metric for specifying goals using images. The learned representations can be leveraged to specify distance-based rewards to reach new target states for model-free reinforcement learning, resulting in substantially more effective learning when solving new tasks described via image-based goals. We were able to achieve successful transfer of visuomotor planning strategies across robots with significantly different morphologies and actuation capabilities.
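
The inner planning loop described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a hand-written toy forward model x_{t+1} = x_t + dt * a_t in place of the learned latent dynamics, takes the latent vectors as given rather than encoding them from images, and uses a hand-derived gradient where a real system would use automatic differentiation. The names `rollout`, `plan_by_gradient_descent`, and `latent_distance_reward` are hypothetical, and the negative squared Euclidean distance is only one possible choice of distance-based reward. The outer imitation-learning loop that trains the representation end-to-end is omitted.

```python
from typing import List

Vector = List[float]


def rollout(x0: Vector, plan: List[Vector], dt: float = 1.0) -> Vector:
    """Unroll the toy forward model x_{t+1} = x_t + dt * a_t over the plan."""
    x = list(x0)
    for action in plan:
        x = [xi + dt * ai for xi, ai in zip(x, action)]
    return x


def plan_by_gradient_descent(
    x0: Vector,
    x_goal: Vector,
    horizon: int = 5,
    steps: int = 50,
    lr: float = 0.05,
    dt: float = 1.0,
) -> List[Vector]:
    """Infer an action plan by gradient descent on the final latent error.

    Loss: ||x_T - x_goal||^2, where x_T is the end of the unrolled rollout.
    For this linear toy model the gradient with respect to every action a_t
    is the same vector, 2 * dt * (x_T - x_goal), so it is written by hand.
    """
    plan = [[0.0] * len(x0) for _ in range(horizon)]  # start from zero actions
    for _ in range(steps):
        x_final = rollout(x0, plan, dt)
        grad = [2.0 * dt * (xf - xg) for xf, xg in zip(x_final, x_goal)]
        plan = [[ai - lr * gi for ai, gi in zip(action, grad)] for action in plan]
    return plan


def latent_distance_reward(x: Vector, x_goal: Vector) -> float:
    """Distance-based reward (assumed form): negative squared latent distance."""
    return -sum((xi - gi) ** 2 for xi, gi in zip(x, x_goal))


if __name__ == "__main__":
    start, goal = [0.0, 0.0], [1.0, -2.0]
    actions = plan_by_gradient_descent(start, goal)
    reached = rollout(start, actions)
    print("reward before planning:", latent_distance_reward(start, goal))
    print("reward after planning: ", latent_distance_reward(reached, goal))
```

With these illustrative defaults, each descent step halves the remaining error between the rollout's final state and the goal, so the planned actions drive the reward toward zero. In a UPN the same loop would run through a learned, generally nonlinear model, so convergence would depend on the learned dynamics and the step size.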
