Reward signals in reinforcement learning are expensive to design and often require access to the true state, which is not available in the real world. Common alternatives, such as demonstrations or goal images, can be labor-intensive to collect. Text descriptions, on the other hand, provide a general, natural, and low-effort way of communicating the desired task. However, prior work on learning text-conditioned policies still relies on rewards defined using either the true state or labeled expert demonstrations. We use recent developments in large-scale vision-language models such as CLIP to devise a framework that generates the task reward signal solely from a goal text description and raw pixel observations; this reward is then used to learn the task policy. We evaluate the proposed framework on control and robotic manipulation tasks. Finally, we distill the individual task policies into a single goal-text-conditioned policy that can generalize zero-shot to new tasks with unseen objects and unseen goal text descriptions.
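The abstract says the reward is generated from a goal text description and raw pixels using a model like CLIP, but it does not give the formula. A common choice, assumed here rather than taken from the paper, is the cosine similarity between the text embedding of the goal and the image embedding of the current observation. The sketch takes precomputed embeddings as plain lists and calls no CLIP API; `text_image_reward` is a hypothetical name.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def text_image_reward(goal_text_emb: Sequence[float], obs_image_emb: Sequence[float]) -> float:
    """Hypothetical reward: similarity of goal-text and observation embeddings.

    Both embeddings are assumed to live in a shared vision-language space
    (for example, CLIP's text and image encoders); producing them is outside
    the scope of this sketch.
    """
    return cosine_similarity(goal_text_emb, obs_image_emb)
```

Observations whose image embedding points in the same direction as the goal text embedding receive a reward near 1, and unrelated observations receive a reward near 0, so the policy is pushed toward frames that "look like" the description.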
Adapting by Analogy: OOD Generalization of Visuomotor Policies via Functional Correspondence
End-to-end visuomotor policies trained using behavior cloning have shown a remarkable ability to generate complex, multi-modal low-level robot behaviors. However, at deployment time, these policies still struggle to act reliably when faced with out-of-distribution (OOD) visuals induced by changes in objects, backgrounds, or the environment. Prior work in interactive imitation learning solicits corrective expert demonstrations under the OOD conditions, but this can be costly and inefficient. We observe that task success under OOD conditions does not always require novel robot behaviors. In-distribution (ID) behaviors can be transferred directly to OOD conditions that share functional similarities with ID conditions. For example, behaviors trained to interact with ID pens can apply to interacting with a visually OOD pencil. The key challenge lies in determining which ID observations functionally correspond to the OOD observation for the task at hand. We propose that an expert can provide this OOD-to-ID functional correspondence. Thus, instead of collecting new demonstrations and retraining at every OOD encounter, our method: (1) detects the need for feedback by first checking whether current observations are OOD and then identifying whether the most similar training observations show divergent behaviors, (2) solicits functional correspondence feedback to disambiguate between those behaviors, and (3) intervenes on the OOD observations with the functionally corresponding ID observations to perform deployment-time generalization. We validate our method across diverse real-world robotic manipulation tasks with a Franka Panda robotic manipulator. Our results show that test-time functional correspondences can improve the generalization of a vision-based diffusion policy to OOD objects and environment conditions with little feedback.
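Step (1) of the method, deciding when to ask the expert, can be illustrated with a minimal sketch. The abstract does not say how OOD-ness or behavioral divergence is measured, so every choice here is an assumption: observations and actions are plain vectors (for example, embeddings from a visual encoder), Euclidean distance measures similarity, an observation counts as OOD when even its nearest training observation is farther than `ood_threshold`, and the k nearest neighbors "diverge" when any two of their actions differ by more than `divergence_threshold`. The function `needs_feedback` and all of its parameters are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def needs_feedback(
    obs_emb: Vector,
    train_embs: List[Vector],
    train_actions: List[Vector],
    k: int = 3,
    ood_threshold: float = 1.0,
    divergence_threshold: float = 0.5,
) -> Tuple[bool, List[int]]:
    """Hypothetical step (1): return (ask_expert, indices of the k nearest training samples)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    if not train_embs or len(train_embs) != len(train_actions):
        raise ValueError("need one action per training observation")
    # Rank training observations by distance to the current observation.
    ranked = sorted(range(len(train_embs)), key=lambda i: euclidean(obs_emb, train_embs[i]))
    neighbors = ranked[:k]
    # Check (a): treat the observation as ID if its nearest neighbor is close enough.
    if euclidean(obs_emb, train_embs[neighbors[0]]) <= ood_threshold:
        return False, neighbors
    # Check (b): the observation is OOD, so ask only if the neighbors' actions disagree.
    spread = max(
        euclidean(train_actions[i], train_actions[j])
        for i in neighbors
        for j in neighbors
    )
    return spread > divergence_threshold, neighbors
```

Returning `False` for an OOD observation corresponds to the case the abstract highlights, where an ID behavior can be reused without querying the expert; returning `True` is where functional correspondence feedback would be requested to choose among the neighbors' behaviors.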
- Award ID(s): 2441014
- PAR ID: 10665478
- Publisher / Repository: Conference on Robot Learning
- Sponsoring Org: National Science Foundation
More Like this
- Robotic manipulation of deformable 1D objects such as ropes, cables, and hoses is challenging due to the lack of high-fidelity analytic models and large configuration spaces. Furthermore, learning end-to-end manipulation policies directly from images and physical interaction requires significant time on a robot and can fail to generalize across tasks. We address these challenges using interpretable deep visual representations for rope, extending recent work on dense object descriptors for robot manipulation. This facilitates the design of interpretable and transferable geometric policies built on top of the learned representations, decoupling visual reasoning and control. We present an approach that learns point-pair correspondences between initial and goal rope configurations, which implicitly encodes geometric structure, entirely in simulation from synthetic depth images. We demonstrate that the learned representation, dense depth object descriptors (DDODs), can be used to manipulate a real rope into a variety of arrangements, either by learning from demonstrations or by using interpretable geometric policies. In 50 trials of a knot-tying task with the ABB YuMi robot, the system achieves a 66% knot-tying success rate from previously unseen configurations. See https://tinyurl.com/rope-learning for supplementary material and videos.
- Methods that use the outputs or feature representations of predictive models have emerged as promising approaches for out-of-distribution (OOD) detection of image inputs. However, these methods struggle to detect OOD inputs that share nuisance values (e.g., backgrounds) with in-distribution inputs. Detecting such shared-nuisance out-of-distribution (SN-OOD) inputs is particularly relevant in real-world applications, because anomalies and in-distribution inputs tend to be captured in the same settings during deployment. In this work, we provide a possible explanation for SN-OOD detection failures and propose nuisance-aware OOD detection to address them. Nuisance-aware OOD detection replaces a classifier trained via empirical risk minimization and cross-entropy loss with one that (1) is trained under a distribution where the nuisance-label relationship is broken and (2) yields representations that are independent of the nuisance under this distribution, both marginally and conditioned on the label. We can train a classifier to achieve these objectives using Nuisance-Randomized Distillation (NuRD), an algorithm developed for OOD generalization under spurious correlations. Output- and feature-based nuisance-aware OOD detection performs substantially better than its original counterparts, succeeding even when detection based on domain generalization algorithms fails to improve performance.
- Adebisi, John (Ed.). Non-expert users can now program robots using various end-user robot programming methods, which have widened the use of robots and lowered the barriers that prevent laypeople from using them. Kinesthetic teaching is a common form of end-user robot programming that lets users forgo writing code by physically guiding the robot to demonstrate behaviors. Although it can be more accessible than writing code, kinesthetic teaching is difficult in practice because of users' unfamiliarity with kinematics or with the limitations of robots and programming interfaces. Developing good kinesthetic demonstrations requires physical and cognitive skills, such as the ability to plan effective grasps for different task objects and constraints, to overcome programming difficulties. How to help users learn these skills remains a largely unexplored question; conventionally, users learn through self-guided practice. Our study compares self-guided practice with curriculum-based training in building users' programming proficiency. Although we found no significant differences between participants who learned through practice and those who learned through our curriculum, our study reveals insights into factors contributing to end-user robot programmers' confidence and success during programming, and into how learning interventions may contribute to such factors. Our work paves the way for further research on how best to structure training interventions for end-user robot programmers.
- The field of end-user robot programming seeks to develop methods that empower non-expert programmers to task and modify robot operations. In doing so, researchers may enhance robot flexibility and broaden the scope of robot deployments into the real world. We introduce PRogramAR (Programming Robots using Augmented Reality), a novel end-user robot programming system that combines the intuitive visual feedback of augmented reality (AR) with the simple and responsive paradigm of trigger-action programming (TAP) to facilitate human-robot collaboration. Through PRogramAR, users can rapidly author task rules and desired reactive robot behaviors while specifying task constraints and observing program feedback contextualized directly in the real world. PRogramAR provides feedback by simulating the robot's intended behavior and instantly evaluating whether TAP rules are executable, helping end users better understand and debug their programs during development. In a system validation, 17 end users aged 18 to 83 used PRogramAR to program a robot to assist them in completing three collaborative tasks. Our results demonstrate how merging the benefits of AR and TAP, using elements from prior robot programming research, into a single novel system can successfully enhance the robot programming process for non-expert users.
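The PRogramAR abstract does not describe its rule representation or how executability is evaluated. As a generic illustration of trigger-action programming only, the sketch models a rule as "when this boolean fact holds, run this action on this object" and treats a rule as executable when its target object is reachable. The `Rule` type, `is_executable`, `fire_rules`, and the reachability criterion are all hypothetical and are not PRogramAR's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Rule:
    trigger: str  # name of a boolean world-state fact, e.g. "cup_empty"
    action: str   # robot behavior to run, e.g. "refill"
    target: str   # object the action manipulates


def is_executable(rule: Rule, reachable_objects: Set[str]) -> bool:
    """Hypothetical check: the rule's target must be an object the robot can reach."""
    return rule.target in reachable_objects


def fire_rules(rules: List[Rule], state: Dict[str, bool], reachable_objects: Set[str]) -> List[str]:
    """Return the actions whose trigger holds in `state` and that pass the executability check."""
    fired = []
    for rule in rules:
        # Missing facts are treated as false, so unknown triggers never fire.
        if state.get(rule.trigger, False) and is_executable(rule, reachable_objects):
            fired.append(f"{rule.action}({rule.target})")
    return fired
```

Keeping the trigger check separate from the executability check mirrors the two kinds of feedback the abstract mentions: whether a rule would fire, and whether the robot could actually carry it out.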

