One approach to Imitation Learning is Behavior Cloning, in which a robot observes a supervisor and infers a control policy. A known problem with this "off-policy" approach is that the robot's errors compound when it drifts away from the supervisor's demonstrations. On-policy techniques alleviate this by iteratively collecting corrective actions for the current robot policy. However, these techniques can be tedious for human supervisors, add significant computational burden, and may visit dangerous states during training. We propose an off-policy approach that injects noise into the supervisor's policy while demonstrating. This forces the supervisor to demonstrate how to recover from errors. We propose a new algorithm, DART (Disturbances for Augmenting Robot Trajectories), that collects demonstrations with injected noise and optimizes the noise level to approximate the error of the robot's trained policy during data collection. We compare DART with DAgger and Behavior Cloning in two domains: in simulation with an algorithmic supervisor on MuJoCo tasks (Walker, Humanoid, Hopper, Half-Cheetah), and in physical experiments with human supervisors training a Toyota HSR robot to perform grasping in clutter. For high-dimensional tasks like Humanoid, DART can be up to 3x faster in computation time and decreases the supervisor's cumulative reward by only 5% during training, whereas DAgger executes policies that have 80% less cumulative reward than the supervisor. On the grasping-in-clutter task, DART obtains on average a 62% performance increase over Behavior Cloning.
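The core loop of DART is simple enough to sketch. The following is a minimal, hypothetical Python sketch of that loop, not the authors' implementation: it assumes a Gaussian disturbance model and placeholder `supervisor`, `fit_bc`, and `rollout` helpers supplied by the caller.

```python
import numpy as np

def dart_collect(supervisor, fit_bc, rollout, act_dim, n_iters=5, n_demos=10):
    """Sketch of DART-style data collection with optimized noise injection.

    Assumed (hypothetical) interfaces: supervisor(state) -> action;
    fit_bc(data) -> policy; rollout(policy) -> list of (state, label) pairs,
    where the label is always the supervisor's noise-free action even though
    the noisy action is what gets executed.
    """
    cov = 1e-4 * np.eye(act_dim)              # start with a tiny disturbance
    data, policy = [], None
    for _ in range(n_iters):
        # 1) The supervisor demonstrates under injected Gaussian noise,
        #    implicitly showing how to recover from small errors.
        for _ in range(n_demos):
            def noisy(state):
                return supervisor(state) + np.random.multivariate_normal(
                    np.zeros(act_dim), cov)
            data += rollout(noisy)
        # 2) Off-policy training: plain behavior cloning on all data so far.
        policy = fit_bc(data)
        # 3) Re-fit the noise to approximate the trained robot's error:
        #    the covariance of (robot action - supervisor action) on the data.
        err = np.stack([policy(s) - a for s, a in data])
        cov = np.cov(err, rowvar=False) + 1e-6 * np.eye(act_dim)
    return policy
```

The key design choice is that data collection stays off-policy: only the supervisor's noise-free actions are used as labels, so training remains plain behavior cloning.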
One-shot Visual Imitation via Attributed Waypoints and Demonstration Augmentation
In this paper, we analyze the behavior of existing techniques and design new solutions for the problem of one-shot visual imitation. In this setting, an agent must solve a novel instance of a novel task given just a single visual demonstration. Our analysis reveals that current methods fall short because of three errors: the DAgger problem arising from purely offline training, last-centimeter errors in interacting with objects, and mis-fitting to the task context rather than to the actual task. This motivates the design of our modular approach, in which we a) separate task inference (what to do) from task execution (how to do it), and b) develop data augmentation and generation techniques to mitigate mis-fitting. The former allows us to leverage hand-crafted motor primitives for task execution, which side-steps the DAgger problem and last-centimeter errors, while the latter gets the model to focus on the task rather than the task context. Our model achieves 100% and 48% success rates on two recent benchmarks, improving upon the current state-of-the-art by an absolute 90% and 20% respectively.
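To make the modular split concrete, here is a hedged sketch of the two-stage decomposition described in the abstract; `infer_waypoints`, the `robot` interface, and the attribute names are hypothetical placeholders rather than the paper's API.

```python
def run_episode(infer_waypoints, demo_frames, observe, robot):
    """Sketch of the modular split: a learned module answers "what to do"
    as attributed waypoints; hand-crafted primitives answer "how to do it"."""
    # Task inference (learned): map the single visual demonstration plus the
    # current observation to a short plan of attribute-tagged waypoints.
    waypoints = infer_waypoints(demo_frames, observe())  # [(pose, attr), ...]

    # Task execution (hand-crafted primitives): closed-loop motions side-step
    # compounding offline-training errors and "last centimeter" failures.
    for pose, attr in waypoints:
        robot.move_to(pose)
        if attr == "grasp":
            robot.close_gripper()
        elif attr == "release":
            robot.open_gripper()
```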
- Award ID(s): 2007035
- PAR ID: 10416326
- Date Published:
- Journal Name: International Conference on Robotics and Automation (ICRA)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Recently, there have been several proposals to develop visual recommendation systems. The most advanced systems aim to recommend visualizations that help users find new correlations or identify interesting deviations based on the current context of the user's analysis. However, when recommending a visualization to a user, there is an inherent risk of visualizing random fluctuations rather than true patterns: a problem largely ignored by current techniques. In this paper, we present VizCertify, a novel framework that improves the performance of visual recommendation systems by quantifying the statistical significance of recommended visualizations. The proposed methodology makes it possible to control the probability of misleading visual recommendations using both classical statistical testing procedures and a novel application of the Vapnik-Chervonenkis (VC) dimension to visualization recommendation, which yields an effective criterion for deciding whether a recommendation corresponds to a true phenomenon or not. (A hedged sketch of such a significance check appears after this list.)
- While the networking community has extensively tackled network design problems using optimization and other techniques (e.g., in areas such as traffic engineering and resource allocation), much of this work focuses on efficiently generating designs assuming well-defined objectives. In this paper, we argue that in practice, the objectives of a network design task may not be easy for an architect to specify. We argue for, and present, a structured approach in which the objectives of a network design task are learnt through iterative interactions with the architect. Our approach is inspired by the programming-by-examples approach that has seen success in the programming-languages community. However, conventional program synthesis techniques do not apply, because in our context a user can only provide a relative comparison between multiple choices as to which is more desirable, rather than an exact output for a given input. We propose a novel comparative synthesis approach to tackle these challenges. We sketch the approach, present promising preliminary results, and discuss future research questions. (A toy preference-learning sketch appears after this list.)
- Successful interaction with the environment requires the ability to flexibly allocate resources to different locations in the visual field. Recent evidence suggests that visual short-term memory (VSTM) resources are distributed asymmetrically across the visual field based upon task demands. Here, we propose that context, rather than the stimulus itself, determines the asymmetrical distribution of VSTM resources. To test whether context modulates the reallocation of resources to the right visual field, task set, defined by memory load, was manipulated to influence visual short-term memory performance. Performance was measured for single-feature objects embedded within predominantly single- or two-feature memory blocks. Context was thus varied to determine whether task set directly predicts changes in visual field biases. In accord with the dynamic reallocation of resources hypothesis, task set, rather than aspects of the physical stimulus, drove improvements in performance in the right visual field. Our results show, for the first time, that preparation for upcoming memory demands directly determines how resources are allocated across the visual field.
- This paper proposes a new approach for debugging errors in floating-point computation by performing shadow execution with higher precision in parallel. The programmer specifies the parts of the program that need to be debugged for errors. Our compiler creates shadow execution tasks, which execute on different cores and perform the computation with higher precision. We propose a novel method to start a shadow execution task from an arbitrary memory state, which is necessary because we are creating a parallel shadow execution from a sequential program. Our approach also ensures that the shadow execution follows the same control-flow path as the original program. Our runtime automatically distributes the shadow execution tasks to balance the load on the cores. Our prototype for parallel shadow execution, PFPSanitizer, provides comprehensive detection of errors while having lower performance overheads than prior approaches. (A minimal shadow-execution sketch appears after this list.)
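As a hedged illustration of the "classical statistical testing" route mentioned in the VizCertify item above (not the paper's implementation), one can Bonferroni-correct a two-sample test over all candidate recommendations, so that the chance of certifying a random fluctuation stays below a chosen level:

```python
from scipy import stats

def certify_recommendations(candidates, alpha=0.05):
    """candidates: list of (name, subset_values, reference_values) tuples,
    one per visualization the recommender wants to show. Returns only the
    visualizations whose subset-vs-reference deviation stays significant
    after a Bonferroni correction for having searched over all candidates."""
    corrected = alpha / max(len(candidates), 1)   # control family-wise error
    certified = []
    for name, subset, reference in candidates:
        # Two-sample t-test: is the deviation behind this chart real,
        # or a fluctuation produced by slicing the data many ways?
        _, p = stats.ttest_ind(subset, reference, equal_var=False)
        if p < corrected:
            certified.append((name, p))
    return certified
```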
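For the comparative-synthesis item, a toy sketch of learning an objective from relative comparisons (again an assumption-laden illustration, not the paper's algorithm) is to fit a linear utility from the architect's pairwise preferences:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_objective(pairs):
    """pairs: list of (features_preferred, features_rejected) array pairs,
    one per "design A is better than design B" answer from the architect.
    Fits a linear utility w such that w . f(A) > w . f(B) tends to hold."""
    X = np.array([a - b for a, b in pairs])       # feature differences
    y = np.ones(len(pairs))                       # preferred-minus-rejected -> 1
    # Mirror each example so the classifier sees both classes.
    X = np.vstack([X, -X])
    y = np.concatenate([y, np.zeros(len(pairs))])
    clf = LogisticRegression(fit_intercept=False).fit(X, y)
    return clf.coef_[0]                           # weights of the learnt objective
```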
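Finally, for the shadow-execution item: the real PFPSanitizer instruments compiled code and runs shadow tasks on separate cores, but the underlying idea can be sketched in-process by replaying a region at higher precision and comparing results. Everything below is a simplified stand-in, not the tool's API.

```python
import numpy as np

def with_shadow(region, args, rel_tol=1e-4):
    """Run `region` once in float32 and once as a float64 "shadow", then
    flag results whose relative error exceeds rel_tol. `region` must be
    written against generic numpy values so both precisions follow the
    same control-flow path."""
    lo = region(*[np.asarray(a, dtype=np.float32) for a in args])
    hi = region(*[np.asarray(a, dtype=np.float64) for a in args])
    err = abs(float(lo) - float(hi)) / max(abs(float(hi)),
                                           np.finfo(np.float64).tiny)
    if err > rel_tol:
        print(f"possible floating-point error: relative error {err:.2e}")
    return lo

# Example: catastrophic cancellation shows up in float32 long before float64.
result = with_shadow(lambda x, y: (x + y) - x, (np.float32(1e8), np.float32(0.5)))
```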

