
Title: Learning Deep Policies for Robot Bin Picking by Simulating Robust Grasping Sequences
Recent results suggest that it is possible to grasp a variety of singulated objects with high precision using Convolutional Neural Networks (CNNs) trained on synthetic data. This paper considers the task of bin picking, where multiple objects are randomly arranged in a heap and the objective is to sequentially grasp and transport each into a packing box. We model bin picking with a discrete-time Partially Observable Markov Decision Process that specifies states of the heap, point cloud observations, and rewards. We collect synthetic demonstrations of bin picking from an algorithmic supervisor that uses full state information to optimize for the most robust collision-free grasp. The supervisor relies on a forward simulator based on pybullet to model dynamic object-object interactions and on robust wrench space analysis from the Dexterity Network (Dex-Net) to model quasi-static contact between the gripper and object. We learn a policy by fine-tuning a Grasp Quality CNN (GQ-CNN) pretrained on Dex-Net 2.1 to classify the supervisor's actions, using a dataset of 10,000 rollouts of the supervisor in the simulator with noise injection. In 2,192 physical trials of bin picking with an ABB YuMi on a dataset of 50 novel objects, we find that the resulting policies can achieve a 94% success rate and 96% average precision (very few false positives) on heaps of 5-10 objects and can clear heaps of 10 objects in under three minutes. Datasets, experiments, and supplemental material are available at
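The supervisor-plus-noise-injection data collection described in the abstract can be illustrated with a toy sketch. This is not the paper's pipeline: `Grasp`, `supervisor_action`, and `rollout` are hypothetical names, the robustness values and collision flags are random numbers standing in for Dex-Net wrench-space analysis and pybullet collision checks, and the label simply marks whether a candidate is the supervisor's pick (an assumption about how "classify the supervisor's actions" could be framed).

```python
import random
from dataclasses import dataclass


@dataclass
class Grasp:
    object_id: int
    robustness: float     # hypothetical stand-in for a robust wrench-space quality in [0, 1]
    collision_free: bool  # hypothetical stand-in for a simulator collision check


def supervisor_action(candidates):
    """Full-state supervisor: most robust collision-free candidate, or None."""
    feasible = [g for g in candidates if g.collision_free]
    if not feasible:
        return None
    return max(feasible, key=lambda g: g.robustness)


def rollout(num_objects, noise_prob, rng, grasps_per_object=3):
    """Toy bin-picking episode (not a physics simulation).

    Each step samples random candidate grasps on the remaining objects, labels
    each candidate True if it is the supervisor's choice, then executes either
    the supervisor's grasp or, with probability noise_prob, a random candidate.
    Returns the (grasp, label) pairs and the set of objects left in the heap.
    """
    heap = set(range(num_objects))
    data = []
    for _ in range(3 * num_objects):  # step budget so the episode always ends
        if not heap:
            break
        candidates = [
            Grasp(obj, rng.random(), rng.random() > 0.2)
            for obj in sorted(heap)
            for _ in range(grasps_per_object)
        ]
        best = supervisor_action(candidates)
        if best is None:
            continue
        data.extend((g, g is best) for g in candidates)
        # Noise injection: occasionally deviate from the supervisor so the
        # dataset also covers heap states the supervisor alone would not reach.
        executed = rng.choice(candidates) if rng.random() < noise_prob else best
        # Toy outcome model: a collision-free grasp succeeds with probability
        # equal to its robustness, removing that object from the heap.
        if executed.collision_free and rng.random() < executed.robustness:
            heap.discard(executed.object_id)
    return data, heap
```

For example, `rollout(5, 0.1, random.Random(0))` returns the labeled candidates from one noisy episode on a five-object heap; pooling many such episodes would give a dataset on which a grasp classifier could be fine-tuned.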
Sponsoring Org: National Science Foundation
More Like this
  1. Vacuum-based end effectors are widely used in industry and are often preferred over parallel-jaw and multifinger grippers due to their ability to lift objects with a single point of contact. Suction grasp planners often target planar surfaces on point clouds near the estimated centroid of an object. In this paper, we propose a compliant suction contact model that computes the quality of the seal between the suction cup and the local target surface, as well as a measure of the ability of the suction grasp to resist an external gravity wrench. To characterize grasps, we estimate robustness to perturbations in end-effector and object pose, material properties, and external wrenches. We analyze grasps across 1,500 3D object models to generate Dex-Net 3.0, a dataset of 2.8 million point clouds, suction grasps, and grasp robustness labels. We use Dex-Net 3.0 to train a Grasp Quality Convolutional Neural Network (GQ-CNN) to classify robust suction targets in point clouds containing a single object. We evaluate the resulting system in 350 physical trials on an ABB YuMi fitted with a pneumatic suction gripper. When evaluated on novel objects that we categorize as Basic (prismatic or cylindrical), Typical (more complex geometry), and Adversarial (with few available suction-grasp points), Dex-Net 3.0 achieves success rates of 98%, 82%, and 58%, respectively, improving to 81% in the latter case when the training set includes only adversarial objects. Code, datasets, and supplemental material can be found at
  2. There has been significant recent work on data-driven algorithms for learning general-purpose grasping policies. However, these policies can consistently fail to grasp challenging objects that are significantly out of the distribution of objects in the training data or that have very few high-quality grasps. Motivated by such objects, we propose a novel problem setting, Exploratory Grasping, for efficiently discovering reliable grasps on an unknown polyhedral object via sequential grasping, releasing, and toppling. We formalize Exploratory Grasping as a Markov Decision Process where we assume that the robot can (1) distinguish stable poses of a polyhedral object of unknown geometry, (2) generate grasp candidates on these poses and execute them, (3) determine whether each grasp is successful, and (4) release the object into a random new pose after a grasp success or topple the object after a grasp failure. We study the theoretical complexity of Exploratory Grasping in the context of reinforcement learning and present an efficient bandit-style algorithm, Bandits for Online Rapid Grasp Exploration Strategy (BORGES), which leverages the structure of the problem to efficiently discover high-performing grasps for each object stable pose. BORGES can be used to complement any general-purpose grasping algorithm with any grasp modality (parallel-jaw, suction, multi-fingered, etc.) to learn policies for objects on which they exhibit persistent failures. Simulation experiments suggest that BORGES can significantly outperform both general-purpose grasping pipelines and two other online learning algorithms, and that it achieves performance within 5% of the optimal policy within 1000 and 8000 timesteps on average across 46 challenging objects from the Dex-Net adversarial and EGAD! object datasets, respectively. Initial physical experiments suggest that BORGES can improve grasp success rate by 45% over a Dex-Net baseline with just 200 grasp attempts in the real world. See for supplementary material and videos.
  3. Consumer demand for augmented reality (AR) in mobile phone applications, such as those built with Apple ARKit, is increasing. Such applications have the potential to expand access to robot grasp planning systems such as Dex-Net. AR apps use structure-from-motion methods to compute a point cloud from a sequence of RGB images taken by the camera as it is moved around an object. However, the resulting point clouds are often noisy due to estimation errors. We present a distributed pipeline, Dex-Net AR, that allows point clouds to be uploaded to a server in our lab, cleaned, and evaluated by the Dex-Net grasp planner to generate a grasp axis that is returned and displayed as an overlay on the object. We implement Dex-Net AR using the iPhone and ARKit and compare results with those generated with high-performance depth sensors. On harder adversarial objects, the success rates with AR are higher than those obtained with traditional depth images.
  4. An option is a short-term skill consisting of a control policy for a specified region of the state space and a termination condition recognizing when the agent leaves that region. In prior work, we proposed an algorithm called Deep Discovery of Options (DDO) to discover options that accelerate reinforcement learning in Atari games. This paper studies an extension to robot imitation learning, called Discovery of Deep Continuous Options (DDCO), where low-level continuous control skills parametrized by deep neural networks are learned from demonstrations. We extend DDO with (1) a hybrid categorical–continuous distribution model to parametrize high-level policies that can invoke discrete options as well as continuous control actions, and (2) a cross-validation method that relaxes DDO's requirement that users specify the number of options to be discovered. We evaluate DDCO in a simulation of a 3-link robot in the vertical plane pushing a block with friction and gravity, and in two physical experiments on the da Vinci surgical robot: needle insertion, where a needle is grasped and inserted into a silicone tissue phantom, and needle bin picking, where needles and pins are grasped from a pile and categorized into bins. In the 3-link arm simulation, results suggest that DDCO can require 3x fewer demonstrations to achieve the same reward as a baseline imitation learning approach. In the needle insertion task, DDCO succeeded in 8/10 trials, compared with 6/10 for the next most accurate imitation learning baseline. In the surgical bin picking task, the learned policy successfully grasped a single object in 66 out of 99 attempted grasps and, in all but one case, recovered from failed grasps by retrying a second time.
  5. Control of reach-to-grasp movements for deft and robust interactions with objects requires rapid sensorimotor updating that enables online adjustments to changing external goals (e.g., perturbations or instability of objects we interact with). Rarely do we appreciate the remarkable coordination in reach-to-grasp until control becomes impaired by neurological injuries such as stroke, neurodegenerative diseases, or even aging. Modeling online control of human reach-to-grasp movements is a challenging problem, but it is fundamental to several domains, including behavioral and computational neuroscience, neurorehabilitation, neural prostheses, and robotics. Currently, there are no publicly available datasets that include online adjustment of reach-to-grasp movements to object perturbations. This work aims to advance modeling efforts of reach-to-grasp movements by making publicly available a large kinematic and EMG dataset of online adjustment of reach-to-grasp movements to instantaneous perturbations of object size and distance, performed in an immersive haptic-free virtual environment (hf-VE). The presented dataset is composed of a large number of perturbation types (10 for both object size and distance) applied at three different latencies after the start of the movement.
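The Dex-Net 3.0 abstract (item 1) estimates grasp robustness to perturbations in pose, material properties, and external wrenches. A common way to estimate such a probability is Monte Carlo sampling, sketched below under stated assumptions: `seal_and_resist` is a hypothetical binary quality (a seal if tilt and offset are small, a hold if a made-up vacuum force beats gravity), not the compliant suction contact model from the paper, and all thresholds and noise levels are illustrative.

```python
import math
import random


def seal_and_resist(tilt_rad, offset_m, mass_kg, vacuum_force_n=25.0,
                    max_tilt_rad=0.35, max_offset_m=0.01):
    """Hypothetical binary quality: the cup seals if surface tilt and lateral
    offset are small, and holds if the axial suction force exceeds gravity."""
    seals = abs(tilt_rad) < max_tilt_rad and abs(offset_m) < max_offset_m
    # Toy model: the useful force component shrinks with the cosine of tilt.
    holds = vacuum_force_n * math.cos(tilt_rad) > mass_kg * 9.81
    return seals and holds


def estimate_robustness(nominal_tilt, nominal_offset, nominal_mass,
                        n_samples=1000, seed=0):
    """Monte Carlo estimate of P(success) under Gaussian perturbations of
    gripper pose and object mass (standard deviations are assumptions)."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_samples):
        tilt = rng.gauss(nominal_tilt, 0.1)
        offset = rng.gauss(nominal_offset, 0.004)
        mass = max(0.0, rng.gauss(nominal_mass, 0.1))
        successes += seal_and_resist(tilt, offset, mass)  # True counts as 1
    return successes / n_samples
```

A grasp aimed squarely at a flat patch of a light object, such as `estimate_robustness(0.0, 0.0, 0.2)`, scores near 1, while a tilted, off-center grasp on a heavier object scores much lower; labels of this kind are what a GQ-CNN could then learn to predict from point clouds.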
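The Exploratory Grasping abstract (item 2) describes a bandit-style algorithm, BORGES, that learns good grasps separately for each stable pose. The sketch below runs independent Thompson sampling with Beta(1, 1) priors per stable pose; treat it as an illustration of the per-pose bandit structure, not as the authors' BORGES implementation. The success probabilities are hypothetical, and the release-after-success and topple-after-failure transitions are both simplified to an independent random draw of the next pose from `pose_probs`.

```python
import random


def exploratory_grasping(success_probs, pose_probs, horizon, seed=0):
    """Per-stable-pose Thompson sampling sketch.

    success_probs[p][g]: hidden success probability of grasp g in pose p.
    pose_probs[p]: probability that the object lands in pose p after a
    release or topple (assumed i.i.d. across steps).
    Returns the empirical success rate and the Beta [alpha, beta] counts.
    """
    rng = random.Random(seed)
    poses = range(len(success_probs))
    # Beta(1, 1) prior for every (pose, grasp) pair, stored as [alpha, beta].
    params = [[[1, 1] for _ in grasps] for grasps in success_probs]
    pose = rng.choices(poses, weights=pose_probs)[0]
    total_success = 0
    for _ in range(horizon):
        # Sample a plausible success rate for each grasp and try the best one.
        samples = [rng.betavariate(a, b) for a, b in params[pose]]
        g = max(range(len(samples)), key=samples.__getitem__)
        success = rng.random() < success_probs[pose][g]
        params[pose][g][0 if success else 1] += 1
        total_success += success
        # Release (after success) or topple (after failure) into a new pose.
        pose = rng.choices(poses, weights=pose_probs)[0]
    return total_success / horizon, params
```

With two poses whose best grasps succeed 90% and 80% of the time, the empirical success rate approaches the optimal average as the per-pose posteriors concentrate on those grasps; sharing nothing across poses is the simplest way to exploit the pose structure.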
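The Dex-Net AR abstract (item 3) says that noisy structure-from-motion point clouds are cleaned on the server before grasp planning, without naming the method. One widely used cleaning step is statistical outlier removal, sketched below as an assumed, swapped-in technique rather than the pipeline's actual filter: it drops points whose mean distance to their `k` nearest neighbours is unusually large. The parameters `k` and `std_ratio` are illustrative.

```python
import math
import statistics


def remove_statistical_outliers(points, k=4, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours is more
    than std_ratio standard deviations above the cloud-wide average.
    Brute-force O(n^2) search: fine for a sketch, not for large clouds."""
    if len(points) <= k:
        return list(points)
    mean_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        mean_dists.append(sum(dists[:k]) / k)
    mu = statistics.mean(mean_dists)
    sigma = statistics.pstdev(mean_dists)
    threshold = mu + std_ratio * sigma
    return [p for p, d in zip(points, mean_dists) if d <= threshold]
```

For example, a 5 x 5 grid of surface points spaced 1 cm apart plus one stray point a metre away keeps all 25 grid points and discards the stray one; a production pipeline would use a spatial index instead of the brute-force neighbour search.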
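The DDCO abstract (item 4) parametrizes high-level policies with a hybrid categorical–continuous distribution that can either invoke a discrete option or emit a continuous control action. A minimal sketch of such a distribution follows, assuming (hypothetically) that the last categorical index means "emit a continuous action" and that continuous actions follow a diagonal Gaussian; in DDCO these parameters would come from deep networks, which are omitted here.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_hybrid_action(logits, mean, std, rng):
    """Indices 0..K-1 invoke discrete options; the last index means
    'emit a primitive continuous action' drawn from N(mean, diag(std^2))."""
    probs = softmax(logits)
    k = rng.choices(range(len(probs)), weights=probs)[0]
    if k == len(probs) - 1:
        return ("continuous", [rng.gauss(m, s) for m, s in zip(mean, std)])
    return ("option", k)


def hybrid_log_prob(action, logits, mean, std):
    """Log-likelihood of a demonstrated action, e.g. for maximum-likelihood
    fitting of the high-level policy to demonstrations."""
    log_probs = [math.log(p) for p in softmax(logits)]
    kind, value = action
    if kind == "option":
        return log_probs[value]
    gauss = sum(-0.5 * ((x - m) / s) ** 2 - math.log(s * math.sqrt(2 * math.pi))
                for x, m, s in zip(value, mean, std))
    return log_probs[-1] + gauss
```

Because the log-likelihood is the categorical term plus, only for continuous actions, a Gaussian density term, a single objective can score demonstrations that mix option invocations with raw control actions.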