
Title: Robust, Occlusion-aware Pose Estimation for Objects Grasped by Adaptive Hands
Many manipulation tasks, such as placement or within-hand manipulation, require the object’s pose relative to the robot hand. The task is difficult when the hand significantly occludes the object, and it is especially hard for adaptive hands, whose finger configuration is not easy to detect. In addition, RGB-only approaches struggle with texture-less objects or when the hand and the object look similar. This paper presents a depth-based framework that aims for robust pose estimation and short response times. The approach detects the adaptive hand’s state via an efficient parallel search that maximizes the overlap between the hand’s model and the point cloud. The hand’s points are then pruned from the cloud and robust global registration generates object pose hypotheses, which are clustered. False hypotheses are pruned via physical reasoning, and the quality of the remaining poses is evaluated by their agreement with the observed data. Extensive evaluation on synthetic and real data demonstrates the accuracy and computational efficiency of the framework on challenging, highly-occluded scenarios for different object types. An ablation study identifies how the framework’s components contribute to performance. This work also provides a dataset for in-hand 6D object pose estimation. Code and dataset are available at: https://github.com/wenbowen123/icra20-hand-object-pose
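A minimal sketch of the final hypothesis-scoring step described in the abstract, assuming the object model and the hand-pruned observation are available as point clouds; the function names, the inlier threshold, and the inlier-ratio metric are illustrative assumptions, not the authors' exact implementation.

```python
# Hypothetical sketch: score object pose hypotheses by how well the
# transformed object model agrees with the observed (hand-pruned) point cloud.
import numpy as np
from scipy.spatial import cKDTree

def score_pose(model_pts, observed_pts, pose, inlier_dist=0.005):
    """Fraction of transformed model points within inlier_dist of an observed point.

    model_pts:    (N, 3) object model point cloud (meters)
    observed_pts: (M, 3) observed scene cloud with hand points pruned
    pose:         (4, 4) homogeneous transform hypothesis (object -> camera)
    """
    R, t = pose[:3, :3], pose[:3, 3]
    transformed = model_pts @ R.T + t          # apply the pose hypothesis
    dists, _ = cKDTree(observed_pts).query(transformed)
    return np.mean(dists < inlier_dist)        # agreement with observed data

def best_hypothesis(model_pts, observed_pts, poses):
    """Pick the clustered, physically-plausible hypothesis that best explains the observation."""
    scores = [score_pose(model_pts, observed_pts, p) for p in poses]
    return poses[int(np.argmax(scores))], max(scores)
```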
Authors:
Award ID(s):
1734492 1723869 1934924
Publication Date:
NSF-PAR ID:
10191546
Journal Name:
IEEE International Conference on Robotics and Automation (ICRA)
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of in-hand dexterous manipulation with a focus on unknown or uncertain hand–object parameters, such as hand configuration, object pose within hand, and contact positions. In particular, in this work we formulate a generic framework for hand–object configuration estimation using underactuated hands as an example. Owing to the passive reconfigurability and the lack of encoders in the hand’s joints, it is challenging to estimate, plan, and actively control underactuated manipulation. By modeling the grasp constraints, we present a particle filter-based framework to estimate the hand configuration. Specifically, given an arbitrary grasp, we start by sampling a set of hand configuration hypotheses and then randomly manipulate the object within the hand. While observing the object’s movements as evidence using an external camera, which is not necessarily calibrated with the hand frame, our estimator calculates the likelihood of each hypothesis to iteratively estimate the hand configuration. Once converged, the estimator is used to track the hand configuration in real time for future manipulations. Thereafter, we develop an algorithm to precisely plan and control the underactuated manipulation to move the grasped object to desired poses. In contrast to most other dexterous manipulation approaches, our framework does not require any tactile sensing or joint encoders, and can directly operate on any novel objects, without requiring a model of the object a priori. We implemented our framework on both the Yale Model O hand and the Yale T42 hand. The results show that the estimation is accurate for different objects, and that the framework can be easily adapted across different underactuated hand models. In the end, we evaluated our planning and control algorithm with handwriting tasks, and demonstrated the effectiveness of the proposed framework.
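A minimal sketch of the particle-filter update loop outlined in this abstract, assuming a user-supplied observation_likelihood function that scores how well a hypothesized hand configuration explains the observed object motion; the state dimensionality, noise levels, and all names are illustrative placeholders, not the authors' implementation.

```python
# Hypothetical sketch of a particle-filter hand-configuration estimator.
import numpy as np

def estimate_hand_config(observation_likelihood, n_particles=500,
                         state_dim=4, n_iters=30, rng=np.random.default_rng(0)):
    # 1. Sample an initial set of hand-configuration hypotheses.
    particles = rng.uniform(-1.0, 1.0, size=(n_particles, state_dim))

    for _ in range(n_iters):
        # 2. Weigh each hypothesis by how well it explains the observed
        #    object movement (e.g., from an external, uncalibrated camera).
        weights = np.array([observation_likelihood(p) for p in particles])
        weights = weights / weights.sum()

        # 3. Resample proportionally to weight and add exploration noise.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx] + rng.normal(0.0, 0.02, particles.shape)

    # 4. Converged estimate: mean of the surviving hypotheses.
    return particles.mean(axis=0)
```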
  2. Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations are troublesome and difficult to collect for 6D poses, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking and necessitates re-initialization of the object’s pose. This work proposes a data-driven optimization approach for long-term, 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object’s model. The key contribution in this context is a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, together with an effective 3D orientation representation via Lie algebra. Consequently, even though the network is trained only with synthetic data, it works effectively on real images. Comprehensive experiments over benchmarks - existing ones as well as a new dataset with significant occlusions related to object manipulation - show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9Hz.
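A minimal sketch of the Lie-algebra (so(3)) orientation representation mentioned above: a network can regress a 3-vector and recover the relative rotation via the exponential map (Rodrigues' formula). The composition example at the end is an assumption for illustration, not the paper's architecture.

```python
# Hypothetical sketch: map an so(3) vector to a rotation matrix in SO(3).
import numpy as np

def so3_exp(omega):
    """Exponential map: axis-angle vector omega in R^3 -> rotation matrix."""
    theta = np.linalg.norm(omega)
    if theta < 1e-8:
        return np.eye(3)                       # near-identity rotation
    k = omega / theta                          # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])           # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

# Example: compose a predicted relative rotation with the previous estimate.
R_prev = np.eye(3)
R_new = so3_exp(np.array([0.0, 0.0, 0.1])) @ R_prev   # 0.1 rad about the z-axis
```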
  3. This paper introduces key machine learning operations that allow the realization of robust, joint 6D pose estimation of multiple instances of objects either densely packed or in unstructured piles from RGB-D data. The first objective is to learn semantic and instance-boundary detectors without manual labeling. An adversarial training framework in conjunction with physics-based simulation is used to achieve detectors that behave similarly in synthetic and real data. Given the stochastic output of such detectors, candidates for object poses are sampled. The second objective is to automatically learn a single score for each pose candidate that represents its quality in terms of explaining the entire scene via a gradient boosted tree. The proposed method uses features derived from surface and boundary alignment between the observed scene and the object model placed at hypothesized poses. Scene-level, multi-instance pose estimation is then achieved by an integer linear programming process that selects hypotheses that maximize the sum of the learned individual scores, while respecting constraints, such as avoiding collisions. To evaluate this method, a dataset of densely packed objects with challenging setups for state-of-the-art approaches is collected. Experiments on this dataset and a public one show that the method significantly outperforms alternatives in terms of 6D pose accuracy while trained only with synthetic datasets.
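A minimal sketch of the integer-linear-programming selection step described above: pick the pose candidates that maximize the sum of learned per-hypothesis scores while forbidding colliding pairs. The scores and collision pairs are assumed to be precomputed, and the PuLP-based formulation is illustrative, not the paper's code.

```python
# Hypothetical sketch: scene-level hypothesis selection as a small ILP.
import pulp

def select_hypotheses(scores, colliding_pairs):
    """scores: list of floats, one per pose hypothesis.
    colliding_pairs: list of (i, j) index pairs that cannot both be selected."""
    prob = pulp.LpProblem("scene_level_pose_selection", pulp.LpMaximize)
    x = [pulp.LpVariable(f"x{i}", cat="Binary") for i in range(len(scores))]

    # Objective: total quality of the selected hypotheses.
    prob += pulp.lpSum(s * xi for s, xi in zip(scores, x))

    # Constraints: no two selected hypotheses may collide.
    for i, j in colliding_pairs:
        prob += x[i] + x[j] <= 1

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i, xi in enumerate(x) if xi.value() == 1]

# Example: three candidates where candidates 0 and 1 overlap in space.
print(select_hypotheses([0.9, 0.8, 0.4], [(0, 1)]))   # -> [0, 2]
```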
  4. The process of modeling a series of hand-object parameters is crucial for precise and controllable robotic in-hand manipulation because it enables the mapping from the hand’s actuation input to the object’s motion to be obtained. Without assuming that most of these model parameters are known a priori or can be easily estimated by sensors, we focus on equipping robots with the ability to actively self-identify necessary model parameters using minimal sensing. Here, we derive algorithms, on the basis of the concept of virtual linkage-based representations (VLRs), to self-identify the underlying mechanics of hand-object systems via exploratory manipulation actions and probabilistic reasoning and, in turn, show that the self-identified VLR can enable the control of precise in-hand manipulation. To validate our framework, we instantiated the proposed system on a Yale Model O hand without joint encoders or tactile sensors. The passive adaptability of the underactuated hand greatly facilitates the self-identification process, because it naturally secures stable hand-object interactions during random exploration. Relying solely on an in-hand camera, our system can effectively self-identify the VLRs, even when some fingers are replaced with novel designs. In addition, we show in-hand manipulation applications of handwriting, marble maze playing, and cup stacking to demonstrate the effectiveness of the VLR in precise in-hand manipulation control.
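A minimal sketch, under strong simplifying assumptions, of the probabilistic self-identification idea in this abstract: maintain a belief over a single candidate virtual-linkage parameter and update it after each exploratory action according to how well the parameter predicts the observed object displacement. The toy forward model, Gaussian noise model, and all names are assumptions for illustration, not the VLR formulation itself.

```python
# Hypothetical sketch: grid-based Bayesian update over one linkage parameter.
import numpy as np

def predict_motion(link_length, actuation):
    """Toy forward model: predicted object displacement for a given parameter."""
    return link_length * np.sin(actuation)

def self_identify(actions, observed_motions, sigma=0.005):
    """Return the MAP estimate of the candidate link length (meters)."""
    candidates = np.linspace(0.02, 0.12, 200)        # hypothesis grid
    belief = np.full(candidates.shape, 1.0 / len(candidates))

    for a, y in zip(actions, observed_motions):
        pred = predict_motion(candidates, a)          # prediction per hypothesis
        likelihood = np.exp(-0.5 * ((y - pred) / sigma) ** 2)
        belief *= likelihood
        belief /= belief.sum()                        # normalize the posterior

    return candidates[np.argmax(belief)]
```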
  5. This work proposes a framework for tracking a desired path of an object held by an adaptive hand via within-hand manipulation. Such underactuated hands are able to passively achieve stable contacts with objects. Combined with vision-based control and a data-driven state-estimation process, they can solve tasks without accurate hand-object models or multi-modal sensory feedback. In particular, a data-driven regression process is used here to estimate the probability of dropping the object for given manipulation states. Then, an optimization-based planner aims to track the desired path while avoiding states that are above a threshold probability of dropping the object. The optimized cost function, based on the principle of Dynamic-Time Warping (DTW), seeks to minimize the area between the desired and the followed path. By adapting the threshold for the probability of dropping the object, the framework can handle objects of different weights without retraining. Experiments involving writing letters with a marker, as well as tracing randomized paths, were conducted on the Yale Model T-42 hand. Results indicate that the framework successfully avoids undesirable states, while minimizing the proposed cost function, thereby producing object paths for within-hand manipulation that closely match the target ones.
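A minimal sketch of the Dynamic-Time-Warping idea behind the cost function above: align the followed path to the desired path and accumulate point-to-point deviations over the optimal alignment. This simplified distance is illustrative and stands in for the paper's area-based formulation; the example paths are assumptions.

```python
# Hypothetical sketch: DTW cost between a desired and a followed 2D path.
import numpy as np

def dtw_cost(desired, followed):
    """desired: (N, 2) array, followed: (M, 2) array of 2D path points."""
    n, m = len(desired), len(followed)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(desired[i - 1] - followed[j - 1])
            # Best of matching, insertion, or deletion in the alignment.
            D[i, j] = d + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Example: a followed path that slightly deviates from a straight line.
t = np.linspace(0, 1, 50)
desired = np.stack([t, np.zeros_like(t)], axis=1)
followed = np.stack([t, 0.01 * np.sin(4 * np.pi * t)], axis=1)
print(dtw_cost(desired, followed))
```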