Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations for 6D poses are difficult to collect, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking, necessitating re-initialization of the object’s pose. This work proposes a data-driven optimization approach for long-term 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object’s model. The key contribution in this context is a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie algebra. Consequently, even when the network is trained only with synthetic data, it can work effectively over real images. Comprehensive experiments over benchmarks (existing ones as well as a new dataset with significant occlusions related to object manipulation) show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9 Hz.
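To make the Lie algebra orientation representation more concrete, the sketch below maps a 3-vector in so(3) to a rotation matrix with Rodrigues' formula and applies it as a relative update to a previous orientation estimate. This is an illustration only: the function names, the left-multiplication convention, and the use of plain Python lists are assumptions, not details taken from the paper.

```python
import math


def matmul3(A, B):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(w):
    """Exponential map: so(3) vector (axis * angle) -> 3x3 rotation matrix."""
    wx, wy, wz = w
    theta = math.sqrt(wx * wx + wy * wy + wz * wz)
    # Skew-symmetric matrix K such that K v equals the cross product w x v.
    K = [[0.0, -wz, wy],
         [wz, 0.0, -wx],
         [-wy, wx, 0.0]]
    if theta < 1e-8:
        # Small-angle limits of sin(t)/t and (1 - cos(t))/t^2.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / (theta * theta)
    K2 = matmul3(K, K)
    # Rodrigues' formula: R = I + a*K + b*K^2.
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def update_rotation(R_prev, w_rel):
    """Apply a relative rotation (assumed left-multiplied) to the previous estimate."""
    return matmul3(so3_exp(w_rel), R_prev)
```

A tracker built this way would regress the small vector `w_rel` for each frame and compose it with the previous estimate, which avoids the constraints that come with predicting a rotation matrix or unit quaternion directly.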
DeepURL: Deep Pose Estimation Framework for Underwater Relative Localization
In this paper, we propose a real-time deep-learning approach for determining the 6D relative pose of Autonomous Underwater Vehicles (AUV) from a single image. A team of autonomous robots localizing themselves in a communication-constrained underwater environment is essential for many applications such as underwater exploration, mapping, multi-robot convoying, and other multi-robot tasks. Due to the profound difficulty of collecting ground truth images with accurate 6D poses underwater, this work utilizes rendered images from the Unreal Game Engine simulation for training. An image translation network is employed to bridge the gap between the rendered and the real images, producing synthetic images for training. The proposed method predicts the 6D pose of an AUV from a single image as 2D image keypoints representing the 8 corners of the 3D model of the AUV, and then the 6D pose in the camera coordinates is determined using RANSAC-based PnP. Experimental results in underwater environments (swimming pool and ocean) with different cameras demonstrate the robustness of the proposed technique, where the trained system decreased translation error by 75.5% and orientation error by 64.6% over the state-of-the-art methods.
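The keypoint representation described above can be illustrated with a short sketch: it builds the 8 corners of a 3D bounding box and projects them into the image with a pinhole camera model, which is the forward mapping that a PnP solver inverts. The box dimensions, intrinsics, and function names here are hypothetical, and the RANSAC-based PnP step itself is not implemented; in practice it would typically be delegated to an existing solver (OpenCV, for example, provides `cv2.solvePnPRansac`).

```python
from itertools import product


def box_corners(half_extents):
    """Return the 8 corners of an axis-aligned box centred at the model origin."""
    hx, hy, hz = half_extents
    return [(sx * hx, sy * hy, sz * hz)
            for sx, sy, sz in product((-1.0, 1.0), repeat=3)]


def project_points(points, R, t, fx, fy, cx, cy):
    """Project 3D model points to pixels for pose (R, t) and pinhole intrinsics."""
    pixels = []
    for X in points:
        # Transform the point from model coordinates to camera coordinates.
        xc, yc, zc = (sum(R[i][k] * X[k] for k in range(3)) + t[i]
                      for i in range(3))
        if zc <= 0.0:
            raise ValueError("point is behind the camera")
        # Perspective division followed by the intrinsic mapping.
        pixels.append((fx * xc / zc + cx, fy * yc / zc + cy))
    return pixels


# Hypothetical example: a 2 m cube placed 5 m in front of the camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
keypoints = project_points(box_corners((1.0, 1.0, 1.0)), identity,
                           (0.0, 0.0, 5.0), 100.0, 100.0, 50.0, 50.0)
```

Given 2D keypoints predicted by a network and the known 3D corners, a PnP solver searches for the `(R, t)` whose projection best matches the predictions, with RANSAC discarding outlier keypoints.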
- PAR ID: 10215299
- Date Published:
- Journal Name: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems
- ISSN: 2153-0866
- Page Range / eLocation ID: 1777-1784
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
-
A 6D human pose estimation method is studied to assist autonomous UAV control in human environments. As autonomous robots/UAVs become increasingly prevalent in the future workspace, they must detect and estimate human movement and predict human trajectories to plan a safe motion path. Our method utilizes a deep Convolutional Neural Network to calculate a 3D torso bounding box to determine the location and orientation of humans. The training uses a loss function that includes both 3D angle and translation errors. The trained model delivers <10-degree angular error and outperforms a reference method based on RSN.
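A loss that combines 3D angle and translation errors might look like the following sketch. The abstract does not give the exact formulation, so the geodesic rotation angle, the Euclidean translation distance, and the weights `w_rot` and `w_trans` are all assumptions chosen for illustration.

```python
import math


def rotation_angle_error(R_pred, R_gt):
    """Geodesic angle (radians) between two 3x3 rotation matrices."""
    # trace(R_pred^T R_gt) equals the sum of element-wise products.
    trace = sum(R_pred[i][j] * R_gt[i][j] for i in range(3) for j in range(3))
    # Clamp to guard against round-off pushing the value outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.acos(cos_angle)


def translation_error(t_pred, t_gt):
    """Euclidean distance between predicted and ground-truth translations."""
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(t_pred, t_gt)))


def pose_loss(R_pred, t_pred, R_gt, t_gt, w_rot=1.0, w_trans=1.0):
    """Weighted sum of angular and translation errors (weights are assumed)."""
    return (w_rot * rotation_angle_error(R_pred, R_gt)
            + w_trans * translation_error(t_pred, t_gt))
```

The relative weighting matters because the two terms have different units (radians and metres); a real training setup would tune it or normalize each term.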
-
Accurate pose estimation is often a requirement for robust robotic grasping and manipulation of objects placed in cluttered, tight environments, such as a shelf with multiple objects. When deep learning approaches are employed to perform this task, they typically require a large amount of training data. However, obtaining precise 6-degrees-of-freedom ground-truth poses can be prohibitively expensive. This work therefore proposes an architecture and a training process to solve this issue. More precisely, we present a weak object detector that enables localizing objects and estimating their 6D poses in cluttered and occluded scenes. To minimize the human labor required for annotations, the proposed detector is trained with a combination of synthetic images and a few weakly annotated real images (as few as 10 images per object), for which a human provides only a list of objects present in each image (no time-consuming annotations, such as bounding boxes, segmentation masks and object poses). To close the gap between real and synthetic images, we use multiple domain classifiers trained adversarially. During the inference phase, the resulting class-specific heatmaps of the weak detector are used to guide the search of 6D poses of objects. Our proposed approach is evaluated on several publicly available datasets for pose estimation. We also evaluated our model on classification and localization in unsupervised and semi-supervised settings. The results clearly indicate that this approach could provide an efficient way toward fully automating the training process of computer vision models used in robotics.
-
We present a parallelized optimization method based on fast Neural Radiance Fields (NeRF) for estimating the 6-DoF pose of a camera with respect to an object or scene. Given a single observed RGB image of the target, we can predict the translation and rotation of the camera by minimizing the residual between pixels rendered from a fast NeRF model and pixels in the observed image. We integrate a momentum-based camera extrinsic optimization procedure into Instant Neural Graphics Primitives, a recent exceptionally fast NeRF implementation. By introducing parallel Monte Carlo sampling into the pose estimation task, our method overcomes local minima and improves efficiency in a more extensive search space. We also show the importance of adopting a more robust pixel-based loss function to reduce error. Experiments demonstrate that our method can achieve improved generalization and robustness on both synthetic and real-world benchmarks.
-
Monitoring coral reef populations as part of environmental assessment is essential. Recently, many marine science researchers have been employing low-cost and power-efficient Autonomous Underwater Vehicles (AUV) to survey coral reefs. While the counting problem, in general, has a rich literature, little work has focused on estimating the density of coral populations using AUVs. This paper proposes a novel approach to identify, count, and estimate coral populations. A Convolutional Neural Network (CNN) is utilized to detect and identify the different corals, and a tracking mechanism provides a total count for each coral species per transect. Experimental results from an Aqua2 underwater robot and a stereo hand-held camera validated the proposed approach for different image qualities.