skip to main content

Title: Multi-Modal Geometric Learning for Grasping and Manipulation
This work provides an architecture that incorporates depth and tactile information to create rich and accurate 3D models useful for robotic manipulation tasks. This is accomplished through the use of a 3D convolutional neural network (CNN). Offline, the network is provided with both depth and tactile information and trained to predict the object’s geometry, thus filling in regions of occlusion. At runtime, the network is provided a partial view of an object. Tactile information is acquired to augment the captured depth information. The network can then reason about the object’s geometry by utilizing both the collected tactile and depth information. We demonstrate that even small amounts of additional tactile information can be incredibly helpful in reasoning about object geometry. This is particularly true when information from depth alone fails to produce an accurate geometric prediction. Our method is benchmarked against and outperforms other visual-tactile approaches to general geometric reasoning. We also provide experimental results comparing grasping success with our method.
; ;
Award ID(s):
Publication Date:
Journal Name:
2019 International Conference on Robotics and Automation (ICRA)
Page Range or eLocation-ID:
7339 to 7345
Sponsoring Org:
National Science Foundation
More Like this
  1. We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud,more »allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.« less
  2. We consider the problem of in-hand dexterous manipulation with a focus on unknown or uncertain hand–object parameters, such as hand configuration, object pose within hand, and contact positions. In particular, in this work we formulate a generic framework for hand–object configuration estimation using underactuated hands as an example. Owing to the passive reconfigurability and the lack of encoders in the hand’s joints, it is challenging to estimate, plan, and actively control underactuated manipulation. By modeling the grasp constraints, we present a particle filter-based framework to estimate the hand configuration. Specifically, given an arbitrary grasp, we start by sampling a setmore »of hand configuration hypotheses and then randomly manipulate the object within the hand. While observing the object’s movements as evidence using an external camera, which is not necessarily calibrated with the hand frame, our estimator calculates the likelihood of each hypothesis to iteratively estimate the hand configuration. Once converged, the estimator is used to track the hand configuration in real time for future manipulations. Thereafter, we develop an algorithm to precisely plan and control the underactuated manipulation to move the grasped object to desired poses. In contrast to most other dexterous manipulation approaches, our framework does not require any tactile sensing or joint encoders, and can directly operate on any novel objects, without requiring a model of the object a priori. We implemented our framework on both the Yale Model O hand and the Yale T42 hand. The results show that the estimation is accurate for different objects, and that the framework can be easily adapted across different underactuated hand models. In the end, we evaluated our planning and control algorithm with handwriting tasks, and demonstrated the effectiveness of the proposed framework.« less
  3. The process of modeling a series of hand-object parameters is crucial for precise and controllable robotic in-hand manipulation because it enables the mapping from the hand’s actuation input to the object’s motion to be obtained. Without assuming that most of these model parameters are known a priori or can be easily estimated by sensors, we focus on equipping robots with the ability to actively self-identify necessary model parameters using minimal sensing. Here, we derive algorithms, on the basis of the concept of virtual linkage-based representations (VLRs), to self-identify the underlying mechanics of hand-object systems via exploratory manipulation actions and probabilisticmore »reasoning and, in turn, show that the self-identified VLR can enable the control of precise in-hand manipulation. To validate our framework, we instantiated the proposed system on a Yale Model O hand without joint encoders or tactile sensors. The passive adaptability of the underactuated hand greatly facilitates the self-identification process, because they naturally secure stable hand-object interactions during random exploration. Relying solely on an in-hand camera, our system can effectively self-identify the VLRs, even when some fingers are replaced with novel designs. In addition, we show in-hand manipulation applications of handwriting, marble maze playing, and cup stacking to demonstrate the effectiveness of the VLR in precise in-hand manipulation control.

    « less
  4. Abstract Background 3D imaging, such as X-ray CT and MRI, has been widely deployed to study plant root structures. Many computational tools exist to extract coarse-grained features from 3D root images, such as total volume, root number and total root length. However, methods that can accurately and efficiently compute fine-grained root traits, such as root number and geometry at each hierarchy level, are still lacking. These traits would allow biologists to gain deeper insights into the root system architecture. Results We present TopoRoot, a high-throughput computational method that computes fine-grained architectural traits from 3D images of maize root crowns ormore »root systems. These traits include the number, length, thickness, angle, tortuosity, and number of children for the roots at each level of the hierarchy. TopoRoot combines state-of-the-art algorithms in computer graphics, such as topological simplification and geometric skeletonization, with customized heuristics for robustly obtaining the branching structure and hierarchical information. TopoRoot is validated on both CT scans of excavated field-grown root crowns and simulated images of root systems, and in both cases, it was shown to improve the accuracy of traits over existing methods. TopoRoot runs within a few minutes on a desktop workstation for images at the resolution range of 400^3, with minimal need for human intervention in the form of setting three intensity thresholds per image. Conclusions TopoRoot improves the state-of-the-art methods in obtaining more accurate and comprehensive fine-grained traits of maize roots from 3D imaging. The automation and efficiency make TopoRoot suitable for batch processing on large numbers of root images. Our method is thus useful for phenomic studies aimed at finding the genetic basis behind root system architecture and the subsequent development of more productive crops.« less
  5. 3D LiDAR scanners are playing an increasingly important role in autonomous driving as they can generate depth information of the environment. However, creating large 3D LiDAR point cloud datasets with point-level labels requires a significant amount of manual annotation. This jeopardizes the efficient development of supervised deep learning algorithms which are often data-hungry. We present a framework to rapidly create point clouds with accurate pointlevel labels from a computer game. To our best knowledge, this is the first publication on LiDAR point cloud simulation framework for autonomous driving. The framework supports data collection from both auto-driving scenes and user-configured scenes.more »Point clouds from auto-driving scenes can be used as training data for deep learning algorithms, while point clouds from user-configured scenes can be used to systematically test the vulnerability of a neural network, and use the falsifying examples to make the neural network more robust through retraining. In addition, the scene images can be captured simultaneously in order for sensor fusion tasks, with a method proposed to do automatic registration between the point clouds and captured scene images. We show a significant improvement in accuracy (+9%) in point cloud segmentation by augmenting the training dataset with the generated synthesized data. Our experiments also show by testing and retraining the network using point clouds from user-configured scenes, the weakness/blind spots of the neural network can be fixed.« less