Title: Multi-Modal Geometric Learning for Grasping and Manipulation
This work provides an architecture that incorporates depth and tactile information to create rich and accurate 3D models useful for robotic manipulation tasks. This is accomplished through the use of a 3D convolutional neural network (CNN). Offline, the network is provided with both depth and tactile information and trained to predict the object’s geometry, thus filling in regions of occlusion. At runtime, the network is provided a partial view of an object, and tactile information is acquired to augment the captured depth information. The network then reasons about the object’s geometry using both the collected tactile and depth information. We demonstrate that even small amounts of additional tactile information substantially improve geometric reasoning, particularly when depth information alone fails to produce an accurate geometric prediction. Our method is benchmarked against, and outperforms, other visual-tactile approaches to general geometric reasoning. We also provide experimental results comparing grasp success with our method.
Award ID(s):
1734557
NSF-PAR ID:
10111003
Journal Name:
2019 International Conference on Robotics and Automation (ICRA)
Page Range / eLocation ID:
7339 to 7345
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
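The completion architecture described in the abstract lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch encoder-decoder in its spirit: voxels observed by the depth camera and voxels confirmed by tactile contact enter as two channels of a voxel grid, and the network outputs a per-voxel occupancy estimate for the full object, occluded regions included. The 40^3 resolution, layer widths, and the DepthTactileCompletionNet name are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DepthTactileCompletionNet(nn.Module):
    """Hypothetical two-channel 3D CNN: channel 0 holds voxels observed by
    the depth camera, channel 1 holds voxels touched by the tactile sensor.
    Output is a per-voxel occupancy estimate for the complete object."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv3d(2, 16, kernel_size=4, stride=2, padding=1),   # 40 -> 20
            nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=4, stride=2, padding=1),  # 20 -> 10
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(32, 16, kernel_size=4, stride=2, padding=1),  # 10 -> 20
            nn.ReLU(),
            nn.ConvTranspose3d(16, 1, kernel_size=4, stride=2, padding=1),   # 20 -> 40
        )

    def forward(self, depth_vox, tactile_vox):
        x = torch.stack([depth_vox, tactile_vox], dim=1)  # (B, 2, D, H, W)
        return torch.sigmoid(self.decoder(self.encoder(x)))

# Offline, the training target is the full voxelization of the object, so the
# network learns to fill occluded regions from the two partial input channels.
net = DepthTactileCompletionNet()
depth = torch.zeros(1, 40, 40, 40)    # voxels seen by the depth sensor
touch = torch.zeros(1, 40, 40, 40)    # voxels confirmed by tactile contact
occupancy = net(depth, touch)         # (1, 1, 40, 40, 40), values in [0, 1]
```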
More Like this
  1. We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings. 
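As a rough illustration of the refinement loop in item 1, the sketch below unprojects every coarse depth map into a joint point cloud, lets a scene-level network predict one depth correction per point, and scatters the corrections back into the depth maps. Here scene_net is a hypothetical stand-in for 3DVNet's feature-augmented point-cloud CNN, and all cameras are assumed to share one frame so that pose handling can be omitted.

```python
import torch

def unproject(depth, K_inv):
    """Back-project an (H, W) depth map to 3D points using inverse intrinsics."""
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H, dtype=torch.float32),
                          torch.arange(W, dtype=torch.float32), indexing="ij")
    pix = torch.stack([u, v, torch.ones_like(u)], dim=-1)   # (H, W, 3) homogeneous
    return (pix @ K_inv.T) * depth.unsqueeze(-1)            # (H, W, 3)

def refine_depths(depths, feats, K_inv, scene_net, n_iters=3):
    """Iteratively update a set of coarse depth maps with per-point residuals
    predicted jointly by a scene-level network (cf. 3DVNet's 3D CNN)."""
    for _ in range(n_iters):
        points = torch.cat([unproject(d, K_inv).reshape(-1, 3) for d in depths])
        features = torch.cat([f.reshape(-1, f.shape[-1]) for f in feats])
        residuals = scene_net(points, features)             # one offset per point
        new_depths, offset = [], 0
        for d in depths:
            n = d.numel()
            new_depths.append(d + residuals[offset:offset + n].reshape(d.shape))
            offset += n
        depths = new_depths
    return depths
```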
  2. We consider the problem of in-hand dexterous manipulation with a focus on unknown or uncertain hand–object parameters, such as hand configuration, object pose within hand, and contact positions. In particular, in this work we formulate a generic framework for hand–object configuration estimation using underactuated hands as an example. Owing to the passive reconfigurability and the lack of encoders in the hand’s joints, it is challenging to estimate, plan, and actively control underactuated manipulation. By modeling the grasp constraints, we present a particle filter-based framework to estimate the hand configuration. Specifically, given an arbitrary grasp, we start by sampling a set of hand configuration hypotheses and then randomly manipulate the object within the hand. While observing the object’s movements as evidence using an external camera, which is not necessarily calibrated with the hand frame, our estimator calculates the likelihood of each hypothesis to iteratively estimate the hand configuration. Once converged, the estimator is used to track the hand configuration in real time for future manipulations. Thereafter, we develop an algorithm to precisely plan and control the underactuated manipulation to move the grasped object to desired poses. In contrast to most other dexterous manipulation approaches, our framework does not require any tactile sensing or joint encoders, and can directly operate on any novel objects, without requiring a model of the object a priori. We implemented our framework on both the Yale Model O hand and the Yale T42 hand. The results show that the estimation is accurate for different objects, and that the framework can be easily adapted across different underactuated hand models. In the end, we evaluated our planning and control algorithm with handwriting tasks, and demonstrated the effectiveness of the proposed framework. 
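Item 2's estimator follows the standard particle-filter pattern: hypothesize hand configurations, predict the object motion each hypothesis implies under the grasp constraints, and reweight by agreement with what the external camera actually saw. A generic single-step sketch, where predict_motion and likelihood are hypothetical stand-ins for the paper's grasp-constraint and observation models:

```python
import numpy as np

def particle_filter_step(particles, weights, action, observed_motion,
                         predict_motion, likelihood, rng):
    """One update: score each hand-configuration hypothesis by how well it
    explains the object motion seen by the camera, then resample."""
    for i, p in enumerate(particles):
        predicted = predict_motion(p, action)        # motion implied by hypothesis p
        weights[i] *= likelihood(observed_motion, predicted)
    weights = weights / weights.sum()                # normalize to a distribution
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    resampled = [particles[i] for i in idx]          # focus on likely hypotheses
    return resampled, np.full(len(particles), 1.0 / len(particles))
```

Repeated over random in-hand manipulation actions, the hypothesis set collapses toward the true hand configuration, which can then be tracked in real time for subsequent planning.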
  3. The process of modeling a series of hand-object parameters is crucial for precise and controllable robotic in-hand manipulation because it enables the mapping from the hand’s actuation input to the object’s motion to be obtained. Without assuming that most of these model parameters are known a priori or can be easily estimated by sensors, we focus on equipping robots with the ability to actively self-identify necessary model parameters using minimal sensing. Here, we derive algorithms, on the basis of the concept of virtual linkage-based representations (VLRs), to self-identify the underlying mechanics of hand-object systems via exploratory manipulation actions and probabilistic reasoning and, in turn, show that the self-identified VLR can enable the control of precise in-hand manipulation. To validate our framework, we instantiated the proposed system on a Yale Model O hand without joint encoders or tactile sensors. The passive adaptability of the underactuated hand greatly facilitates the self-identification process, because it naturally secures stable hand-object interactions during random exploration. Relying solely on an in-hand camera, our system can effectively self-identify the VLRs, even when some fingers are replaced with novel designs. In addition, we show in-hand manipulation applications of handwriting, marble maze playing, and cup stacking to demonstrate the effectiveness of the VLR in precise in-hand manipulation control.
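Stripped to its core, the self-identification in item 3 fits virtual-linkage parameters so that the model reproduces the object motions observed during exploratory actions. The sketch below substitutes a nonlinear least-squares point estimate for the paper's probabilistic reasoning; forward_model, which maps VLR parameters and an actuation input to a predicted object pose, is a hypothetical placeholder.

```python
import numpy as np
from scipy.optimize import least_squares

def identify_vlr(actions, observed_poses, forward_model, theta0):
    """Fit virtual-linkage parameters theta so that predicted object poses
    match those observed (by the in-hand camera) during random exploration.

    actions        : list of actuation inputs used during exploration
    observed_poses : (N, d) array of resulting object poses
    forward_model  : (theta, action) -> predicted pose; hypothetical VLR model
    """
    def residual(theta):
        pred = np.stack([forward_model(theta, a) for a in actions])
        return (pred - observed_poses).ravel()
    return least_squares(residual, theta0).x
```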
  4. An object’s interior material properties, while invisible to the human eye, determine motion observed on its surface. We propose an approach that estimates heterogeneous material properties of an object directly from a monocular video of its surface vibrations. Specifically, we estimate Young’s modulus and density throughout a 3D object with known geometry. Knowledge of how these values change across the object is useful for characterizing defects and simulating how the object will interact with different environments. Traditional non-destructive testing approaches, which generally estimate homogenized material properties or the presence of defects, are expensive and use specialized instruments. We propose an approach that leverages monocular video to (1) measure an object’s sub-pixel motion and decompose this motion into image-space modes, and (2) directly infer spatially-varying Young’s modulus and density values from the observed image-space modes. On both simulated and real videos, we demonstrate that our approach is able to image material properties simply by analyzing surface motion. In particular, our method allows us to identify unseen defects on a 2D drum head from real, high-speed video.
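The inverse problem in item 4 amounts to matching observed modal behavior against the generalized eigenvalue problem K(E) phi = omega^2 M(rho) phi. As a deliberately simplified one-dimensional stand-in for the paper's pipeline, the sketch below recovers spatially varying stiffness along a mass-spring chain from its observed natural frequencies; the actual method operates on full 3D geometry and image-space mode shapes extracted from video.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.optimize import minimize

def chain_frequencies(k, m):
    """Natural frequencies of a fixed-free mass-spring chain, where k[i] is
    the spring between mass i-1 (or the wall, for i = 0) and mass i."""
    n = len(m)
    K = np.zeros((n, n))
    for i in range(n):
        K[i, i] += k[i]
        if i + 1 < n:
            K[i, i] += k[i + 1]
            K[i, i + 1] = K[i + 1, i] = -k[i + 1]
    w2 = eigh(K, np.diag(m), eigvals_only=True)   # solve K phi = w^2 M phi
    return np.sqrt(np.clip(w2, 0.0, None))

def fit_stiffness(observed_freqs, m, k0):
    """Recover per-element stiffness from observed modal frequencies by
    minimizing the frequency mismatch (log parameters keep k positive)."""
    def loss(log_k):
        return np.sum((chain_frequencies(np.exp(log_k), m) - observed_freqs) ** 2)
    return np.exp(minimize(loss, np.log(k0), method="Nelder-Mead").x)
```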
  5. High power sources of electromagnetic energy often require complicated structures to support electromagnetic modes and shape electromagnetic fields to maximize the coupling of the field energy to intense relativistic electron beams. Geometric fidelity is critical to the accurate simulation of these High Power Electromagnetic (HPEM) sources. Here, we present a fast and geometrically flexible approach to calculate the solution to Maxwell’s equations in vector potential form under the Lorenz gauge. The scheme is an implicit, linear-time, high-order, A-stable method based on the method of lines transpose (MOLT). As presented, the method is fourth order in time and second order in space, but the A-stable formulation could be extended to high order in both time and space. An O(n) fast convolution is employed for the spatial integration. The main focus of this work is to develop an approach to impose perfectly electrically conducting (PEC) boundary conditions in MOLT by extending our past work on embedded boundary methods. Because the method is A-stable, it does not suffer from the small-time-step limitations found in explicit finite difference time domain methods when either embedded boundary or cut-cell methods are used to capture geometry. This is a major advance for the simulation of HPEM devices. While there is no conceptual limitation to developing this in 3D, our initial work has centered on 2D. The extension to 3D requires validation that the proposed fixed point iteration will converge and is the subject of our follow-up work. The eventual goal is to combine this method with particle methods for the simulation of plasmas. In the current work, the scheme is evaluated for EM wave propagation within an object bounded by PEC. The consistency and performance of the scheme are confirmed using the ping test and frequency mode analysis for rotated square cavities, a standard test in the HPEM community. We then demonstrate the diffraction Q value test and the use of this method for simulating an A6 magnetron. The ability to handle both PEC and open boundaries in a standard device test problem, such as the A6, gives confidence in the robustness of this new method.
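The O(n) fast convolution underpinning the MOLT scheme in item 5 has a simple one-dimensional form: the free-space Green's function exp(-alpha|x - y|) splits the convolution into left- and right-moving pieces, each satisfying an exponential recursion, so a full line costs O(n) rather than O(n^2). The sketch below is that 1D building block with second-order local quadrature; the homogeneous corrections that enforce PEC boundary conditions, and the line-by-line sweeps used in 2D, are omitted.

```python
import numpy as np

def fast_convolution(f, alpha, dx):
    """O(n) evaluation of u(x_j) = (alpha/2) * integral exp(-alpha|x_j - y|) f(y) dy
    on a uniform grid, assuming f vanishes outside the grid. Left and right
    partial integrals satisfy exponential recursions; each cell's integral
    uses linear interpolation of f (second order in space)."""
    n = len(f)
    nu = alpha * dx
    d = np.exp(-nu)
    w_far = (1.0 - d * (1.0 + nu)) / nu      # weight on the cell's far endpoint
    w_near = (1.0 - d) - w_far               # weight on the cell's near endpoint
    IL = np.zeros(n)                         # alpha * integral from the left
    IR = np.zeros(n)                         # alpha * integral from the right
    for j in range(1, n):
        IL[j] = d * IL[j - 1] + w_near * f[j] + w_far * f[j - 1]
    for j in range(n - 2, -1, -1):
        IR[j] = d * IR[j + 1] + w_near * f[j] + w_far * f[j + 1]
    return 0.5 * (IL + IR)
```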