Title: Visual Vibration Tomography: Estimating Interior Material Properties from Monocular Video
An object’s interior material properties, while invisible to the human eye, determine motion observed on its surface. We propose an approach that estimates heterogeneous material properties of an object directly from a monocular video of its surface vibrations. Specifically, we estimate Young’s modulus and density throughout a 3D object with known geometry. Knowledge of how these values change across the object is useful for characterizing defects and simulating how the object will interact with different environments. Traditional non-destructive testing approaches, which generally estimate homogenized material properties or the presence of defects, are expensive and use specialized instruments. We propose an approach that leverages monocular video to (1) measure an object’s sub-pixel motion and decompose this motion into image-space modes, and (2) directly infer spatially-varying Young’s modulus and density values from the observed image-space modes. On both simulated and real videos, we demonstrate that our approach is able to image material properties simply by analyzing surface motion. In particular, our method allows us to identify unseen defects on a 2D drum head from real, high-speed video.
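Step (2) relies on the standard modal-analysis relation K(E)φ = ω²M(ρ)φ: each vibration mode φ with angular frequency ω is an eigenvector of a stiffness matrix K that depends on Young's modulus and a mass matrix M that depends on density. A material map can therefore be fit by asking the observed modes to satisfy that equation. The sketch below illustrates the idea on a toy 1D clamped-free bar with a lumped (diagonal) mass matrix; the bar, the function names, and the assumption that density is known are all illustrative choices, not the authors' implementation, which works with 3D geometry and image-space modes. Density is held fixed here because scaling E and ρ by the same factor leaves the eigenproblem unchanged, so modes alone cannot separate the two.

```python
import numpy as np

def unit_element_stiffness(n_elem, e, length=1.0):
    """Stiffness of element e with E = 1 for a clamped-free 1D bar (node 0 removed)."""
    h = length / n_elem
    k = np.zeros((n_elem + 1, n_elem + 1))
    k[e:e + 2, e:e + 2] = np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    return k[1:, 1:]  # drop the clamped node's row and column

def lumped_mass(rho, length=1.0):
    """Diagonal nodal masses: each element gives half its mass to each end node."""
    n_elem = len(rho)
    h = length / n_elem
    m = np.zeros(n_elem + 1)
    for e, r in enumerate(rho):
        m[e:e + 2] += r * h / 2.0
    return m[1:]

def simulate_modes(E, rho, k=3):
    """Lowest k modes of K(E) phi = omega^2 M(rho) phi (forward model)."""
    K = sum(E[e] * unit_element_stiffness(len(E), e) for e in range(len(E)))
    m = lumped_mass(rho)
    d = 1.0 / np.sqrt(m)                       # M^(-1/2) for a diagonal M
    # Symmetric reduction: (D K D) y = omega^2 y with phi = D y.
    w2, Y = np.linalg.eigh(d[:, None] * K * d[None, :])
    return w2[:k], d[:, None] * Y[:, :k]       # eigh sorts omega^2 ascending

def estimate_stiffness(w2, Phi, rho):
    """Least-squares element moduli from observed modes, assuming rho is known."""
    n_elem = len(rho)
    m = lumped_mass(rho)
    # K(E) phi_j - omega_j^2 M phi_j is linear in E; stack equations from all modes.
    G = np.vstack([
        np.column_stack([unit_element_stiffness(n_elem, e) @ Phi[:, j]
                         for e in range(n_elem)])
        for j in range(Phi.shape[1])
    ])
    b = np.concatenate([w2[j] * m * Phi[:, j] for j in range(Phi.shape[1])])
    E_hat, *_ = np.linalg.lstsq(G, b, rcond=None)
    return E_hat

# Hypothetical bar: 20 elements, uniform density, one softer region (a "defect").
E_true = np.ones(20)
E_true[8:12] = 0.5
rho = np.ones(20)
w2, Phi = simulate_modes(E_true, rho, k=3)
E_hat = estimate_stiffness(w2, Phi, rho)
print(np.round(E_hat, 3))
```

With noise-free synthetic modes the least-squares fit recovers the softened elements; real measurements would add noise, partial (projected) observations, and ill-conditioning that this toy ignores.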
Award ID(s): 1835677, 1835648
PAR ID: 10378315
Author(s) / Creator(s): ; ; ;
Publisher / Repository: IEEE
Date Published:
Journal Name: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Page Range / eLocation ID: 16210 to 16219
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations for 6D poses are troublesome and difficult to collect, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking, necessitating re-initialization of the object’s pose. This work proposes a data-driven optimization approach for long-term, 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object’s model. The key contributions in this context are a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie algebra. Consequently, even when trained only with synthetic data, the network works effectively on real images. Comprehensive experiments over benchmarks (existing ones as well as a new dataset with significant occlusions related to object manipulation) show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9 Hz.
  2. In this study, a novel ultrasonic non-destructive and non-invasive elastography method was introduced and demonstrated for evaluating the mechanical properties of fused deposition modeling (FDM) 3D-printed objects using two-dimensional dynamic elasticity mapping. Building on a recently investigated imaging technique for dynamic bulk modulus and effective density, an angle-dependent dynamic shear modulus measurement was performed to extract the dynamic Young’s modulus distribution of the FDM structures. The elastographic image analysis revealed anisotropy in both the dynamic shear modulus and the dynamic Young’s modulus of the FDM 3D-printed objects. The non-destructive method also distinguished samples containing high-contrast property zones from those with low-contrast property regions. The angle-dependent elasticity contrast measured by the ultrasonic method was compared with conventional static tensile-test characterization, and a good correlation between the non-destructive technique and the tensile test measurements was observed.
  3. Convolutional Neural Network (CNN)-based image segmentation has made great progress in recent years. However, video object segmentation remains challenging due to its high computational complexity. Most previous methods employ a two-stream CNN framework to handle spatial and motion features separately. In this paper, we propose an end-to-end encoder-decoder style 3D CNN that aggregates spatial and temporal information simultaneously for video object segmentation. To process video efficiently, we propose 3D separable convolution for the pyramid pooling module and decoder, which dramatically reduces the number of operations while maintaining performance. We also extend our framework to video action segmentation by adding an extra classifier that predicts the action label for actors in videos. Extensive experiments on several video datasets demonstrate the superior performance of the proposed approach for action and object segmentation compared with state-of-the-art methods.
  4. Separating an image into meaningful underlying components is a crucial first step for both editing and understanding images. We present a method capable of selecting the regions of a photograph exhibiting the same material as an artist-chosen area. Our proposed approach is robust to shading, specular highlights, and cast shadows, enabling selection in real images. As we do not rely on semantic segmentation (different woods or metals should not be selected together), we formulate the problem as a similarity-based grouping problem based on a user-provided image location. In particular, we propose to leverage the unsupervised DINO [Caron et al. 2021] features coupled with a proposed Cross-Similarity Feature Weighting module and an MLP head to extract material similarities in an image. We train our model on a new synthetic image dataset, which we release. We show that our method generalizes well to real-world images. We carefully analyze our model's behavior under varying material properties and lighting. Additionally, we evaluate it against a hand-annotated benchmark of 50 real photographs. We further demonstrate our model on a set of applications, including material editing, in-video selection, and retrieval of object photographs with similar materials.
  5. Disentangling the sources of visual motion in a dynamic scene during self-movement or ego-motion is important for autonomous navigation and tracking. In the dynamic image segments of a video frame containing independently moving objects, optic flow relative to the next frame is the sum of the motion fields generated by camera and object motion. Traditional ego-motion estimation methods assume the scene to be static, and recent deep learning-based methods do not separate pixel velocities into object- and ego-motion components. We propose a learning-based approach to predict both ego-motion parameters and the object-motion field (OMF) from image sequences using a convolutional autoencoder while being robust to variations due to the unconstrained scene depth. This is achieved by: 1) training with continuous ego-motion constraints that allow solving for ego-motion parameters independently of depth and 2) learning a sparsely activated overcomplete ego-motion field (EMF) basis set, which eliminates the irrelevant components in both static and dynamic segments for the task of ego-motion estimation. In order to learn the EMF basis set, we propose a new differentiable sparsity penalty function that approximates the number of nonzero activations in the bottleneck layer of the autoencoder and enforces sparsity more effectively than L1- and L2-norm-based penalties. Unlike the existing direct ego-motion estimation methods, the predicted global EMF can be used to extract the OMF directly by comparing it against the optic flow. Compared with the state-of-the-art baselines, the proposed model performs favorably on pixelwise object- and ego-motion estimation tasks when evaluated on real and synthetic data sets of dynamic scenes.
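The last entry above notes that optic flow in a dynamic segment is the sum of the ego-motion field (EMF) and the object-motion field (OMF), so once an EMF is predicted, the OMF follows by comparison with the flow. A minimal sketch of that final subtraction step only, assuming the dense flow and predicted EMF are already available as H×W×2 NumPy arrays; the threshold, function name, and toy scene are hypothetical, and this is not the authors' autoencoder or sparsity penalty.

```python
import numpy as np

def object_motion_field(flow, emf, thresh=0.5):
    """Split optic flow into an object-motion field (OMF) and a moving-object mask.

    flow, emf: arrays of shape (H, W, 2) holding per-pixel (u, v) displacements.
    thresh: hypothetical residual magnitude (pixels/frame) above which a pixel
            is treated as independently moving.
    """
    omf = flow - emf                              # flow = EMF + OMF, so OMF = flow - EMF
    mask = np.linalg.norm(omf, axis=-1) > thresh  # large residual => object motion
    return omf, mask

# Toy scene: uniform camera-induced flow plus one independently moving 3x3 patch.
H, W = 8, 8
emf = np.zeros((H, W, 2))
emf[..., 0] = 1.0                  # camera motion shifts every pixel 1 px right
flow = emf.copy()
flow[2:5, 3:6, 1] += 2.0           # the object additionally moves 2 px down
omf, mask = object_motion_field(flow, emf)
print(mask.sum())                  # 9
```

In a real scene the EMF varies with depth and camera rotation rather than being uniform; the toy keeps it constant only so the residual is easy to check by hand.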