This content will become publicly available on June 1, 2023

Title: Visual Vibration Tomography: Estimating Interior Material Properties from Monocular Video
An object’s interior material properties, while invisible to the human eye, determine motion observed on its surface. We propose an approach that estimates heterogeneous material properties of an object directly from a monocular video of its surface vibrations. Specifically, we estimate Young’s modulus and density throughout a 3D object with known geometry. Knowledge of how these values change across the object is useful for characterizing defects and simulating how the object will interact with different environments. Traditional non-destructive testing approaches, which generally estimate homogenized material properties or the presence of defects, are expensive and use specialized instruments. We propose an approach that leverages monocular video to (1) measure an object’s sub-pixel motion and decompose this motion into image-space modes, and (2) directly infer spatially-varying Young’s modulus and density values from the observed image-space modes. On both simulated and real videos, we demonstrate that our approach is able to image material properties simply by analyzing surface motion. In particular, our method allows us to identify unseen defects on a 2D drum head from real, high-speed video.
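The link between material properties and observable vibration modes can be illustrated with a minimal sketch, assuming a 1D fixed-fixed bar rather than the 3D objects treated in the paper. In a discretized linear elastic model, natural frequencies ω and mode shapes φ satisfy the generalized eigenvalue problem K(E)φ = ω²M(ρ)φ, where the stiffness matrix K depends on Young's modulus and the mass matrix M on density. The function `bar_modes` and its parameters are hypothetical illustrations, not the authors' code; it uses linear finite elements with lumped (diagonal) mass. It shows the kind of forward relationship that inferring E and ρ from observed modes has to undo: softening one region shifts the frequencies and reshapes the modes.

```python
import numpy as np

def bar_modes(E, rho, length=1.0, area=1.0):
    """Natural frequencies (Hz) and mode shapes of a fixed-fixed 1D bar.

    E and rho are per-element arrays (hypothetical inputs), so Young's modulus
    and density may vary along the bar. Linear elements, lumped mass.
    """
    E = np.asarray(E, dtype=float)
    rho = np.asarray(rho, dtype=float)
    n_el = E.size
    h = length / n_el
    K = np.zeros((n_el + 1, n_el + 1))
    m = np.zeros(n_el + 1)  # lumped (diagonal) nodal masses
    for e in range(n_el):
        k = E[e] * area / h  # element stiffness scales with Young's modulus
        K[e, e] += k
        K[e + 1, e + 1] += k
        K[e, e + 1] -= k
        K[e + 1, e] -= k
        half_mass = rho[e] * area * h / 2.0  # element mass scales with density
        m[e] += half_mass
        m[e + 1] += half_mass
    # Clamp both ends by removing the first and last nodes.
    K, m = K[1:-1, 1:-1], m[1:-1]
    # With diagonal M, K phi = w^2 M phi becomes the symmetric standard problem
    # (M^-1/2 K M^-1/2) psi = w^2 psi, with phi = M^-1/2 psi.
    s = 1.0 / np.sqrt(m)
    lam, psi = np.linalg.eigh(K * np.outer(s, s))
    phi = psi * s[:, None]
    freqs = np.sqrt(np.clip(lam, 0.0, None)) / (2.0 * np.pi)
    return freqs, phi

# Usage: homogeneous bar vs. a hypothetical softened region (E halved)
n = 200
f_homog, _ = bar_modes(np.ones(n), np.ones(n))
E_defect = np.ones(n)
E_defect[40:60] = 0.5
f_defect, _ = bar_modes(E_defect, np.ones(n))
print(f_homog[:3], f_defect[:3])
```

For the homogeneous bar, the lowest frequencies should sit close to the analytic values n/(2L)·sqrt(E/ρ); the softened region lowers them, since reducing stiffness can only decrease the eigenvalues.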
Award ID(s):
1835677, 1835648
Publication Date:
Journal Name:
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Page Range or eLocation-ID:
16210 to 16219
Sponsoring Org:
National Science Foundation
More Like this
  1. Disentangling the sources of visual motion in a dynamic scene during self-movement or ego motion is important for autonomous navigation and tracking. In the dynamic image segments of a video frame containing independently moving objects, optic flow relative to the next frame is the sum of the motion fields generated due to camera and object motion. The traditional ego-motion estimation methods assume the scene to be static, and the recent deep learning-based methods do not separate pixel velocities into object- and ego-motion components. We propose a learning-based approach to predict both ego-motion parameters and object-motion field (OMF) from image sequences using a convolutional autoencoder while being robust to variations due to the unconstrained scene depth. This is achieved by: 1) training with continuous ego-motion constraints that allow solving for ego-motion parameters independently of depth and 2) learning a sparsely activated overcomplete ego-motion field (EMF) basis set, which eliminates the irrelevant components in both static and dynamic segments for the task of ego-motion estimation. In order to learn the EMF basis set, we propose a new differentiable sparsity penalty function that approximates the number of nonzero activations in the bottleneck layer of the autoencoder and enforces sparsity more effectively than L1- and L2-norm-based penalties. Unlike the existing direct ego-motion estimation methods, the predicted global EMF can be used to extract OMF directly by comparing it against the optic flow. Compared with the state-of-the-art baselines, the proposed model performs favorably on pixelwise object- and ego-motion estimation tasks when evaluated on real and synthetic data sets of dynamic scenes.
  2. Abstract

    Arctic icebergs, unconstrained sea ice floes, oil slicks, mangrove drifters, lost cargo containers, and other flotsam are known to move at 2%–4% of the prevailing wind velocity relative to the water, despite vast differences in the material properties, shapes, and sizes of objects. Here, we revisit the roles of density, aspect ratio, and skin and form drag in determining how an object is driven by winds and water currents. Idealized theoretical considerations show that although substantial differences exist for end members of the parameter space (either very thin or thick and very light or dense objects), most realistic cases of floating objects drift at approximately 3% of the free-stream wind velocity (measured outside an object’s surface boundary layer) relative to the water. This relationship, known as a long-standing rule of thumb for the drift of various types of floating objects, arises from the square root of the ratio of the density of air to that of water. We support our theoretical findings with flume experiments using floating objects with a range of densities and shapes.

  3. Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations for 6D poses are troublesome and difficult to collect, which complicates machine learning solutions; and (iii) incremental error drift often accumulates in long-term tracking, necessitating re-initialization of the object’s pose. This work proposes a data-driven optimization approach for long-term 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object’s model. The key contributions in this context are a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie algebra. Consequently, even when trained only with synthetic data, the network works effectively on real images. Comprehensive experiments over benchmarks (existing ones as well as a new dataset with significant occlusions related to object manipulation) show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9 Hz.
  4. Telecystoscopy can lower the barrier to accessing critical urologic diagnostics for patients around the world. A major challenge for robotic control of flexible cystoscopes and intuitive teleoperation is estimating the pose of the scope tip. We propose a novel real-time camera localization method that uses video recordings from a prior cystoscopy and a 3D bladder reconstruction to estimate the cystoscope pose within the bladder during follow-up telecystoscopy. We map prior video frames into a low-dimensional space as a dictionary, so that a new image can be mapped the same way to efficiently retrieve its nearest neighbor among the dictionary images. The cystoscope pose is then estimated from the correspondence among the new image, its nearest dictionary image, and the prior model from 3D reconstruction. We demonstrate the performance of our method using bladder phantoms of varying fidelity and a servo-controlled cystoscope to simulate the use case of bladder surveillance through telecystoscopy. The servo-controlled cystoscope, with 3 degrees of freedom (angulation, roll, and insertion axes), was developed for collecting cystoscope videos from bladder phantoms. Cystoscope videos were acquired in a 2.5D bladder phantom (bladder-shaped cross-section plus height) with a panorama of a urothelium attached to the inner surface. Scans of the 2.5D phantom were performed in separate arc trajectories, each generated by actuating the angulation axis with a fixed roll and insertion length. We further introduced variation in moving speed, imaging distance, and the presence of bladder tumors. Cystoscope videos were also acquired in a water-filled 3D silicone bladder phantom with hand-painted vasculature. Scans of the 3D phantom were performed in separate circular trajectories, each generated by actuating the roll axis under a fixed angulation and insertion length.

    These videos were used to create 3D reconstructions, dictionary sets, and test data sets for evaluating the computational efficiency and accuracy of our proposed method in comparison with a method based on global Scale-Invariant Feature Transform (SIFT) features, referred to as SIFT-only. Our method retrieves the nearest dictionary image for 94–100% of test frames in under 55 ms per image, whereas the SIFT-only method finds the image match for only 56–100% of test frames, in 6000–40000 ms per image, depending on the size of the dictionary set and the richness of SIFT features in the images. With a speed of around 20 Hz for the retrieval stage, our method is a promising tool for real-time image-based scope localization in robotic cystoscopy when prior cystoscopy images are available.
  5. Abstract

    We acquired a unique ambient vibration dataset from Castleton Tower, a 120 m high bedrock monolith located near Moab, Utah, to resolve dynamic and material properties of the landform. We identified the first two resonant modes at 0.8 and 1.0 Hz, which consist of mutually perpendicular, linearly polarized horizontal ground motion at the top of the tower. Damping ratios for these modes were low at ∼1%. We successfully reproduced field data in 3D numerical eigenfrequency simulation implementing a Young’s modulus of 7 GPa, a value ∼30% lower than measured on core samples. Our analysis confirms that modal deformation at the first resonant frequencies closely resembles that of a cantilever beam. The outcome is that with basic estimates of geometry and material properties, the resonant frequencies of other freestanding rock monoliths can be estimated a priori. Such estimates are crucial to evaluate the response of rock towers to external vibration inputs.
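Item 1 introduces a differentiable penalty that approximates the number of nonzero activations. The abstract does not give its form, so the sketch below swaps in a common Gaussian-based smooth L0 surrogate (an assumption, not the authors' function): each entry contributes 1 - exp(-z^2 / (2 sigma^2)), which is near 0 for |z| much smaller than sigma and near 1 for |z| much larger, so the sum approaches a count of nonzeros as sigma shrinks. The L1 norm, by contrast, grows with magnitude. `smooth_l0`, `smooth_l0_grad`, and `sigma` are hypothetical names.

```python
import numpy as np

def smooth_l0(z, sigma=0.1):
    """Differentiable surrogate for the number of nonzero entries of z."""
    z = np.asarray(z, dtype=float)
    return float(np.sum(1.0 - np.exp(-z**2 / (2.0 * sigma**2))))

def smooth_l0_grad(z, sigma=0.1):
    """Elementwise gradient of smooth_l0 with respect to z."""
    z = np.asarray(z, dtype=float)
    return z / sigma**2 * np.exp(-z**2 / (2.0 * sigma**2))

# Usage: three clearly nonzero activations, two exact zeros
z = np.array([0.0, 0.0, 3.0, -2.0, 0.5])
print(smooth_l0(z, sigma=0.01), np.abs(z).sum())  # prints 3.0 5.5
```

Because the per-entry penalty saturates at 1, large activations are not pushed toward zero the way an L1 penalty pushes them; only the number of active units is discouraged.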
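Item 2 attributes the roughly 3% drift rule to the square root of the ratio of air density to water density. A back-of-the-envelope sketch, assuming an idealized object with equal exposed areas and equal quadratic drag coefficients in air and water (assumptions for illustration, not the paper's full model): balancing rho_a C_a (U - u)^2 = rho_w C_w u^2 for drift speed u under wind speed U, both relative to the water, gives u/U = r/(1 + r) with r = sqrt(rho_a C_a / (rho_w C_w)). The function name and default densities below are hypothetical typical values, not data from the study.

```python
import math

def drift_fraction(rho_air=1.2, rho_water=1025.0, c_air=1.0, c_water=1.0):
    """Steady drift speed as a fraction of wind speed (both relative to water).

    Balances quadratic air drag on the emerged part against quadratic water
    drag on the submerged part, assuming equal exposed areas (an idealization).
    """
    r = math.sqrt((rho_air * c_air) / (rho_water * c_water))
    return r / (1.0 + r)

# Typical densities: ~1.2 kg/m^3 for air, ~1025 kg/m^3 for seawater
frac = drift_fraction()
print(f"drift ≈ {100 * frac:.1f}% of wind speed")  # prints drift ≈ 3.3% of wind speed
```

Changing the drag-coefficient ratio moves the result only through its square root, which is one way to see why very different objects cluster near the same drift fraction.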
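Item 3 represents 3D orientation via the Lie algebra so(3). As a generic illustration, not the paper's network or exact parameterization, a rotation can be encoded as an axis-angle vector w in R^3 and mapped to a rotation matrix with the exponential map (Rodrigues' formula), and back with the logarithm map. The helper names `hat`, `vee`, `exp_so3`, and `log_so3` are hypothetical.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def vee(S):
    """Inverse of hat: extract the 3-vector from a skew-symmetric matrix."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def exp_so3(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3) + hat(w)  # first-order approximation near identity
    K = hat(w / theta)
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def log_so3(R):
    """Logarithm map SO(3) -> axis-angle vector; valid for angles in [0, pi)."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return vee(R - R.T) / 2.0
    return theta / (2.0 * np.sin(theta)) * vee(R - R.T)

# Usage: apply a small relative rotation to a previous estimate
R_prev = exp_so3([0.0, 0.0, np.pi / 2])
delta = np.array([0.01, -0.02, 0.03])  # e.g. a predicted relative rotation
R_new = exp_so3(delta) @ R_prev
print(np.round(R_new, 3))
```

An unconstrained 3-vector is convenient for a network to regress, since every output maps to a valid rotation; the logarithm above becomes numerically unstable near a rotation angle of pi, which a production version would need to handle separately.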
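Item 4 maps prior frames into a low-dimensional space and retrieves the nearest dictionary image for each new frame. The abstract does not say which embedding is used, so the sketch below swaps in plain PCA (via SVD) on flattened grayscale frames, plus a brute-force Euclidean nearest-neighbor search. The function names and the choice of PCA are assumptions for illustration only.

```python
import numpy as np

def build_dictionary(frames, k):
    """Embed prior frames of shape (n, H, W) into k PCA coefficients each."""
    X = frames.reshape(len(frames), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:k]
    codes = (X - mean) @ basis.T
    return mean, basis, codes

def retrieve(frame, mean, basis, codes):
    """Index of the dictionary frame whose embedding is closest to `frame`."""
    q = (frame.ravel().astype(float) - mean) @ basis.T
    return int(np.argmin(np.linalg.norm(codes - q, axis=1)))

# Usage on synthetic stand-in frames (random data, not cystoscopy images)
rng = np.random.default_rng(0)
frames = rng.random((50, 32, 32))
mean, basis, codes = build_dictionary(frames, k=16)
query = frames[17] + 0.01 * rng.standard_normal((32, 32))
print(retrieve(query, mean, basis, codes))
```

Comparing k-dimensional codes instead of full images or per-image feature sets keeps each lookup cheap, which is the kind of saving that makes per-frame retrieval at video rates plausible.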
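Item 5 notes that the tower's first modes resemble those of a cantilever beam, so resonant frequencies can be estimated from geometry and material properties. A rough sketch, assuming a uniform Euler-Bernoulli cantilever with a rectangular cross-section (real monoliths are tapered and irregular, so this is only an order-of-magnitude estimate): f_n = (beta_n L)^2 / (2 pi L^2) * sqrt(E I / (rho A)). Bending across each of the two cross-section dimensions gives a separate first-mode frequency, loosely analogous to the two perpendicular modes reported. The function names, density, and cross-section depths are hypothetical; only the 120 m height and the 7 GPa modulus come from the abstract.

```python
import math

# First three roots of cos(x) * cosh(x) = -1 (clamped-free beam)
BETA_L = (1.8751041, 4.6940911, 7.8547574)

def cantilever_frequency(E, rho, length, depth, mode=1):
    """Natural frequency (Hz) of a uniform rectangular cantilever bending across `depth`.

    Uses sqrt(I/A) = depth / sqrt(12) for a rectangular section.
    """
    bl = BETA_L[mode - 1]
    r_gyration = depth / math.sqrt(12.0)
    return bl**2 / (2.0 * math.pi * length**2) * r_gyration * math.sqrt(E / rho)

def modulus_from_frequency(f, rho, length, depth, mode=1):
    """Invert cantilever_frequency for Young's modulus (Pa)."""
    bl = BETA_L[mode - 1]
    r_gyration = depth / math.sqrt(12.0)
    return rho * (2.0 * math.pi * f * length**2 / (bl**2 * r_gyration)) ** 2

# Hypothetical cross-section depths (m) and density (kg/m^3); height and E from the abstract
E, rho, height = 7e9, 2500.0, 120.0
for depth in (40.0, 50.0):
    print(f"depth {depth:.0f} m: f1 ≈ {cantilever_frequency(E, rho, height, depth):.2f} Hz")
```

Since f scales with sqrt(E), a measured frequency combined with geometry and density constrains the effective in-situ modulus, which is what `modulus_from_frequency` inverts under the same idealization.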