- Award ID(s):
- 2038897
- PAR ID:
- 10312124
- Date Published:
- Journal Name:
- Advances in neural information processing systems
- ISSN:
- 1049-5258
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Neural networks can represent and accurately reconstruct radiance fields for static 3D scenes (e.g., NeRF). Several works extend these to dynamic scenes captured with monocular video, with promising performance. However, the monocular setting is known to be an under-constrained problem, and so methods rely on data-driven priors for reconstructing dynamic content. We replace these priors with measurements from a time-of-flight (ToF) camera, and introduce a neural representation based on an image formation model for continuous-wave ToF cameras. Instead of working with processed depth maps, we model the raw ToF sensor measurements to improve reconstruction quality and avoid issues with low reflectance regions, multi-path interference, and a sensor's limited unambiguous depth range. We show that this approach improves robustness of dynamic scene reconstruction to erroneous calibration and large motions, and discuss the benefits and limitations of integrating RGB+ToF sensors now available on modern smartphones.more » « less
-
Recent advances in Neural Radiance Field (NeRF)-based methods have enabled high-fidelity novel view synthesis for video with dynamic elements. However, these methods often require expensive hardware, take days to process a second-long video and do not scale well to longer videos. We create an end-to-end pipeline for creating dynamic 3D video from a monocular video that can be run on consumer hardware in minutes per second of footage, not days. Our pipeline handles the estimation of the camera parameters, depth maps, 3D reconstruction of dynamic foreground and static background elements, and the rendering of the 3D video on a computer or VR headset. We use a state-of-the-art visual transformer model to estimate depth maps which we use to scale COLMAP poses and enable RGB-D fusion with estimated depth data. In our preliminary experiments, we rendered the output in a VR headset and visually compared the method against ground-truth datasets and state-of-the-art NeRF-based methods.more » « less
-
We introduce Doppler time-of-flight (D-ToF) rendering, an extension of ToF rendering for dynamic scenes, with applications in simulating D-ToF cameras. D-ToF cameras use high-frequency modulation of illumination and exposure, and measure the Doppler frequency shift to compute the radial velocity of dynamic objects. The time-varying scene geometry and high-frequency modulation functions used in such cameras make it challenging to accurately and efficiently simulate their measurements with existing ToF rendering algorithms. We overcome these challenges in a twofold manner: To achieve accuracy, we derive path integral expressions for D-ToF measurements under global illumination and form unbiased Monte Carlo estimates of these integrals. To achieve efficiency, we develop a tailored time-path sampling technique that combines antithetic time sampling with correlated path sampling. We show experimentally that our sampling technique achieves up to two orders of magnitude lower variance compared to naive time-path sampling. We provide an open-source simulator that serves as a digital twin for D-ToF imaging systems, allowing imaging researchers, for the first time, to investigate the impact of modulation functions, material properties, and global illumination on D-ToF imaging performance.
-
Lensless cameras are ultra-thin imaging systems that replace the lens with a thin passive optical mask and computation. Passive mask-based lensless cameras encode depth information in their measurements for a certain depth range. Early works have shown that this encoded depth can be used to perform 3D reconstruction of close-range scenes. However, these approaches for 3D reconstructions are typically optimization based and require strong hand-crafted priors and hundreds of iterations to reconstruct. Moreover, the reconstructions suffer from low resolution, noise, and artifacts. In this work, we propose
FlatNet3D —a feed-forward deep network that can estimate both depth and intensity from a single lensless capture. FlatNet3D is an end-to-end trainable deep network that directly reconstructs depth and intensity from a lensless measurement using an efficient physics-based 3D mapping stage and a fully convolutional network. Our algorithm is fast and produces high-quality results, which we validate using both simulated and real scenes captured using PhlatCam. -
Scene reconstruction using Monodepth2 (Monocular Depth Inference) which provides depth maps from a single RGB camera, the outputs are filled with noise and inconsistencies. Instance segmentation using a Mask R-CNN (Region Based Convolution Neural Networks) deep model can provide object segmentation results in 2D but lacks 3D information. In this paper we propose to integrate the results of Instance segmentation via Mask R-CNN’s, CAD model Car Shape Alignment, and depth from Monodepth2 together with classical dynamic vision techniques to create a High-level Semantic Model with separability, robustness, consistency and saliency. The model is useful for both virtualized rendering, semantic augmented reality and automatic driving. Experimental results are provided to validate the approach.more » « less