We present a method that takes as input a single dual-pixel image, and simultaneously estimates the image's defocus map---the amount of defocus blur at each pixel---and recovers an all-in-focus image. Our method is inspired from recent works that leverage the dual-pixel sensors available in many consumer cameras to assist with autofocus, and use them for recovery of defocus maps or all-in-focus images. These prior works have solved the two recovery problems independently of each other, and often require large labeled datasets for supervised training. By contrast, we show that it is beneficial to treat these two closely-connected problems simultaneously. To this end, we set up an optimization problem that, by carefully modeling the optics of dual-pixel images, jointly solves both problems. We use data captured with a consumer smartphone camera to demonstrate that, after a one-time calibration step, our approach improves upon prior works for both defocus map estimation and blur removal, despite being entirely unsupervised.
more »
« less
Self-Supervised Single-Image Depth Estimation From Focus and Defocus Clues
Self-supervised depth estimation has recently demonstrated promising performance compared to the supervised methods on challenging indoor scenes. However, the majority of efforts mainly focus on exploiting photometric and geometric consistency via forward image warping and backward image warping, based on monocular videos or stereo image pairs. The influence of defocus blur to depth estimation is neglected, resulting in a limited performance for objects and scenes in out of focus. In this work, we propose the first framework for simultaneous depth estimation from a single image and image focal stacks using depth-from-defocus and depth-from-focus algorithms. The proposed network is able to learn optimal depth mapping from the information contained in the blur of a single image, generate a simulated image focal stack and all-in-focus image, and train a depth estimator from an image focal stack. In addition to the validation of our method on both synthetic NYUv2 dataset and real DSLR dataset, we also collect our own dataset using a DSLR camera and further verify on it. Experiments demonstrate that our system surpasses the state-of-the-art supervised depth estimation method over 4% in accuracy and achieves superb performance among the methods without direct supervision on the synthesized NYUv2 dataset, which has been rarely explored.
more »
« less
- PAR ID:
- 10426985
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- IEEE robotics automation letters
- ISSN:
- 2377-3766
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Extracting depth information from photon-limited, defocused images is challenging because depth from defocus (DfD) relies on accurate estimation of defocus blur, which is fundamentally sensitive to image noise. We present a novel approach to robustly measure object depths from photon-limited images along the defocused boundaries. It is based on a new image patch representation, Blurry-Edges, that explicitly stores and visualizes a rich set of low-level patch information, including boundaries, color, and smoothness. We develop a deep neural network architecture that predicts the Blurry-Edges representation from a pair of differently defocused images, from which depth can be analytically calculated using a novel DfD relation we derive. Our experiment shows that our method achieves the highest depth estimation accuracy on photon-limited images compared to a broad range of state-of-the-art DfD methods.more » « less
-
3D Gaussian Splatting (3DGS) techniques have recently enabled high-quality 3D scene reconstruction and real-time novel view synthesis. These approaches, however, are limited by the pinhole camera model and lack effective modeling of defocus effects. Departing from this, we introduce DOF-GS — a new 3DGS-based framework with a finite-aperture camera model and explicit, differentiable defocus rendering, enabling it to function as a post-capture control tool. By training with multi-view images with moderate defocus blur, DOF-GS learns inherent camera characteristics and reconstructs sharp details of the underlying scene, particularly, enabling rendering of varying DOF effects through on-demand aperture and focal distance control, post-capture and optimization. Additionally, our framework extracts circle-of-confusion cues during optimization to identify in-focus regions in input views, enhancing the reconstructed 3D scene details. Experimental results demonstrate that DOF-GS supports post-capture refocusing, adjustable defocus and high-quality all-in-focus rendering, from multi-view images with uncalibrated defocus blur.more » « less
-
One of the grand challenges in computer vision is to recover 3D poses and shapes of multiple human bodies with absolute scales from a single RGB image. The challenge stems from the inherent depth and scale ambiguity from a single view. The state of the art on 3D human pose and shape estimation mainly focuses on estimating the 3D joint locations relative to the root joint, defined as the pelvis joint. In this paper, a novel approach called Absolute-ROMP is proposed, which builds upon a one-stage multi-person 3D mesh predictor network, ROMP, to estimate multi-person 3D poses and shapes, but with absolute scales from a single RGB image. To achieve this, we introduce absolute root joint localization in the camera coordinate frame, which enables the estimation of 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal point. Moreover, a CNN and transformer hybrid network, called TransFocal, is proposed to predict the focal length of the image’s camera. This enables Absolute-ROMP to obtain absolute depth information of all joints in the camera coordinate frame, further improving the accuracy of our proposed method. The Absolute-ROMP is evaluated on the root joint localization and root-relative 3D pose estimation tasks on publicly available multi-person 3D pose datasets, and TransFocal is evaluated on a dataset created from the Pano360 dataset. Our proposed approach achieves state-of-the-art results on these tasks, outperforming existing methods or has competitive performance. Due to its real-time performance, our method is applicable to in-the-wild images and videos.more » « less
-
Visual odometry (VO) and single image depth estimation are critical for robot vision, 3D reconstruction, and camera pose estimation that can be applied to autonomous driving, map building, augmented reality and many other applications. Various supervised learning models have been proposed to train the VO or single image depth estimation framework for each targeted scene to improve the performance recently. However, little effort has been made to learn these separate tasks together without requiring the collection of a significant number of labels. This paper proposes a novel unsupervised learning approach to simultaneously perceive VO and single image depth estimation. In our framework, either of these tasks can benefit from each other through simultaneously learning these two tasks. We correlate these two tasks by enforcing depth consistency between VO and single image depth estimation. Based on the single image depth estimation, we can resolve the most common and challenging scaling issue of monocular VO. Meanwhile, through training from a sequence of images, VO can enhance the single image depth estimation accuracy. The effectiveness of our proposed method is demonstrated through extensive experiments compared with current state-of-the-art methods on the benchmark datasets.more » « less
An official website of the United States government

