Self-supervised depth estimation has recently demonstrated promising performance compared to supervised methods on challenging indoor scenes. However, most efforts focus on exploiting photometric and geometric consistency via forward and backward image warping, based on monocular videos or stereo image pairs. The influence of defocus blur on depth estimation is neglected, resulting in limited performance for objects and scenes that are out of focus. In this work, we propose the first framework for simultaneous depth estimation from a single image and from image focal stacks using depth-from-defocus and depth-from-focus algorithms. The proposed network is able to learn an optimal depth mapping from the information contained in the blur of a single image, generate a simulated image focal stack and all-in-focus image, and train a depth estimator from an image focal stack. In addition to validating our method on both the synthetic NYUv2 dataset and a real DSLR dataset, we also collect our own dataset using a DSLR camera and further verify our method on it. Experiments demonstrate that our system surpasses the state-of-the-art supervised depth estimation method by over 4% in accuracy and achieves superb performance among the methods without direct supervision on the synthesized NYUv2 dataset, which has been rarely explored.
Defocus Map Estimation and Deblurring From a Single Dual-Pixel Image
We present a method that takes as input a single dual-pixel image, and simultaneously estimates the image's defocus map (the amount of defocus blur at each pixel) and recovers an all-in-focus image. Our method is inspired by recent works that leverage the dual-pixel sensors available in many consumer cameras to assist with autofocus, and use them for recovery of defocus maps or all-in-focus images. These prior works have solved the two recovery problems independently of each other, and often require large labeled datasets for supervised training. By contrast, we show that it is beneficial to treat these two closely connected problems simultaneously. To this end, we set up an optimization problem that, by carefully modeling the optics of dual-pixel images, jointly solves both problems. We use data captured with a consumer smartphone camera to demonstrate that, after a one-time calibration step, our approach improves upon prior works for both defocus map estimation and blur removal, despite being entirely unsupervised.
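To make the notion of a defocus map concrete, here is a minimal sketch that converts a depth map into per-pixel blur diameters using the textbook thin-lens circle-of-confusion formula. This is an illustration only, not the dual-pixel optical model used in the paper; the function names `blur_diameter` and `defocus_map` and all parameter values are hypothetical.

```python
def blur_diameter(depth_m, focus_m, focal_length_m, f_number):
    """Thin-lens circle-of-confusion diameter on the sensor, in metres.

    Assumption: ideal thin lens; depth_m and focus_m are distances from the lens.
    """
    if depth_m <= 0 or focus_m <= focal_length_m or f_number <= 0:
        raise ValueError("need depth > 0, focus distance > focal length, f-number > 0")
    aperture = focal_length_m / f_number                      # aperture diameter
    magnification = focal_length_m / (focus_m - focal_length_m)
    return aperture * magnification * abs(depth_m - focus_m) / depth_m


def defocus_map(depth_map, focus_m, focal_length_m, f_number):
    """Apply blur_diameter to every pixel of a depth map given as nested lists."""
    return [
        [blur_diameter(d, focus_m, focal_length_m, f_number) for d in row]
        for row in depth_map
    ]


if __name__ == "__main__":
    # Hypothetical 2x2 depth map (metres), 50 mm lens at f/2 focused at 2 m.
    depths = [[1.0, 2.0], [4.0, 2.0]]
    for row in defocus_map(depths, focus_m=2.0, focal_length_m=0.05, f_number=2.0):
        print(row)
```

Pixels at the focus distance get zero blur, and blur grows as depth moves away from the focal plane. Note that the formula uses the absolute difference, so a single image cannot tell from blur size alone whether a point lies in front of or behind the focal plane.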
- Award ID(s): 1730147
- PAR ID: 10317160
- Date Published:
- Journal Name: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
The image processing task of recovering an image from a noisy or compromised image is an ill-posed inverse problem. To solve this problem, it is necessary to incorporate prior information about the smoothness, or the structure, of the solution through regularization. Here, we consider linear blur operators with an efficiently found singular value decomposition. Regularization is then obtained by employing a truncated singular value expansion for image recovery. In this study, we focus on images for which the image blur operator is separable and can be represented by a Kronecker product, such that the associated singular value decomposition is expressible in terms of the singular value decompositions of the separable components. The truncation index k can then be identified without forming the full Kronecker product of the two terms. This report investigates the problem of learning an optimal k using two methods. The first method assumes knowledge of the true images, yielding a supervised learning algorithm based on the average relative error. The second method uses generalized cross validation and does not require knowledge of the true images. The approach is implemented and demonstrated to be successful for Gaussian, Poisson, and salt-and-pepper noise types across noise levels with signal-to-noise ratios as low as 10. This research contributes to the field by offering insights into the use of the supervised and unsupervised estimators for the truncation index, and demonstrates that the unsupervised algorithm is not only robust and computationally efficient, but is also comparable to the supervised method.
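The remark about identifying the truncation index without forming the Kronecker product rests on a standard fact: the singular values of kron(A, B) are all pairwise products of the singular values of A and B. The sketch below uses that fact to pick the k largest singular values and the index pairs they come from. It is an illustrative assumption about how this might be done, not the report's implementation; the function name `top_k_kron_singular_values` and the example values are hypothetical.

```python
import heapq


def top_k_kron_singular_values(s_a, s_b, k):
    """Return the k largest singular values of kron(A, B) as (value, i, j) tuples.

    s_a and s_b are the singular values of A and B. Only len(s_a) * len(s_b)
    scalar products are enumerated; the Kronecker product matrix is never built.
    """
    products = (
        (sa * sb, i, j)
        for i, sa in enumerate(s_a)
        for j, sb in enumerate(s_b)
    )
    return heapq.nlargest(k, products)


if __name__ == "__main__":
    # Hypothetical singular values for two small separable blur factors.
    kept = top_k_kron_singular_values([3.0, 1.0], [2.0, 0.5], k=3)
    for value, i, j in kept:
        print(f"sigma = {value}, from A index {i} and B index {j}")
```

The returned index pairs indicate which singular vectors of A and B to combine in a truncated expansion, so a candidate k can be evaluated using only the two small decompositions.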
-
Passive, compact, single-shot 3D sensing is useful in many application areas such as microscopy, medical imaging, surgical navigation, and autonomous driving, where form factor, time, and power constraints can exist. Obtaining RGB-D scene information over a short imaging distance, in an ultra-compact form factor, and in a passive, snapshot manner is challenging. Dual-pixel (DP) sensors are a potential way to achieve this. DP sensors collect light rays from two different halves of the lens in two interleaved pixel arrays, thus capturing two slightly different views of the scene, like a stereo camera system. However, imaging with a DP sensor implies that the defocus blur size is directly proportional to the disparity seen between the views. This creates a trade-off between disparity estimation and deblurring accuracy. To improve this trade-off, we propose CADS (Coded Aperture Dual-Pixel Sensing), in which we use a coded aperture in the imaging lens along with a DP sensor. In our approach, we jointly learn an optimal coded pattern and the reconstruction algorithm in an end-to-end optimization setting. Our resulting CADS imaging system demonstrates an improvement of >1.5 dB PSNR in all-in-focus (AIF) estimates and 5-6% in depth estimation quality over naive DP sensing for a wide range of aperture settings. Furthermore, we build the proposed CADS prototypes for DSLR photography settings and in an endoscope and a dermoscope form factor. Our novel coded dual-pixel sensing approach demonstrates accurate RGB-D reconstruction results in simulations and real-world experiments in a passive, snapshot, and compact manner.
-
Blur occurs naturally when the eye is focused at one distance and an object is presented at another distance. Computer-graphics engineers and vision scientists often wish to create display images that reproduce such depth-dependent blur, but their methods are incorrect for that purpose. They take into account the scene geometry, pupil size, and focal distances, but do not properly take into account the optical aberrations of the human eye. We developed a method that, by incorporating the viewer’s optics, yields displayed images that produce retinal images close to the ones that occur in natural viewing. We concentrated on the effects of defocus, chromatic aberration, astigmatism, and spherical aberration and evaluated their effectiveness by conducting experiments in which we attempted to drive the eye’s focusing response (accommodation) through the rendering of these aberrations. We found that accommodation is not driven at all by conventional rendering methods, but that it is driven surprisingly quickly and accurately by our method with defocus and chromatic aberration incorporated. We found some effect of astigmatism but none of spherical aberration. We discuss how the rendering approach can be used in vision science experiments and in the development of ophthalmic/optometric devices and augmented- and virtual-reality displays.
-
Jumping spiders (Salticidae) rely on accurate depth perception for predation and navigation. They accomplish depth perception, despite their tiny brains, by using specialized optics. Each principal eye includes a multitiered retina that simultaneously receives multiple images with different amounts of defocus, and from these images, distance is decoded with relatively little computation. We introduce a compact depth sensor that is inspired by the jumping spider. It combines metalens optics, which modify the phase of incident light at a subwavelength scale, with efficient computations to measure depth from image defocus. Instead of using a multitiered retina to transduce multiple simultaneous images, the sensor uses a metalens to split the light that passes through an aperture and concurrently form two differently defocused images at distinct regions of a single planar photosensor. We demonstrate a system that deploys a 3-mm-diameter metalens to measure depth over a 10-cm distance range, using fewer than 700 floating point operations per output pixel. Compared with previous passive depth sensors, our metalens depth sensor is compact, single-shot, and requires a small amount of computation. This integration of nanophotonics and efficient computation brings artificial depth sensing closer to being feasible on millimeter-scale, microwatt platforms such as microrobots and microsensor networks.