Self-supervised depth estimation has recently demonstrated promising performance compared to the supervised methods on challenging indoor scenes. However, the majority of efforts mainly focus on exploiting photometric and geometric consistency via forward image warping and backward image warping, based on monocular videos or stereo image pairs. The influence of defocus blur to depth estimation is neglected, resulting in a limited performance for objects and scenes in out of focus. In this work, we propose the first framework for simultaneous depth estimation from a single image and image focal stacks using depth-from-defocus and depth-from-focus algorithms. The proposed network is able to learn optimal depth mapping from the information contained in the blur of a single image, generate a simulated image focal stack and all-in-focus image, and train a depth estimator from an image focal stack. In addition to the validation of our method on both synthetic NYUv2 dataset and real DSLR dataset, we also collect our own dataset using a DSLR camera and further verify on it. Experiments demonstrate that our system surpasses the state-of-the-art supervised depth estimation method over 4% in accuracy and achieves superb performance among the methods without direct supervision on the synthesized NYUv2 dataset, which has been rarely explored.
more »
« less
Focal Track: Depth and Accommodation with Oscillating Lens Deformation
The focal track sensor is a monocular and computationally efficient depth sensor that is based on defocus controlled by a liquid membrane lens. It synchronizes small lens oscillations with a photosensor to produce real-time depth maps by means of differential defocus, and it couples these oscillations with bigger lens deformations that adapt the defocus working range to track objects over large axial distances. To create the focal track sensor, we derive a texture-invariant family of equations that relate image derivatives to scene depth when a lens changes its focal length differentially. Based on these equations, we design a feed-forward sequence of computations that: robustly incorporates image derivatives at multiple scales; produces confidence maps along with depth; and can be trained endto- end to mitigate against noise, aberrations, and other non-idealities. Our prototype with 1-inch optics produces depth and confidence maps at 100 frames per second over an axial range of more than 75cm.
more »
« less
- PAR ID:
- 10092348
- Date Published:
- Journal Name:
- IEEE International Conference on Computer Vision
- Page Range / eLocation ID:
- 966 to 974
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
We introduce Focal Split, a handheld, snapshot depth camera with fully onboard power and computing based on depth-from-differential-defocus (DfDD). Focal Split is passive, avoiding power consumption of light sources. Its achromatic optical system simultaneously forms two differentially defocused images of the scene, which can be independently captured using two photosensors in a snapshot. The data processing is based on the DfDD theory, which efficiently computes a depth and a confidence value for each pixel with only 500 floating point operations (FLOPs) per pixel from the camera measurements. We demonstrate a Focal Split prototype, which comprises a handheld custom camera system connected to a Raspberry Pi 5 for real-time data processing. The system consumes 4.9 W and is powered on a 5 V, 10,000 mAh battery. The prototype can measure objects with distances from 0.4 m to 1.2 m, outputting 480×360 sparse depth maps at 2.1 frames per second (FPS) using unoptimized Python scripts. Focal Split is DIY friendly. A comprehensive guide to building your own Focal Split depth camera, code, and additional data can be found at https://focal-split.qiguo.org.more » « less
-
Poor access to eye care is a major global challenge that could be ameliorated by low-cost, portable, and easy-to-use diagnostic technologies. Diffuser-based imaging has the potential to enable inexpensive, compact optical systems that can reconstruct a focused image of an object over a range of defocus errors. Here, we present a diffuser-based computational funduscope that reconstructs important clinical features of a model eye. Compared to existing diffuser-imager architectures, our system features an infinite-conjugate design by relaying the ocular lens onto the diffuser. This offers shift-invariance across a wide field-of-view (FOV) and an invariant magnification across an extended depth range. Experimentally, we demonstrate fundus image reconstruction over a 33°FOV and robustness to ±4D refractive error using a constant point-spread-function. Combined with diffuser-based wavefront sensing, this technology could enable combined ocular aberrometry and funduscopic screening through a single diffuser sensor.more » « less
-
Jumping spiders (Salticidae) rely on accurate depth perception for predation and navigation. They accomplish depth perception, despite their tiny brains, by using specialized optics. Each principal eye includes a multitiered retina that simultaneously receives multiple images with different amounts of defocus, and from these images, distance is decoded with relatively little computation. We introduce a compact depth sensor that is inspired by the jumping spider. It combines metalens optics, which modifies the phase of incident light at a subwavelength scale, with efficient computations to measure depth from image defocus. Instead of using a multitiered retina to transduce multiple simultaneous images, the sensor uses a metalens to split the light that passes through an aperture and concurrently form 2 differently defocused images at distinct regions of a single planar photosensor. We demonstrate a system that deploys a 3-mm-diameter metalens to measure depth over a 10-cm distance range, using fewer than 700 floating point operations per output pixel. Compared with previous passive depth sensors, our metalens depth sensor is compact, single-shot, and requires a small amount of computation. This integration of nanophotonics and efficient computation brings artificial depth sensing closer to being feasible on millimeter-scale, microwatts platforms such as microrobots and microsensor networks.more » « less
-
Conventional continuous-wave amplitude-modulated time-of-flight (CWAM ToF) cameras suffer from a fundamental trade-off between light throughput and depth of field (DoF): a larger lens aperture allows more light collection but suffers from significantly lower DoF. However, both high light throughput, which increases signal-to-noise ratio, and a wide DoF, which enlarges the system’s applicable depth range, are valuable for CWAM ToF applications. In this work, we propose EDoF-ToF, an algorithmic method to extend the DoF of large-aperture CWAM ToF cameras by using a neural network to deblur objects outside of the lens’s narrow focal region and thus produce an all-in-focus measurement. A key component of our work is the proposed large-aperture ToF training data simulator, which models the depth-dependent blurs and partial occlusions caused by such apertures. Contrary to conventional image deblurring where the blur model is typically linear, ToF depth maps are nonlinear functions of scene intensities, resulting in a nonlinear blur model that we also derive for our simulator. Unlike extended DoF for conventional photography where depth information needs to be encoded (or made depth-invariant) using additional hardware (phase masks, focal sweeping, etc.), ToF sensor measurements naturally encode depth information, allowing a completely software solution to extended DoF. We experimentally demonstrate EDoF-ToF increasing the DoF of a conventional ToF system by 3.6 ×, effectively achieving the DoF of a smaller lens aperture that allows 22.1 × less light. Ultimately, EDoF-ToF enables CWAM ToF cameras to enjoy the benefits of both high light throughput and a wide DoF.more » « less
An official website of the United States government

