Title: PhaseCam3D — Learning Phase Masks for Passive Single View Depth Estimation
There is an increasing need for passive 3D scanning in many applications with stringent energy constraints. In this paper, we present an approach for single-frame, single-viewpoint, passive 3D imaging using a phase mask at the aperture plane of a camera. Our approach relies on an end-to-end optimization framework to jointly learn the optimal phase mask and the reconstruction algorithm, allowing accurate estimation of the range image from the captured data. Using this optimization framework, we design a new phase mask that performs significantly better than existing approaches. We build a prototype by inserting a phase mask fabricated using photolithography into the aperture plane of a conventional camera and show compelling 3D imaging performance.
Award ID(s): 1618823, 1652569
PAR ID: 10106206
Journal Name: IEEE International Conference on Computational Photography
Page Range / eLocation ID: 1 to 12
Format(s): Medium: X
Sponsoring Org: National Science Foundation
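The end-to-end idea in the abstract above rests on making the optics differentiable, so that gradients from the depth-estimation loss can update the phase mask itself. Below is a minimal Python (PyTorch) sketch of that mechanism; the grid size, the toy two-layer network, the waves-of-defocus parameterization, and the per-pixel depth loss are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of end-to-end phase-mask / reconstruction learning in
# PyTorch (illustrative stand-in, not the paper's actual model).
import torch
import torch.nn as nn
import torch.nn.functional as F

N = 65                                             # pupil grid size (assumed)
mask = nn.Parameter(torch.zeros(N, N))             # learnable mask phase, in waves

def psf_at_defocus(mask_waves, defocus_waves):
    """Depth-dependent PSF via Fourier optics: pupil phase -> |FFT|^2."""
    y, x = torch.meshgrid(torch.linspace(-1, 1, N),
                          torch.linspace(-1, 1, N), indexing="ij")
    r2 = x ** 2 + y ** 2
    aperture = (r2 <= 1.0).float()
    phase = 2 * torch.pi * (mask_waves + defocus_waves * r2)
    pupil = aperture * torch.exp(1j * phase)
    psf = torch.fft.fftshift(torch.abs(torch.fft.fft2(pupil)) ** 2)
    return psf / psf.sum()

net = nn.Sequential(                               # toy depth-estimation net
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 3, padding=1))
opt = torch.optim.Adam([mask, *net.parameters()], lr=1e-3)

# one illustrative training step on synthetic data
scene = torch.rand(1, 1, 128, 128)
depth = 1.5                                        # defocus in waves (made up)
psf = psf_at_defocus(mask, depth)[None, None]
meas = F.conv2d(scene, psf, padding=N // 2)        # simulated sensor image
meas = meas + 0.01 * torch.randn_like(meas)        # sensor noise
loss = ((net(meas) - depth) ** 2).mean()           # toy per-pixel depth loss
opt.zero_grad()
loss.backward()                                    # gradients reach the mask
opt.step()
print("loss:", float(loss))
```

In the paper's framework, this gradient path from the reconstruction loss back into the optics is what lets the learned mask outperform hand-designed ones.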
More Like this
  1. Passive, compact, single-shot 3D sensing is useful in many application areas, such as microscopy, medical imaging, surgical navigation, and autonomous driving, where form factor, time, and power constraints exist. Obtaining RGB-D scene information over a short imaging distance, in an ultra-compact form factor, and in a passive, snapshot manner is challenging. Dual-pixel (DP) sensors are a potential solution: they collect light rays from the two halves of the lens in two interleaved pixel arrays, capturing two slightly different views of the scene, much like a stereo camera system. However, imaging with a DP sensor means that the defocus blur size is directly proportional to the disparity between the views, creating a trade-off between disparity estimation and deblurring accuracy. To improve this trade-off, we propose CADS (Coded Aperture Dual-Pixel Sensing), which places a coded aperture in the imaging lens along with a DP sensor. We jointly learn an optimal coded pattern and the reconstruction algorithm in an end-to-end optimization setting. The resulting CADS imaging system improves all-in-focus (AIF) estimates by >1.5 dB PSNR and depth estimation quality by 5-6% over naive DP sensing for a wide range of aperture settings. Furthermore, we build CADS prototypes in DSLR-photography, endoscope, and dermoscope form factors. Our coded dual-pixel sensing approach demonstrates accurate RGB-D reconstruction in simulations and real-world experiments in a passive, snapshot, and compact manner.
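    The blur-disparity coupling described in item 1 falls out of the dual-pixel image-formation geometry: each sub-view sees one half of the defocus kernel, so the centroid separation between the two views grows in proportion to the blur size. A toy NumPy sketch of that geometry follows; the half-disc kernels and all sizes are simplifying assumptions, not the CADS model.

```python
# Toy dual-pixel image formation (simplified geometry, not the CADS model).
import numpy as np

def half_aperture_kernels(blur_px):
    """1-D line-spread kernels for the two halves of a defocused aperture."""
    n = int(np.ceil(blur_px)) | 1                   # odd support
    y, x = np.mgrid[-(n // 2):n // 2 + 1, -(n // 2):n // 2 + 1]
    disc = (x ** 2 + y ** 2 <= (blur_px / 2) ** 2).astype(float)
    left, right = (disc * (x <= 0)).sum(0), (disc * (x >= 0)).sum(0)
    return left / left.sum(), right / right.sum()

scene = np.zeros(256)
scene[128] = 1.0                                    # 1-D point source
pos = np.arange(256)
for b in (5, 11, 21):                               # growing defocus blur
    L, R = half_aperture_kernels(b)
    iL = np.convolve(scene, L, mode="same")         # left sub-view
    iR = np.convolve(scene, R, mode="same")         # right sub-view
    cL = (pos * iL).sum() / iL.sum()                # view centroids
    cR = (pos * iR).sum() / iR.sum()
    print(f"blur {b:2d} px -> dual-pixel disparity {cR - cL:5.2f} px")
```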
  2. Conventional stereo suffers from a fundamental trade-off between imaging volume and signal-to-noise ratio (SNR), because aperture size affects the two in opposite directions. Inspired by extended-depth-of-field cameras, we propose a novel end-to-end learning-based technique to overcome this limitation by introducing a phase mask at the aperture plane of the cameras in a stereo imaging system. The phase mask creates a depth-dependent yet numerically invertible point spread function, allowing us to recover sharp image texture and stereo correspondence over a significantly larger depth of field (EDOF) than conventional stereo. The phase mask pattern, the EDOF image reconstruction, and the stereo disparity estimation are all trained together in an end-to-end learned deep neural network. We perform theoretical analysis and characterization of the proposed approach and show, in simulation, a 6× increase in the volume that can be imaged. We also build an experimental prototype and validate the approach with real-world results acquired using this prototype system.
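    The volume-vs-SNR conflict in item 2 can be made concrete with thin-lens bookkeeping: gathered light grows with aperture area, while the defocus blur circle, and hence the depth range that stays sharp, degrades linearly with aperture diameter. A back-of-the-envelope Python sketch with purely illustrative numbers:

```python
# Back-of-the-envelope stereo trade-off: light gathered grows with aperture
# area, depth of field shrinks with aperture diameter (numbers illustrative).
import numpy as np

f   = 0.05        # focal length: 50 mm (assumed)
z_f = 1.0         # focus distance: 1 m (assumed)
coc = 10e-6       # acceptable blur circle on sensor: 10 um (assumed)

def blur_diameter(A, z):
    """Thin-lens defocus blur circle for aperture diameter A, object depth z."""
    return A * f * np.abs(z - z_f) / (z * (z_f - f))

z = np.linspace(0.5, 3.0, 10001)
for A in (0.005, 0.0125, 0.025):                  # f/10, f/4, f/2
    sharp = z[blur_diameter(A, z) <= coc]         # depths inside the DOF
    print(f"f/{f / A:4.1f}: light ~{(A / 0.005) ** 2:4.1f}x, "
          f"DOF {sharp.min():.3f}-{sharp.max():.3f} m")
```

Opening the aperture from f/10 to f/2 gathers 25× more light but shrinks the sharp depth range several-fold, which is exactly the conflict the learned phase mask is designed to relax.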
  3. Lensless cameras are ultra-thin imaging systems that replace the lens with a thin passive optical mask and computation. Passive mask-based lensless cameras encode depth information in their measurements over a certain depth range, and early works have shown that this encoded depth can be used to reconstruct close-range scenes in 3D. However, these 3D reconstruction approaches are typically optimization-based, requiring strong hand-crafted priors and hundreds of iterations, and the reconstructions suffer from low resolution, noise, and artifacts. In this work, we propose FlatNet3D, a feed-forward deep network that estimates both depth and intensity from a single lensless capture. FlatNet3D is an end-to-end trainable deep network that directly reconstructs depth and intensity from a lensless measurement using an efficient physics-based 3D mapping stage followed by a fully convolutional network. Our algorithm is fast and produces high-quality results, which we validate on both simulated scenes and real scenes captured using PhlatCam.
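    One simple way to picture a physics-based 3D mapping stage like the one in item 3 is per-depth matched filtering: the measurement is correlated against the calibrated PSF of each candidate depth plane, and the plane whose PSF best explains the measurement responds most strongly. A toy NumPy sketch under that interpretation (synthetic PSFs and scene; not FlatNet3D's exact stage):

```python
# Toy per-depth matched filtering (illustrative stand-in for a physics-based
# lensless 3D mapping stage; PSFs and scene are synthetic).
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
D, H, W = 3, 64, 64
psfs = rng.random((D, H, W))                       # calibrated PSF per depth
psfs /= psfs.sum(axis=(1, 2), keepdims=True)

scene = np.zeros((D, H, W))                        # point source on plane 1
scene[1, 32, 32] = 1.0
meas = sum(fftconvolve(scene[d], psfs[d], mode="same") for d in range(D))

# map the single measurement into a depth stack by correlating with each PSF
stack = np.stack([fftconvolve(meas, psfs[d][::-1, ::-1], mode="same")
                  for d in range(D)])
print("peak response per depth plane:", stack.max(axis=(1, 2)))
# the true plane (index 1) should respond most strongly in this toy setup
```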
  4. Abstract: In this paper, we present a novel design framework of synthetic radial aperture focusing for three-dimensional (3D) transrectal ultrasound imaging (TRUS-rSAF), in which multiple transmittance/reception events at different scanning angles are synthesized to reconstruct a radial plane in the target volume, securing high spatial resolution and texture uniformity. A theory-based design approach had not previously been available to push the envelope of the 3D rSAF technique. Herein, a closed-form analytical description of the TRUS-rSAF method is presented for the first time, effectively delineating spatial resolution and grating-lobe positions in the radial dimension of a TRUS transducer. We demonstrate a solid optimization workflow based on this theoretical foundation, improving spatiotemporal resolution, grating-lobe artifacts, and signal-to-noise ratio. A specific design criterion was set to outperform clinical 3D TRUS imaging as a reference (TRUS-REF), in which each radial plane is reconstructed with a single transmittance/reception event using a motorized actuator. The optimized TRUS-rSAF method improved spatial resolution by up to 50% over the TRUS-REF method while providing clinically effective temporal resolution (2-8 volumes/sec) with negligible grating-lobe artifacts. The results indicate that the proposed design approach could enable a novel TRUS imaging solution in the clinic.
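    The synthesis step at the heart of rSAF in item 4 is coherent delay-and-sum across multiple transmit/receive events. The Python sketch below shows that core operation in a heavily simplified 1-D setting (one event per scan position, known geometry, synthetic pulse); it is a generic illustration, not the paper's closed-form design.

```python
# Minimal delay-and-sum synthesis across scan positions (generic toy, with
# synthetic geometry and pulse; not the TRUS-rSAF closed-form design).
import numpy as np

c, fs = 1540.0, 40e6                     # sound speed (m/s), sample rate (Hz)
scan_x = np.linspace(-5e-3, 5e-3, 9)     # 9 scan positions along x (assumed)
target = (0.0, 30e-3)                    # point scatterer at (0, 30 mm)

t = np.arange(4096) / fs
pulse = lambda tt: np.sinc(5e6 * tt) * np.exp(-(5e6 * tt) ** 2)

# simulate one round-trip echo per scan position
rf = np.array([pulse(t - 2 * np.hypot(target[0] - x, target[1]) / c)
               for x in scan_x])

def das(px, pz):
    """Coherently sum all events with the delay for focal point (px, pz)."""
    taus = 2 * np.hypot(px - scan_x, pz) / c
    return sum(np.interp(tau, t, line) for tau, line in zip(taus, rf))

zs = np.linspace(28e-3, 32e-3, 81)
img = np.array([das(0.0, z) for z in zs])
print("focused peak at z =", zs[np.argmax(np.abs(img))], "m")   # ~0.030
```

Synthesizing many events sharpens the focus relative to any single event, which is the resolution gain the rSAF design optimizes analytically.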
  5. Lensless imaging has emerged as a potential route to ultra-miniature cameras by eschewing the bulky lens of a traditional camera. Without a focusing lens, lensless cameras rely on computational algorithms to recover the scene from multiplexed measurements, but current iterative-optimization-based reconstruction algorithms produce images that are noisy and perceptually poor. In this work, we propose a non-iterative, deep-learning-based reconstruction approach that yields orders-of-magnitude improvement in image quality for lensless reconstructions. Our approach, called FlatNet, lays down a framework for reconstructing high-quality photorealistic images from mask-based lensless cameras whose forward-model formulation is known. FlatNet consists of two stages: (1) an inversion stage that maps the measurement into an intermediate reconstruction by learning parameters within the forward-model formulation, and (2) a perceptual enhancement stage that improves the perceptual quality of this intermediate reconstruction. The two stages are trained together end to end. We show high-quality reconstructions through extensive experiments on real and challenging scenes using two different lensless prototypes: one with a separable forward model and one with a more general non-separable cropped-convolution model. Our end-to-end approach is fast, produces photorealistic reconstructions, and is easy to adapt to other mask-based lensless cameras.
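    For the separable prototype in item 5, the forward model of FlatCam-style cameras can be written as Y = PL X PR^T, and FlatNet's inversion stage learns two matrices for the corresponding inverse map. A hedged PyTorch sketch of such a learnable inversion stage follows; the sizes, random stand-in calibration, and pseudo-inverse initialization are assumptions, and the enhancement network is omitted.

```python
# Sketch of a learnable separable inversion stage (FlatNet-style stage 1;
# sizes, random calibration, and initialization are assumptions).
import torch
import torch.nn as nn

M, N = 256, 64                      # measurement / scene sizes (illustrative)

class SeparableInversion(nn.Module):
    """x_hat = W1 @ y @ W2 for a separable lensless model y = PL x PR^T."""
    def __init__(self, PL, PR):
        super().__init__()
        # start from the pseudo-inverses of the calibrated matrices, then
        # fine-tune end to end together with the enhancement network
        self.W1 = nn.Parameter(torch.linalg.pinv(PL))
        self.W2 = nn.Parameter(torch.linalg.pinv(PR).T)
    def forward(self, y):
        return self.W1 @ y @ self.W2

PL, PR = torch.randn(M, N), torch.randn(M, N)      # stand-in calibration
x = torch.rand(N, N)                               # synthetic scene
y = PL @ x @ PR.T + 0.01 * torch.randn(M, M)       # simulated measurement

stage1 = SeparableInversion(PL, PR)
x_hat = stage1(y)                                  # intermediate reconstruction
print("stage-1 relative error:", float(torch.norm(x_hat - x) / torch.norm(x)))
# a stage-2 perceptual-enhancement CNN would refine x_hat; omitted here
```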