skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Single-Step Latent Diffusion for Underwater Image Restoration
Underwater image restoration aims to recover color, contrast, and appearance in underwater scenes, crucial for fields like marine ecology and archaeology. While pixel-domain diffusion methods work for simple scenes, they are computationally heavy and produce artifacts in complex, depth-varying scenes. We present a single-step latent diffusion method, SLURPP (Single-step Latent Underwater Restoration with Pretrained Priors), that overcomes these limitations by combining a novel network architecture with an accurate synthetic data generation pipeline. SLURPP combines pretrained latent diffusion models - which encode strong priors on the geometry and depth of scenes with an explicit scene decomposition, which allows one to model and account for the effects of light attenuation and backscattering. To train SLURPP, we design a physics-based underwater image synthesis pipeline that applies varied and realistic underwater degradation effects to existing terrestrial image datasets. We evaluate our method extensively on both synthetic and real-world benchmarks and demonstrate state-of-the-art performance.  more » « less
Award ID(s):
2330416
PAR ID:
10653548
Author(s) / Creator(s):
 ;  ;  ;  ;  ;  ;  
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE Transactions on Pattern Analysis and Machine Intelligence
ISSN:
0162-8828
Page Range / eLocation ID:
1 to 11
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Self-supervised depth estimation has recently demonstrated promising performance compared to the supervised methods on challenging indoor scenes. However, the majority of efforts mainly focus on exploiting photometric and geometric consistency via forward image warping and backward image warping, based on monocular videos or stereo image pairs. The influence of defocus blur to depth estimation is neglected, resulting in a limited performance for objects and scenes in out of focus. In this work, we propose the first framework for simultaneous depth estimation from a single image and image focal stacks using depth-from-defocus and depth-from-focus algorithms. The proposed network is able to learn optimal depth mapping from the information contained in the blur of a single image, generate a simulated image focal stack and all-in-focus image, and train a depth estimator from an image focal stack. In addition to the validation of our method on both synthetic NYUv2 dataset and real DSLR dataset, we also collect our own dataset using a DSLR camera and further verify on it. Experiments demonstrate that our system surpasses the state-of-the-art supervised depth estimation method over 4% in accuracy and achieves superb performance among the methods without direct supervision on the synthesized NYUv2 dataset, which has been rarely explored. 
    more » « less
  2. In this paper, we present OpenWaters, a real-time open-source underwater simulation kit for generating photorealistic underwater scenes. OpenWaters supports creation of massive amount of underwater images by emulating diverse real-world conditions. It allows for fine controls over every variable in a simulation instance, including geometry, rendering parameters like ray-traced water caustics, scattering, and ground-truth labels. Using underwater depth (distance between camera and object) estimation as the use-case, we showcase and validate the capabilities of OpenWaters to model underwater scenes that are used to train a deep neural network for depth estimation. Our experimental evaluation demonstrates depth estimation using synthetic underwater images with high accuracy, and feasibility of transfer-learning of features from synthetic to real-world images. 
    more » « less
  3. The inference stage of diffusion models involves running a reverse-time diffusion stochastic differential equation, transforming samples from a Gaussian latent distribution into samples from a target distribution on a low-dimensional manifold. The intermediate values can be interpreted as noisy images, with the amount of noise determined by the forward diffusion process noise schedule. Boomerang is an approach for local sampling of image manifolds, which involves adding noise to an input image, moving it closer to the latent space, and mapping it back to the image manifold through a partial reverse diffusion process. Boomerang can be used with any pretrained diffusion model without adjustments to the reverse diffusion process, and we present three applications: constructing privacy-preserving datasets with controllable anonymity, increasing generalization performance with Boomerang for data augmentation, and enhancing resolution with a perceptual image enhancement framework. 
    more » « less
  4. We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair. In this challenging regime, 3D scene points are regularly observed only once, requiring prior-based reconstruction of scene geometry and appearance. We find that existing approaches to novel view synthesis from sparse observations fail due to recovering incorrect 3D geometry and due to the high cost of differentiable rendering that precludes their scaling to large-scale training. We take a step towards resolving these shortcomings by formulating a multi-view transformer encoder, proposing an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray, and a lightweight cross-attention-based renderer. Our contributions enable training of our method on a large-scale real-world dataset of indoor and outdoor scenes. We demonstrate that our method learns powerful multi-view geometry priors while reducing both rendering time and memory footprint. We conduct extensive comparisons on held-out test scenes across two real-world datasets, significantly outperforming prior work on novel view synthesis from sparse image observations and achieving multi-view-consistent novel view synthesis. 
    more » « less
  5. We propose a novel method for combining synthetic and real images when training networks to determine geomet- ric information from a single image. We suggest a method for mapping both image types into a single, shared domain. This is connected to a primary network for end-to-end train- ing. Ideally, this results in images from two domains that present shared information to the primary network. Our experiments demonstrate significant improvements over the state-of-the-art in two important domains, surface normal estimation of human faces and monocular depth estimation for outdoor scenes, both in an unsupervised setting. 
    more » « less