In this paper, we present OpenWaters, a real-time open-source underwater simulation kit for generating photorealistic underwater scenes. OpenWaters supports the creation of large numbers of underwater images by emulating diverse real-world conditions. It allows fine control over every variable in a simulation instance, including geometry, rendering parameters such as ray-traced water caustics and scattering, and ground-truth labels. Using underwater depth estimation (the distance between camera and object) as the use case, we showcase and validate the capabilities of OpenWaters to model underwater scenes that are used to train a deep neural network for depth estimation. Our experimental evaluation demonstrates high-accuracy depth estimation from synthetic underwater images and the feasibility of transferring learned features from synthetic to real-world images.
This content will become publicly available on May 1, 2026
AquaFuse: Waterbody Fusion for Physics-Guided View Synthesis of Underwater Scenes
In this letter, we introduce AquaFuse, a physics-based method for synthesizing waterbody properties in underwater imagery. We formulate a closed-form solution for waterbody fusion that facilitates realistic data augmentation and geometrically consistent underwater scene rendering. AquaFuse leverages the physical characteristics of light propagation underwater to transfer the waterbody of one scene onto the object contents of another. Unlike data-driven style transfer methods, AquaFuse preserves the depth consistency and object geometry of the input scene. We validate this feature through comprehensive experiments over diverse sets of underwater scenes. We find that the AquaFused images preserve over 94% depth consistency and 90–95% structural similarity of the input scenes. We also demonstrate that it produces accurate 3D view synthesis by preserving object geometry while adapting to the inherent waterbody fusion process. AquaFuse opens up a new research direction in data augmentation by geometry-preserving style transfer for underwater imaging and robot vision.
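The idea of moving a waterbody from one scene to another while leaving depth untouched can be illustrated with a minimal sketch. It assumes the widely used simplified underwater image formation model, in which each color channel of an observed pixel is the clear-scene value attenuated by exp(-beta * depth) plus a backscatter term that grows with depth. This is not the closed-form solution from the letter, which the abstract does not spell out. The function names, the per-channel coefficients `beta`, the background light `backlight`, and all numeric values are hypothetical.

```python
import math

def apply_waterbody(clear, depth, beta, backlight):
    """Degrade one clear RGB pixel (values in [0, 1]) seen through `depth` metres of water.

    Assumed model per channel: I = J * t + B * (1 - t), with t = exp(-beta * depth).
    """
    out = []
    for j, b, binf in zip(clear, beta, backlight):
        t = math.exp(-b * depth)           # per-channel transmission
        out.append(j * t + binf * (1.0 - t))
    return tuple(out)

def remove_waterbody(observed, depth, beta, backlight):
    """Invert the assumed model to recover the clear pixel."""
    out = []
    for i, b, binf in zip(observed, beta, backlight):
        t = math.exp(-b * depth)
        out.append((i - binf * (1.0 - t)) / t)
    return tuple(out)

def transfer_waterbody(observed, depth, src, dst):
    """Replace the source waterbody with the destination one; depth is unchanged."""
    clear = remove_waterbody(observed, depth, *src)
    return apply_waterbody(clear, depth, *dst)

# Hypothetical waterbodies as (beta, backlight) pairs, one value per RGB channel.
BLUE_WATER = ((0.40, 0.10, 0.05), (0.05, 0.30, 0.45))
GREEN_WATER = ((0.45, 0.08, 0.20), (0.05, 0.40, 0.25))

pixel_in_blue = apply_waterbody((0.8, 0.6, 0.4), 3.0, *BLUE_WATER)
pixel_in_green = transfer_waterbody(pixel_in_blue, 3.0, BLUE_WATER, GREEN_WATER)
```

Because the depth value passes through unchanged, the sketch alters only the water-dependent terms, which is the property the abstract contrasts with data-driven style transfer.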
- PAR ID: 10614034
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: IEEE Robotics and Automation Letters
- Volume: 10
- Issue: 5
- ISSN: 2377-3774
- Page Range / eLocation ID: 4316 to 4323
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Underwater image restoration aims to recover color, contrast, and appearance in underwater scenes, which is crucial for fields like marine ecology and archaeology. While pixel-domain diffusion methods work for simple scenes, they are computationally heavy and produce artifacts in complex, depth-varying scenes. We present a single-step latent diffusion method, SLURPP (Single-step Latent Underwater Restoration with Pretrained Priors), that overcomes these limitations by combining a novel network architecture with an accurate synthetic data generation pipeline. SLURPP combines pretrained latent diffusion models, which encode strong priors on the geometry and depth of scenes, with an explicit scene decomposition, which allows one to model and account for the effects of light attenuation and backscattering. To train SLURPP, we design a physics-based underwater image synthesis pipeline that applies varied and realistic underwater degradation effects to existing terrestrial image datasets. We evaluate our method extensively on both synthetic and real-world benchmarks and demonstrate state-of-the-art performance.
Differentiable 3D-Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation along with analytical derivatives to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround view (360° viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted baseline imaging scenarios, the GS algorithm suffers from a well-known ‘missing cone’ problem, which results in poor reconstruction along the depth axis. In this paper, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithms for two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).
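For readers unfamiliar with the geometry metric quoted above, the following is a minimal sketch of a symmetric Chamfer distance between two 3D point sets. Conventions vary across papers (squared versus unsquared distances, sum versus mean over the two directions), and the abstract does not say which one the authors use. This version assumes unsquared Euclidean distances averaged per direction and then summed. The function name and sample points are hypothetical.

```python
import math

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two non-empty 3D point sets.

    Assumed convention: mean nearest-neighbour Euclidean distance from a to b,
    plus the same quantity from b to a. Brute force, O(len(a) * len(b)).
    """
    def one_way(src, dst):
        # For every source point, find the closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)

# Hypothetical reconstruction and reference point sets.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reconstruction = [(0.0, 0.0, 0.0)]
score = chamfer_distance(reconstruction, reference)
```

A lower value means the reconstructed points lie closer to the reference geometry, so a 60% reduction indicates a substantially tighter fit under whichever convention is used.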
We present WonderWorld, a novel framework for interactive 3D scene generation that enables users to interactively specify scene contents and layout and see the created scenes with low latency. The major challenge lies in achieving fast generation of 3D scenes. Existing scene generation approaches fall short on speed as they often require (1) progressively generating many views and depth maps, and (2) time-consuming optimization of the scene representations. Our approach does not need multiple views, and it leverages a geometry-based initialization that significantly reduces optimization time. Another challenge is generating coherent geometry that allows all scenes to be connected. We introduce guided depth diffusion, which allows partial conditioning of depth estimation. WonderWorld generates connected and diverse 3D scenes in less than 10 seconds on a single A6000 GPU, enabling real-time user interaction and exploration. Our interactive demo, full code, data, and software can be found at https://kovenyu.com/WonderWorld/
A major challenge in monocular 3D object detection is the limited diversity and quantity of objects in real datasets. While augmenting real scenes with virtual objects holds promise to improve both the diversity and quantity of the objects, it remains elusive due to the lack of an effective 3D object insertion method in complex real captured scenes. In this work, we study augmenting complex real indoor scenes with virtual objects for monocular 3D object detection. The main challenge is to automatically identify plausible physical properties for virtual assets (e.g., locations, appearances, sizes, etc.) in cluttered real scenes. To address this challenge, we propose a physically plausible indoor 3D object insertion approach to automatically copy virtual objects and paste them into real scenes. The resulting objects in scenes have 3D bounding boxes with plausible physical locations and appearances. In particular, our method first identifies physically feasible locations and poses for the inserted objects to prevent collisions with the existing room layout. Subsequently, it estimates spatially-varying illumination for the insertion location, enabling the immersive blending of the virtual objects into the original scene with plausible appearances and cast shadows. We show that our augmentation method significantly improves existing monocular 3D object models and achieves state-of-the-art performance. For the first time, we demonstrate that a physically plausible 3D object insertion, serving as a generative data augmentation technique, can lead to significant improvements for discriminative downstream tasks such as monocular 3D object detection.
