Inferring the 3D structure underlying a set of multi-view images typically requires solving two co-dependent tasks -- accurate 3D reconstruction requires precise camera poses, and predicting camera poses relies on (implicitly or explicitly) modeling the underlying 3D structure. The classical framework of analysis by synthesis casts this inference as a joint optimization seeking to explain the observed pixels, and recent instantiations learn expressive 3D representations (e.g., Neural Fields) with gradient-descent-based refinement of initial pose estimates. However, given only a sparse set of observed views, the observations may not provide sufficient direct evidence to recover complete and accurate 3D. Moreover, large errors in pose estimation may not be easily corrected and can further degrade the inferred 3D. To enable robust 3D reconstruction and pose estimation in this challenging setup, we propose SparseAGS, a method that adapts the analysis-by-synthesis approach by: a) including novel-view-synthesis-based generative priors in conjunction with photometric objectives to improve the quality of the inferred 3D, and b) explicitly reasoning about outlier poses and correcting them via a discrete search combined with continuous optimization. We validate our framework on real-world and synthetic datasets, using several off-the-shelf pose estimation systems as initialization. We find that it significantly improves the base systems' pose accuracy while yielding high-quality 3D reconstructions that outperform current multi-view reconstruction baselines.
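As a rough illustration of the joint optimization this abstract describes, the following PyTorch sketch alternates photometric fitting (with a generative-prior term) and then flags likely pose outliers by per-view error; `render`, `generative_prior_loss`, and the threshold are hypothetical placeholders, not SparseAGS's actual interface.

```python
# Minimal sketch of analysis-by-synthesis with pose refinement and outlier
# flagging. `render` and `generative_prior_loss` are hypothetical
# placeholders standing in for a differentiable renderer and a
# novel-view-synthesis prior; they are not the paper's API.
import torch

def reconstruct(scene_params, init_poses, images, render,
                generative_prior_loss, steps=1000, lr=1e-2, thresh=0.05):
    scene_params = scene_params.clone().requires_grad_(True)
    poses = init_poses.clone().requires_grad_(True)
    opt = torch.optim.Adam([scene_params, poses], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        renders = torch.stack([render(scene_params, p) for p in poses])
        loss = ((renders - images) ** 2).mean()            # photometric term
        loss = loss + generative_prior_loss(scene_params)  # generative prior
        loss.backward()
        opt.step()
    # Views whose photometric error stays high are treated as pose outliers;
    # a discrete search over candidate poses would then re-initialize them.
    with torch.no_grad():
        renders = torch.stack([render(scene_params, p) for p in poses])
        per_view = ((renders - images) ** 2).mean(dim=(1, 2, 3))
    outliers = (per_view > thresh).nonzero().flatten()
    return scene_params, poses, outliers
```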
This content will become publicly available on July 13, 2025
AONeuS: A Neural Rendering Framework for Acoustic-Optical Sensor Fusion
Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring. Treacherous operating conditions, fragile surroundings, and limited navigation control often dictate that submersibles restrict their range of motion and, thus, the baseline over which they can capture measurements. In the context of 3D scene reconstruction, it is well known that smaller baselines make reconstruction more challenging. Our work develops a physics-based multimodal acoustic-optical neural surface reconstruction framework (AONeuS) capable of effectively integrating high-resolution RGB measurements with low-resolution depth-resolved imaging sonar measurements. By fusing these complementary modalities, our framework can reconstruct accurate high-resolution 3D surfaces from measurements captured over heavily restricted baselines. Through extensive simulations and in-lab experiments, we demonstrate that AONeuS dramatically outperforms recent RGB-only and sonar-only inverse-differentiable-rendering-based surface reconstruction methods.
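A minimal sketch of what such acoustic-optical fusion could look like as a training objective, assuming hypothetical differentiable forward models `render_rgb` and `render_sonar` that render the shared neural surface into each modality (these names and weights are illustrative assumptions, not AONeuS's implementation):

```python
# Hedged sketch of a fused acoustic-optical objective: one neural surface
# must explain both the RGB images and the depth-resolved sonar returns.
import torch

def fused_loss(surface, rgb_gt, sonar_gt, render_rgb, render_sonar,
               w_rgb=1.0, w_sonar=0.5):
    # High-resolution optical term: per-pixel color error.
    l_rgb = (render_rgb(surface) - rgb_gt).abs().mean()
    # Depth-resolved acoustic term: per-range-bin intensity error.
    l_sonar = (render_sonar(surface) - sonar_gt).abs().mean()
    return w_rgb * l_rgb + w_sonar * l_sonar
```

The design intuition: the RGB term constrains appearance at high lateral resolution, while the sonar term supplies the depth-axis evidence the narrow optical baseline lacks.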
- Award ID(s): 2326904
- PAR ID: 10561691
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400705250
- Page Range / eLocation ID: 1 to 12
- Format(s): Medium: X
- Location: Denver, CO, USA
- Sponsoring Org: National Science Foundation
More Like this
-
Single-image 3D face reconstruction with accurate geometric details is a critical and challenging task, owing to the similar appearance across the face surface and the fine details of facial organs. In this work, we introduce a self-supervised 3D face reconstruction approach that recovers detailed textures from a single image under different camera settings. The proposed network learns high-quality disparity maps from stereo face images during training, while only a single face image is required to generate the 3D model at inference time. To recover fine details of each organ and of the facial surface, the framework introduces facial landmark spatial consistency to constrain the learning process at the local point level, and a segmentation scheme over facial organs to constrain correspondences at the organ level. The face shape and textures are further refined by establishing holistic constraints based on varying illumination and shading information. Compared with state-of-the-art 3DMM and geometry-based single-image reconstruction algorithms, the proposed framework recovers more accurate 3D facial details both quantitatively and qualitatively.
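For illustration only, a landmark spatial-consistency term of the kind described might look like the sketch below; all names (`project_l`, `project_r`, the landmark index set) are assumptions, not the paper's implementation.

```python
# Illustrative landmark spatial-consistency term: projected 3D landmarks
# should agree with detected 2D landmarks in both stereo views.
import torch

def landmark_consistency(verts3d, lm_idx, project_l, project_r,
                         lm2d_l, lm2d_r):
    lm3d = verts3d[lm_idx]                     # (k, 3) landmark vertices
    err_l = (project_l(lm3d) - lm2d_l).norm(dim=-1).mean()
    err_r = (project_r(lm3d) - lm2d_r).norm(dim=-1).mean()
    return err_l + err_r
```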
-
Sonar echoes can provide more than just range information, but recording the full sonar echo is challenging in resource-constrained systems. This paper introduces a compressive-sensing approach for reconstructing under-sampled sonar echo signals in uncluttered environments. The technique requires only around 20% of the samples to achieve good reconstruction results. An experimental validation of the approach is presented.
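The idea can be demonstrated end to end in a few lines: the toy NumPy sketch below recovers a sparse synthetic echo from roughly 20% random measurements using ISTA, a standard l1 solver. The sensing matrix, sparsity basis, and solver here are generic illustrations; the paper's exact pipeline is not reproduced.

```python
# Toy compressive-sensing recovery of an under-sampled echo, assuming the
# echo is sparse (a few discrete returns). ~20% measurements, ISTA solver.
import numpy as np

rng = np.random.default_rng(0)
n, m = 256, 51                                 # signal length, ~20% samples
x_true = np.zeros(n)
x_true[[40, 90, 200]] = [1.0, 0.6, 0.8]        # a few echo returns
A = rng.standard_normal((m, n)) / np.sqrt(m)   # random sensing matrix
y = A @ x_true                                 # compressed measurements

# ISTA: iterative soft-thresholding for min 0.5*||Ax - y||^2 + lam*||x||_1
lam = 0.01
L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of grad
x = np.zeros(n)
for _ in range(500):
    g = x - (A.T @ (A @ x - y)) / L            # gradient step
    x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # shrinkage

print("recovered support:", np.nonzero(np.abs(x) > 0.1)[0])
```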
-
Differentiable 3D Gaussian splatting (GS) is emerging as a prominent technique in computer vision and graphics for reconstructing 3D scenes. GS represents a scene as a set of 3D Gaussians with varying opacities and employs a computationally efficient splatting operation, along with analytical derivatives, to compute the 3D Gaussian parameters given scene images captured from various viewpoints. Unfortunately, capturing surround-view (360° viewpoint) images is impossible or impractical in many real-world imaging scenarios, including underwater imaging, rooms inside a building, and autonomous navigation. In these restricted-baseline imaging scenarios, the GS algorithm suffers from the well-known ‘missing cone’ problem, which results in poor reconstruction along the depth axis. In this paper, we demonstrate that using transient data (from sonars) allows us to address the missing cone problem by sampling high-frequency data along the depth axis. We extend the Gaussian splatting algorithm to two commonly used sonars and propose fusion algorithms that simultaneously utilize RGB camera data and sonar data. Through simulations, emulations, and hardware experiments across various imaging scenarios, we show that the proposed fusion algorithms lead to significantly better novel view synthesis (5 dB improvement in PSNR) and 3D geometry reconstruction (60% lower Chamfer distance).
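The two evaluation metrics quoted in this abstract are standard and easy to state precisely; here is a small NumPy sketch of PSNR (for novel-view quality) and symmetric Chamfer distance (for 3D geometry):

```python
# PSNR for image quality and symmetric Chamfer distance for point clouds.
import numpy as np

def psnr(img, ref, peak=1.0):
    # Higher is better; a 5 dB gain means ~3x lower mean squared error.
    mse = np.mean((img - ref) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def chamfer(P, Q):
    # P: (n, 3), Q: (m, 3); brute-force nearest-neighbor distances both ways.
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()
```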
-
Most 3D laser scanners are based on 3D optical triangulation, where each 3D point is estimated as the intersection of a camera ray with a plane of light projected by a laser line generator. Since a physical laser line generator projects a sheet of light of finite thickness, assuming the plane of light is infinitesimally thin leads to measurement errors. We propose a new mathematical formulation of 3D optical triangulation based on interval arithmetic, in which 3D points are determined only within certain bounds along the camera rays, and multiple measurements are used to tighten these bounds. We propose the Line Segment Cloud as an alternative surface representation for visualizing the measurement errors within the proposed framework, and introduce the Iterative Line Segment Tightening algorithm to convert line segment clouds to point clouds as a preprocessing step prior to surface reconstruction. We describe how to construct a low-cost laser-line 3D scanner in which the camera is fixed with respect to the object and the laser line generator is mounted on a high-resolution motion platform, along with a GPU-based implementation that processes the large number of captured images in real time. Finally, we present experimental results.
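The core geometry is simple to sketch: intersecting a camera ray with the two faces of a finite-thickness light sheet yields a line segment along the ray rather than a single point. The NumPy snippet below is an illustrative reading of that formulation (all quantities are made up for the example), not the paper's code.

```python
# Ray-plane triangulation with interval bounds for a light sheet of
# finite thickness: the 3D point lies on a segment of the camera ray.
import numpy as np

def ray_plane_t(o, d, n, c):
    # Ray o + t*d against plane {x : n . x = c}; returns the parameter t.
    return (c - n @ o) / (n @ d)

def triangulate_interval(o, d, n, c, half_thickness):
    # The sheet occupies n . x in [c - h, c + h], so the intersection is
    # a line segment on the ray; multiple measurements tighten its bounds.
    t0 = ray_plane_t(o, d, n, c - half_thickness)
    t1 = ray_plane_t(o, d, n, c + half_thickness)
    t_lo, t_hi = sorted((t0, t1))
    return o + t_lo * d, o + t_hi * d          # segment endpoints

# Example: unit ray from the origin against a tilted sheet of thickness 0.02.
o = np.zeros(3)
d = np.array([0.0, 0.0, 1.0])
n = np.array([0.0, 0.5, 1.0]); n = n / np.linalg.norm(n)
p_lo, p_hi = triangulate_interval(o, d, n, c=1.0, half_thickness=0.01)
```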