Title: Underwater Terrain Reconstruction from Forward-Looking Sonar Imagery
In this paper, we propose a novel approach to underwater simultaneous localization and mapping that uses a multibeam imaging sonar for 3D terrain mapping tasks. The high levels of noise and the absence of elevation angle information in sonar images present major challenges for data association and accurate 3D mapping. Instead of repeatedly projecting extracted features into Euclidean space, we track features with optical flow directly in the bearing-range images. To deal with degenerate cases, such as when tracking is interrupted by noise, we model the subsea terrain as a Gaussian Process random field on a Chow–Liu tree. Terrain factors are incorporated into the factor graph to smooth the terrain elevation estimate. We demonstrate the performance of the proposed algorithm in a simulated environment, showing that terrain factors effectively reduce estimation error. We also present ROV experiments in a variable-elevation tank environment, where we construct a descriptive, smooth height estimate of the tank bottom.
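The bearing-range tracking idea can be sketched roughly as follows, using OpenCV's pyramidal Lucas-Kanade tracker; this is an illustrative sketch under that assumption, not the authors' implementation, and the function names are ours:

```python
# Minimal sketch: track features frame-to-frame directly in the bearing-range
# (polar) sonar image, avoiding repeated projection into Euclidean space.
import cv2
import numpy as np

def track_bearing_range(prev_img, curr_img, prev_pts):
    """prev_img/curr_img: 8-bit bearing-range sonar frames.
    prev_pts: Nx1x2 float32 feature locations (bearing, range) in pixels."""
    curr_pts, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_img, curr_img, prev_pts, None, winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1  # drop tracks broken by sonar noise
    return prev_pts[good], curr_pts[good]

# Features could be seeded on the first frame with, e.g.:
# pts0 = cv2.goodFeaturesToTrack(frame0, maxCorners=200,
#                                qualityLevel=0.01, minDistance=7)
```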
Award ID(s):
1723996
NSF-PAR ID:
10113017
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
Page Range / eLocation ID:
3471 to 3477
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Existing solutions to visual simultaneous localization and mapping (V-SLAM) assume that errors in feature extraction and matching are independent and identically distributed (i.i.d.), but this assumption is known not to hold: features extracted from low-contrast regions of images exhibit wider error distributions than features from sharp corners. Furthermore, V-SLAM algorithms are prone to catastrophic tracking failures when sensed images include challenging conditions such as specular reflections, lens flare, or shadows of dynamic objects. To address such failures, previous work has focused on building more robust visual frontends that filter out challenging features. In this paper, we present introspective vision for SLAM (IV-SLAM), a fundamentally different approach to addressing these challenges. IV-SLAM explicitly models the noise process of reprojection errors from visual features as context-dependent, and hence non-i.i.d. We introduce an autonomously supervised approach for IV-SLAM to collect training data to learn such a context-aware noise model. Using this learned noise model, IV-SLAM guides feature extraction to select more features from parts of the image that are likely to result in lower noise, and further incorporates the learned noise model into the joint maximum likelihood estimation, thus making it robust to the aforementioned types of errors. We present empirical results demonstrating that IV-SLAM 1) accurately predicts sources of error in input images, 2) reduces tracking error compared to V-SLAM, and 3) increases the mean distance between tracking failures by more than 70% on challenging real robot data compared to V-SLAM.
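One way to picture how a non-i.i.d. noise model enters the estimator: whiten each reprojection residual by a context-dependent standard deviation before the joint maximum likelihood step. A minimal sketch, with `predict_sigma` standing in for a learned noise model (a hypothetical interface, not IV-SLAM's actual one):

```python
import numpy as np

def whitened_residuals(observed, projected, contexts, predict_sigma):
    """observed, projected: Nx2 pixel coordinates of matched features.
    contexts: per-feature image context fed to the learned noise model."""
    sigmas = np.array([predict_sigma(c) for c in contexts])  # shape (N,)
    residuals = observed - projected          # raw reprojection errors
    return residuals / sigmas[:, None]        # down-weight noisy features

# The joint MLE then minimizes the sum of squared whitened residuals, so
# features from low-contrast or flare-prone regions contribute less.
```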
  2. Vision-based localization approaches now underpin newly emerging navigation pipelines for myriad use cases, from robotics to assistive technologies. Compared to sensor-based solutions, vision-based localization does not require pre-installed sensor infrastructure, which is costly, time-consuming, and often infeasible at scale. Herein, we propose a novel vision-based localization pipeline for a specific use case: navigation support for end users with blindness and low vision. Given a query image taken by an end user on a mobile application, the pipeline leverages a visual place recognition (VPR) algorithm to find similar images in a reference image database of the target space. The geolocations of these similar images are used in a downstream task that employs a weighted-average method to estimate the end user’s location. Another downstream task uses the perspective-n-point (PnP) algorithm to estimate the end user’s direction by exploiting the 2D–3D point correspondences between the query image and the 3D environment, as extracted from matched images in the database. Additionally, the system implements Dijkstra’s algorithm to calculate a shortest path on a navigable map that includes the trip origin and destination. The topometric map used for localization and navigation is built with a customized graphical user interface that projects a 3D reconstructed sparse map, built from a sequence of images, onto the corresponding a priori 2D floor plan. Sequential images used for map construction can be collected in a pre-mapping step or sourced from public databases and citizen science. The end-to-end system can be installed on any internet-accessible device with a camera that hosts a custom mobile application. For evaluation purposes, mapping and localization were tested in a complex hospital environment. The evaluation results demonstrate that our system can achieve localization with an average error of less than 1 m without knowledge of the camera’s intrinsic parameters, such as focal length.
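The weighted-average location estimate can be sketched as below; variable names and the use of VPR similarity scores as weights are our assumptions, not the paper's stated formula:

```python
import numpy as np

def estimate_location(match_geolocations, similarity_scores):
    """match_geolocations: kx2 map coordinates of the top-k VPR matches.
    similarity_scores: k VPR similarities between query and matches."""
    w = similarity_scores / similarity_scores.sum()       # normalize weights
    return (w[:, None] * match_geolocations).sum(axis=0)  # weighted centroid

# Direction could then be recovered from the 2D-3D correspondences with a
# PnP solver such as OpenCV's cv2.solvePnP.
```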
  3. This paper addresses outdoor terrain mapping using overhead images obtained from an unmanned aerial vehicle. Dense depth estimation from aerial images during flight is challenging. While feature-based localization and mapping techniques can deliver real-time odometry and sparse point reconstruction, a dense environment model is generally recovered offline with significant computation and storage. This paper develops a joint 2D-3D learning approach to reconstruct local meshes at each camera keyframe, which can be assembled into a global environment model. Each local mesh is initialized from sparse depth measurements. We associate image features with the mesh vertices through camera projection and apply graph convolution to refine the mesh vertices based on joint 2D reprojected depth and 3D mesh supervision. Quantitative and qualitative evaluations using real aerial images show the potential of our method to support environmental monitoring and surveillance applications.
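The graph-convolution refinement step might look roughly like the following PyTorch sketch; layer sizes, the normalized adjacency, and the residual vertex update are illustrative assumptions, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class VertexRefiner(nn.Module):
    """Refine mesh vertex positions from per-vertex image features."""
    def __init__(self, feat_dim, hidden=128):
        super().__init__()
        self.gc1 = nn.Linear(feat_dim + 3, hidden)
        self.gc2 = nn.Linear(hidden, 3)   # per-vertex xyz offset

    def forward(self, verts, img_feats, adj_norm):
        """verts: Vx3 positions initialized from sparse depth.
        img_feats: VxF features sampled at each vertex's camera projection.
        adj_norm: VxV normalized mesh adjacency matrix."""
        x = torch.cat([verts, img_feats], dim=-1)
        x = torch.relu(adj_norm @ self.gc1(x))   # aggregate neighbor info
        return verts + adj_norm @ self.gc2(x)    # refined vertex positions
```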
  4. We assess the accuracy of Structure-from-Motion/Multiview stereo (SM) terrain models acquired ad hoc or without high-resolution ground control to analyze their usage as a base for inexpensive 3D bedrock geologic mapping. Our focus is on techniques that can be utilized in field projects without heavy or expensive equipment or the placement of ground control in logistically challenging sites (e.g., steep cliff faces or remote settings). We use a terrestrial Light Detection and Ranging (LiDAR) survey as the basis for comparing two types of SM models: (1) models developed from images acquired in a chartered airplane flight, with ground control referenced by natural objects located on Google Earth scenes; and (2) drone flights with a georeference established solely from camera positions located by conventional, differentially corrected Global Navigation Satellite Systems (GNSS). We find that all our SM models are indistinguishable in scale from the LiDAR reference model. The SM models do, however, show rigid body translations and rotations, with translations generally within the 1–5 m size of the natural objects used for ground control, the resolution of the GNSS receivers, or both. The rigid body rotations can be attributed to a poor imaging plan, which can be avoided with survey planning. Analyses of point densities in the various models show a limitation of terrestrial LiDAR point clouds as a mapping base due to the rapid falloff of resolution with distance. In contrast, SM models are characterized by relatively uniform point densities controlled by camera optics, the number of images, and the distance from the target. This uniform density is the product of the Multiview stereo step in SM processing, which fills areas between key points, and it is important for bedrock geologic mapping because it affords direct interpretation on a point cloud at a relatively uniform scale throughout a model. Our results indicate that these simple methods allow SM model construction to be accurate to within the range of conventional GNSS, with resolutions at the submeter, even centimeter, scale depending on data acquisition parameters. Thus, SM models can, and should, serve as a base for high-resolution geologic mapping, particularly in steep terrain where conventional techniques fail. Our SM models appear to provide accurate visualizations of geologic features over kilometer scales that allow detailed geologic mapping in 3D, with relative accuracy at the decimeter or centimeter level and absolute positioning within the 2–5 m precision of GNSS: a geometric precision that will allow unprecedented studies of any geologic system where geometry is the fundamental data.
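The reported rigid body translations and rotations between an SM model and the LiDAR reference can be quantified with a standard best-fit (Kabsch) alignment over corresponding points; a minimal sketch, with correspondence selection assumed given:

```python
import numpy as np

def kabsch(sm_pts, lidar_pts):
    """sm_pts, lidar_pts: Nx3 corresponding points. Returns R, t such that
    R @ p + t maps SM points onto the LiDAR reference (least squares)."""
    mu_s, mu_l = sm_pts.mean(axis=0), lidar_pts.mean(axis=0)
    H = (sm_pts - mu_s).T @ (lidar_pts - mu_l)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_l - R @ mu_s
    return R, t
```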
  5. Seafloor volcanic eruptions are difficult to observe directly due to lengthy eruption cycles and the remote location of mid‐ocean ridges. Volcanic eruptions in 2005–2006 at 9°50′N on the East Pacific Rise have been well documented, but the lava volume and flow extent remain uncertain because of the limited near‐bottom bathymetric data. We present near‐bottom data collected during 19 autonomous underwater vehicle (AUV) Sentry dives at 9°50′N in 2018, 2019, and 2021. The resulting 1 m‐resolution bathymetric grid and 20 cm‐resolution sidescan sonar images cover 115 km² and span the entire area of the 2005–2006 eruptions, including an 8 km² pre‐eruption survey collected with AUV ABE in 2001. Pre‐ and post‐eruption surveys, combined with sidescan sonar images and seismo‐acoustic impulsive events recorded during the eruptions, are used to quantify the lava flow extent and to estimate changes in seafloor depth caused by lava emplacement. During the 2005–2006 eruptions, lava flowed up to ∼3 km away from the axial summit trough, covering an area of ∼20.8 km², ∼50% larger than previously thought. Where pre‐ and post‐eruption surveys overlap, individual flow lobes can be resolved, confirming that lava thickness varies from ∼1 to 10 m and increases with distance from eruptive fissures. The resulting lava volume estimate indicates that ∼57% of the melt extracted from the axial melt lens probably remained in the subsurface as dikes. These observations provide insights into recharge cycles in the subsurface magma system and are a baseline for studying future eruptions at the 9°50′N area.
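The depth differencing behind the lava thickness and volume estimates reduces to subtracting co-registered pre- and post-eruption grids and integrating the positive thickness; a minimal sketch under that assumption (grid co-registration and noise handling omitted):

```python
import numpy as np

def lava_volume(pre_depth, post_depth, cell_area_m2=1.0):
    """pre_depth/post_depth: co-registered depth grids in meters (positive
    down); cell_area_m2: grid cell footprint, 1 m^2 for a 1 m grid."""
    thickness = pre_depth - post_depth        # shoaling = emplaced lava
    thickness[thickness < 0] = 0.0            # ignore deepening/noise
    return float(np.nansum(thickness) * cell_area_m2)  # cubic meters
```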