Existing solutions to visual simultaneous localization and mapping (V-SLAM) assume that errors in feature extraction and matching are independent and identically distributed (i.i.d.), but this assumption is known not to hold: features extracted from low-contrast regions of images exhibit wider error distributions than features from sharp corners. Furthermore, V-SLAM algorithms are prone to catastrophic tracking failures when sensed images include challenging conditions such as specular reflections, lens flare, or shadows of dynamic objects. To address such failures, previous work has focused on building more robust visual frontends that filter out challenging features. In this paper, we present introspective vision for SLAM (IV-SLAM), a fundamentally different approach to addressing these challenges. IV-SLAM explicitly models the noise process of reprojection errors from visual features as context-dependent, and hence non-i.i.d. We introduce an autonomously supervised approach for IV-SLAM to collect training data to learn such a context-aware noise model. Using this learned noise model, IV-SLAM guides feature extraction to select more features from parts of the image that are likely to result in lower noise, and further incorporates the learned noise model into the joint maximum likelihood estimation, thus making it robust to the aforementioned types of errors. We present empirical results demonstrating that IV-SLAM 1) accurately predicts sources of error in input images, 2) reduces tracking error compared to V-SLAM, and 3) increases the mean distance between tracking failures by more than 70% on challenging real robot data compared to V-SLAM.
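To make the context-aware noise model concrete, the sketch below shows one way per-feature predicted noise could enter the reprojection-error objective as a weighted least-squares cost; the function and variable names are illustrative, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): per-feature predicted noise
# sigma_i enters the joint MLE as a weighted least-squares cost, so
# features from unreliable image regions are down-weighted.
import numpy as np

def weighted_reprojection_cost(residuals, sigmas):
    """Gaussian negative log-likelihood of reprojection residuals,
    up to constant and log-variance terms, with per-feature noise."""
    return float(np.sum((residuals / sigmas) ** 2))

# Two features with equal 0.8 px residuals; the second comes from a
# low-contrast region and is predicted to be four times noisier.
residuals = np.array([0.8, 0.8])
sigmas = np.array([0.5, 2.0])
print(weighted_reprojection_cost(residuals, sigmas))  # 2.72: first dominates
```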
Underwater Terrain Reconstruction from Forward-Looking Sonar Imagery
In this paper, we propose a novel approach to underwater simultaneous localization and mapping that uses a multibeam imaging sonar for 3D terrain mapping tasks. The high levels of noise and the absence of elevation angle information in sonar images present major challenges for data association and accurate 3D mapping. Instead of repeatedly projecting extracted features into Euclidean space, we track them with optical flow within the bearing-range images. To deal with degenerate cases, such as when tracking is interrupted by noise, we model the subsea terrain as a Gaussian Process random field on a Chow–Liu tree. Terrain factors are incorporated into the factor graph to smooth the terrain elevation estimate. We demonstrate the performance of the proposed algorithm in a simulated environment, showing that terrain factors effectively reduce estimation error. We also present experiments with a remotely operated vehicle (ROV) in a variable-elevation tank environment, where we construct a descriptive and smooth height estimate of the tank bottom.
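As a rough illustration of tracking features directly in the bearing-range image plane, the sketch below uses OpenCV's pyramidal Lucas-Kanade flow as a stand-in tracker; the synthetic frames and parameter values are assumptions, not the paper's pipeline.

```python
# Sketch of sparse optical-flow tracking in bearing-range sonar images,
# using OpenCV's pyramidal Lucas-Kanade as a stand-in for the paper's
# tracker. The stand-in frames below are synthetic.
import cv2
import numpy as np

def track_features(prev_img, curr_img, prev_pts):
    """Track bearing-range keypoints between consecutive sonar frames."""
    curr_pts, status, _ = cv2.calcOpticalFlowPyrLK(
        prev_img, curr_img, prev_pts, None, winSize=(21, 21), maxLevel=3)
    good = status.ravel() == 1  # keep only successfully tracked points
    return prev_pts[good], curr_pts[good]

# Stand-in frames: the second is the first shifted by two range bins.
prev_img = np.random.randint(0, 255, (200, 300), np.uint8)
curr_img = np.roll(prev_img, 2, axis=1)
prev_pts = cv2.goodFeaturesToTrack(prev_img, maxCorners=50,
                                   qualityLevel=0.01, minDistance=8)
old_pts, new_pts = track_features(prev_img, curr_img, prev_pts)
```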
- Award ID(s): 1723996
- PAR ID: 10113017
- Date Published:
- Journal Name: Proceedings of the IEEE International Conference on Robotics and Automation (ICRA)
- Page Range / eLocation ID: 3471 to 3477
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper addresses outdoor terrain mapping using overhead images obtained from an unmanned aerial vehicle. Dense depth estimation from aerial images during flight is challenging. While feature-based localization and mapping techniques can deliver real-time odometry and sparse point reconstruction, a dense environment model is generally recovered offline at significant computation and storage cost. This paper develops a joint 2D-3D learning approach to reconstruct local meshes at each camera keyframe, which can be assembled into a global environment model. Each local mesh is initialized from sparse depth measurements. We associate image features with the mesh vertices through camera projection and apply graph convolution to refine the mesh vertices based on joint 2D reprojected depth and 3D mesh supervision. Quantitative and qualitative evaluations using real aerial images show the potential of our method to support environmental monitoring and surveillance applications.
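For intuition, here is a toy graph-convolution step of the kind described above, where each mesh vertex is refined from its own feature and its neighbors' mean feature; the weights, features, and mesh are random stand-ins, not the trained model.

```python
# Illustrative single graph-convolution step over a tiny mesh: vertex
# offsets depend on per-vertex features and neighbor-averaged features.
import numpy as np

def graph_conv(vertices, features, adjacency, W_self, W_neigh):
    """Refine (V, 3) vertex positions with one propagation step."""
    deg = adjacency.sum(axis=1, keepdims=True)
    neigh_mean = adjacency @ features / np.maximum(deg, 1)
    offsets = features @ W_self + neigh_mean @ W_neigh
    return vertices + offsets

rng = np.random.default_rng(0)
V, F = 4, 8                               # 4 vertices, 8-dim features
vertices = rng.normal(size=(V, 3))
features = rng.normal(size=(V, F))
adjacency = np.array([[0, 1, 1, 0], [1, 0, 1, 0],
                      [1, 1, 0, 1], [0, 0, 1, 0]], float)
W_self, W_neigh = rng.normal(size=(F, 3)), rng.normal(size=(F, 3))
print(graph_conv(vertices, features, adjacency, W_self, W_neigh).shape)
```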
- We present a method for 3D non-rigid motion tracking and structure reconstruction from 2D points and curve segments in a sequence of perspective images. The 3D locations of features in the first frame are known. A 3D affine motion model is used to describe the non-rigid motion. Results from synthetic and real data are presented. Applications include lip tracking, an MPEG-4 face player, and burn scar assessment. The results show that: 1) curve segments are more robust under noise (observed on synthetic data with different Gaussian noise levels); and 2) using both feature types yields a significant performance gain on real data.
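A small sketch of the 3D affine motion model follows: known 3D features are moved by X' = AX + b and then projected through a pinhole camera. The matrix A, translation b, and focal length are illustrative values, not estimates from the paper.

```python
# Sketch of the 3D affine motion model with perspective projection.
import numpy as np

def affine_motion(points, A, b):
    """Apply 3D affine motion X' = A X + b to an (N, 3) point array."""
    return points @ A.T + b

def project(points, f=1.0):
    """Pinhole projection onto the image plane (assumes z > 0)."""
    return f * points[:, :2] / points[:, 2:3]

X = np.array([[0.1, 0.2, 2.0], [0.0, -0.1, 2.5]])  # known 3D features
A = np.eye(3) + 0.01 * np.random.default_rng(1).normal(size=(3, 3))
b = np.array([0.02, 0.0, 0.01])
print(project(affine_motion(X, A, b)))             # predicted 2D locations
```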
- Accurate mapping of nearshore bathymetry is essential for coastal management, navigation, and environmental monitoring. Traditional bathymetric mapping methods such as sonar surveys and LiDAR are often time-consuming and costly. This paper introduces BathyFormer, a novel vision transformer- and encoder-based deep learning model designed to estimate nearshore bathymetry from high-resolution multispectral satellite imagery. The methodology involves training the BathyFormer model on a dataset comprising satellite images and corresponding bathymetric data obtained from the Continuously Updated Digital Elevation Model (CUDEM). The model learns to predict water depths by analyzing the spectral signatures and spatial patterns present in the multispectral imagery. Validation of the estimated bathymetry maps against independent hydrographic survey data produces a root mean squared error (RMSE) ranging from 0.55 to 0.73 m at depths of 2 to 5 m across three locations within the Chesapeake Bay, all independent of the training set. This approach shows significant promise for large-scale, cost-effective shallow-water nearshore bathymetric mapping, providing a valuable tool for coastal scientists, marine planners, and environmental managers.
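The validation metric reported above is a plain RMSE between predicted and surveyed depths; the sketch below spells it out with placeholder arrays rather than CUDEM or hydrographic survey data.

```python
# RMSE between model depth estimates and independent survey soundings.
import numpy as np

def rmse(predicted, observed):
    """Root mean squared error between depth arrays, in meters."""
    return float(np.sqrt(np.mean((predicted - observed) ** 2)))

pred = np.array([2.1, 3.4, 4.8, 2.9])  # placeholder depth estimates (m)
surv = np.array([2.0, 3.9, 4.2, 3.1])  # placeholder survey depths (m)
print(f"RMSE: {rmse(pred, surv):.2f} m")
```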
- Vision-based localization approaches now underpin newly emerging navigation pipelines for myriad use cases, from robotics to assistive technologies. Compared to sensor-based solutions, vision-based localization does not require pre-installed sensor infrastructure, which is costly, time-consuming, and often infeasible at scale. Herein, we propose a novel vision-based localization pipeline for a specific use case: navigation support for end users with blindness and low vision. Given a query image taken by an end user on a mobile application, the pipeline leverages a visual place recognition (VPR) algorithm to find similar images in a reference image database of the target space. The geolocations of these similar images are used in a downstream task that employs a weighted-average method to estimate the end user's location. Another downstream task uses the perspective-n-point (PnP) algorithm to estimate the end user's direction by exploiting the 2D–3D point correspondences between the query image and the 3D environment, as extracted from matched images in the database. Additionally, the system implements Dijkstra's algorithm to compute the shortest path on a navigable map that includes the trip origin and destination. The topometric map used for localization and navigation is built with a customized graphical user interface that projects a 3D reconstructed sparse map, built from a sequence of images, onto the corresponding a priori 2D floor plan. Sequential images used for map construction can be collected in a pre-mapping step or scavenged from public databases and citizen science. The end-to-end system can be installed on any internet-accessible device with a camera that hosts the custom mobile application. For evaluation purposes, mapping and localization were tested in a complex hospital environment. The evaluation results demonstrate that our system achieves localization with an average error of less than 1 m without knowledge of the camera's intrinsic parameters, such as focal length.
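The weighted-average localization step lends itself to a very short implementation; in the sketch below, the reference-image coordinates and VPR similarity scores are invented for illustration.

```python
# Weighted-average location estimate from the geolocations of the top
# VPR matches, weighted by their similarity to the query image.
import numpy as np

def estimate_location(geolocations, similarities):
    """Similarity-weighted mean of matched reference coordinates (N, 2)."""
    w = np.asarray(similarities, float)
    w /= w.sum()
    return w @ np.asarray(geolocations, float)

matches = [(12.0, 4.5), (12.4, 4.2), (11.8, 4.9)]  # map coordinates (m)
scores = [0.92, 0.81, 0.65]                        # VPR similarity scores
print(estimate_location(matches, scores))          # estimated user location
```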