-
In this paper, we present CaveSeg, the first visual learning pipeline for semantic segmentation and scene parsing for AUV navigation inside underwater caves. We address the problem of scarce annotated training data by preparing a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel annotations for important navigation markers (e.g., caveline, arrows), obstacles (e.g., ground plane and overhead layers), scuba divers, and open areas for servoing. Through comprehensive benchmark analyses on cave systems in the USA, Mexico, and Spain, we demonstrate that robust deep visual models can be developed based on CaveSeg for fast semantic scene parsing of underwater cave environments. In particular, we formulate a novel transformer-based model that is computationally light and offers near real-time execution in addition to achieving state-of-the-art performance. Finally, we explore the design choices and implications of semantic segmentation for visual servoing by AUVs inside underwater caves. The proposed model and benchmark dataset open up promising opportunities for future research in autonomous underwater cave exploration and mapping.
Free, publicly accessible full text available May 13, 2025.
-
This paper explores the problem of deploying machine learning (ML)-based object detection and segmentation models on edge platforms to enable real-time caveline detection for Autonomous Underwater Vehicles (AUVs) used for underwater cave exploration and mapping. We specifically investigate three ML models, i.e., U-Net, Vision Transformer (ViT), and YOLOv8, deployed on three edge platforms: Raspberry Pi-4, Intel Neural Compute Stick 2 (NCS2), and NVIDIA Jetson Nano. The experimental results reveal clear tradeoffs between model accuracy, processing speed, and energy consumption. The most accurate model is U-Net, with an F1-score of 85.53 and an Intersection over Union (IoU) of 85.38. Meanwhile, the highest inference speed and lowest energy consumption are achieved by the YOLOv8 model deployed on Jetson Nano operating in the high-power and low-power modes, respectively. The comprehensive quantitative analyses and comparative results provided in the paper highlight important nuances that can guide the deployment of caveline detection systems on underwater robots to ensure safe and reliable AUV navigation during underwater cave exploration and mapping missions.
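For readers unfamiliar with the two accuracy metrics quoted above, the following is a minimal sketch of how F1-score and IoU are commonly computed for a single pair of binary segmentation masks. It is not taken from the paper: the function names, the toy masks, and the convention of returning 1.0 when both masks are empty are all assumptions made for illustration, and the paper's own evaluation protocol (for example, how scores are averaged across images) may differ.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives, and false negatives
    for two binary masks given as equally sized lists of rows."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f1_score(pred, truth):
    """F1 = 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 2 * tp / denom if denom else 1.0


def iou(pred, truth):
    """IoU = TP / (TP + FP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return tp / denom if denom else 1.0


# Hypothetical 3x4 masks (1 = caveline pixel, 0 = background).
truth = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
pred = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

print(f"F1 = {f1_score(pred, truth):.3f}, IoU = {iou(pred, truth):.3f}")
# prints "F1 = 0.750, IoU = 0.600"
```

In this toy case the prediction has three correct pixels, one spurious pixel, and one missed pixel, which gives the values shown. Reported scores such as those above are typically these ratios expressed as percentages.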
-
Underwater caves are challenging environments that are crucial for water resource management, and for our understanding of hydrogeology and history. Mapping underwater caves is a time-consuming, labor-intensive, and hazardous operation. For autonomous cave mapping by underwater robots, the major challenge lies in vision-based estimation in the complete absence of ambient light, which results in constantly moving shadows due to the motion of the camera-light setup. Thus, detecting and following the caveline as navigation guidance is paramount for robots in autonomous cave mapping missions. In this paper, we present a computationally light caveline detection model based on a novel Vision Transformer (ViT)-based learning pipeline. We address the problem of scarce annotated training data with a weakly supervised formulation in which learning is reinforced through a series of noisy predictions from intermediate sub-optimal models. We validate the utility and effectiveness of such weak supervision for caveline detection and tracking in three different cave locations: USA, Mexico, and Spain. Experimental results demonstrate that our proposed model, CL-ViT, balances the robustness-efficiency trade-off, ensuring good generalization performance while offering 10+ FPS on single-board (Jetson TX2) devices.