skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on May 19, 2026

Title: Word2Wave: Language Driven Mission Programming for Efficient Subsea Deployments of Marine Robots
This paper explores the design and development of a language-based interface for dynamic mission programming of autonomous underwater vehicles (AUVs). The proposed `Word2Wave' (W2W) framework enables interactive programming and parameter configuration of AUVs for remote subsea missions. The W2W framework includes: (i) a set of novel language rules and command structures for efficient language-to-mission mapping; (ii) a GPT-based prompt engineering module for training data generation; (iii) a small language model (SLM)-based sequence-to-sequence learning pipeline for mission command generation from human speech or text; and (iv) a novel user interface for 2D mission map visualization and human-machine interfacing. The proposed learning pipeline adapts an SLM named T5-Small that can learn language-to-mission mapping from processed language data effectively, providing robust and efficient performance. In addition to a benchmark evaluation with state-of-the-art, we conduct a user interaction study to demonstrate the effectiveness of W2W over commercial AUV programming interfaces. Across participants, W2W-based programming required less than 10% time for mission programming compared to traditional interfaces; it is deemed to be a simpler and more natural paradigm for subsea mission programming with a usability score of 76.25. W2W opens up promising future research opportunities on hands-free AUV mission programming for efficient subsea deployments.  more » « less
Award ID(s):
2330416
PAR ID:
10653543
Author(s) / Creator(s):
 ;  ;  ;  
Publisher / Repository:
IEEE
Date Published:
Page Range / eLocation ID:
4107 to 4114
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This paper designs a novel geometry-conformal antenna for Magnetic Induction (MI)-based subsea wireless communications for autonomous underwater vehicles (AUV). The designed tri-directional antennas can be wrapped directly on the surface of AUVs, such that the AUVs fluid dynamics are well maintained to ensure power efficiency of the vehicles. In addition, ferrite materials are added between the MI antenna and the metallic body surface of the AUVs to overcome the shielding effect and enhance the MI signal strength. The designed MI communication system is implemented in hardware and the effectiveness of the geometry-conformal MI antenna is demonstrated through COMSOL simulations and lab experiments. 
    more » « less
  2. Underwater robots, including Remote Operating Vehicles (ROV) and Autonomous Underwater Vehicles (AUV), are currently used to support underwater missions that are either impossible or too risky to be performed by manned systems. In recent years the academia and robotic industry have paved paths for tackling technical challenges for ROV/AUV operations. The level of intelligence of ROV/AUV has increased dramatically because of the recent advances in low-power-consumption embedded computing devices and machine intelligence (e.g., AI). Nonetheless, operating precisely underwater is still extremely challenging to minimize human intervention due to the inherent challenges and uncertainties associated with the underwater environments. Proximity operations, especially those requiring precise manipulation, are still carried out by ROV systems that are fully controlled by a human pilot. A workplace-ready and worker-friendly ROV interface that properly simplifies operator control and increases remote operation confidence is the central challenge for the wide adaptation of ROVs. This paper examines the recent advances of virtual telepresence technologies as a solution for lowering the barriers to the human-in-the-loop ROV teleoperation. Virtual telepresence refers to Virtual Reality (VR) related technologies that help a user to feel that they were in a hazardous situation without being present at the actual location. We present a pilot system of using a VR-based sensory simulator to convert ROV sensor data into human-perceivable sensations (e.g., haptics). Building on a cloud server for real-time rendering in VR, a less trained operator could possibly operate a remote ROV thousand miles away without losing the minimum situational awareness. The system is expected to enable an intensive human engagement on ROV teleoperation, augmenting abilities for maneuvering and navigating ROV in unknown and less explored subsea regions and works. This paper also discusses the opportunities and challenges of this technology for ad hoc training, workforce preparation, and safety in the future maritime industry. We expect that lessons learned from our work can help democratize human presence in future subsea engineering works, by accommodating human needs and limitations to lower the entrance barrier. 
    more » « less
  3. — In this paper, we present CaveSeg - the first visual learning pipeline for semantic segmentation and scene parsing for AUV navigation inside underwater caves. We address the problem of scarce annotated training data by preparing a comprehensive dataset for semantic segmentation of underwater cave scenes. It contains pixel annotations for important navigation markers (e.g. caveline, arrows), obstacles (e.g. ground plain and overhead layers), scuba divers, and open areas for servoing. Through comprehensive benchmark analyses on cave systems in USA, Mexico, and Spain locations, we demonstrate that robust deep visual models can be developed based on CaveSeg for fast semantic scene parsing of underwater cave environments. In particular, we formulate a novel transformer-based model that is computationally light and offers near real-time execution in addition to achieving state-of-the-art performance. Finally, we explore the design choices and implications of semantic segmentation for visual servoing by AUVs inside underwater caves. The proposed model and benchmark dataset open up promising opportunities for future research in autonomous underwater cave exploration and mapping. 
    more » « less
  4. This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) to estimate a water parameter field of interest in the under-ice environment. A centralized system is considered, where several fixed access points on the ice layer are introduced as gateways for communications between the AUVs and a remote data fusion center. We model the water parameter field of interest as a Gaussian process with unknown hyper-parameters. The AUV trajectories for sampling are determined on an epoch-by-epoch basis. At the end of each epoch, the access points relay the observed field samples from all the AUVs to the fusion center, which computes the posterior distribution of the field based on the Gaussian process regression and estimates the field hyper-parameters. The optimal trajectories of all the AUVs in the next epoch are determined to maximize a long-term reward that is defined based on the field uncertainty reduction and the AUV mobility cost, subject to the kinematics constraint, the communication constraint and the sensing area constraint. We formulate the adaptive trajectory planning problem as a Markov decision process (MDP). A reinforcement learning-based online learning algorithm is designed to determine the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm has performance similar to a benchmark method that assumes perfect knowledge of the field hyper-parameters. 
    more » « less
  5. Abstract Background The Kyoto Encyclopedia of Genes and Genomes (KEGG) provides organized genomic, biomolecular, and metabolic information and knowledge that is reasonably current and highly useful for a wide range of analyses and modeling. KEGG follows the principles of data stewardship to be findable, accessible, interoperable, and reusable (FAIR) by providing RESTful access to their database entries via their web-accessible KEGG API. However, the overall FAIRness of KEGG is often limited by the library and software package support available in a given programming language. While R library support for KEGG is fairly strong, Python library support has been lacking. Moreover, there is no software that provides extensive command line level support for KEGG access and utilization. Results We present kegg_pull, a package implemented in the Python programming language that provides better KEGG access and utilization functionality than previous libraries and software packages. Not only does kegg_pull include an application programming interface (API) for Python programming, it also provides a command line interface (CLI) that enables utilization of KEGG for a wide range of shell scripting and data analysis pipeline use-cases. As kegg_pull’s name implies, both the API and CLI provide versatile options for pulling (downloading and saving) an arbitrary (user defined) number of database entries from the KEGG API. Moreover, this functionality is implemented to efficiently utilize multiple central processing unit cores as demonstrated in several performance tests. Many options are provided to optimize fault-tolerant performance across a single or multiple processes, with recommendations provided based on extensive testing and practical network considerations. Conclusions The new kegg_pull package enables new flexible KEGG retrieval use cases not available in previous software packages. The most notable new feature that kegg_pull provides is its ability to robustly pull an arbitrary number of KEGG entries with a single API method or CLI command, including pulling an entire KEGG database. We provide recommendations to users for the most effective use of kegg_pull according to their network and computational circumstances. 
    more » « less