

Title: Coverage Path Planning for Mapping of Underwater Structures
This paper addresses the problem of coverage path planning in a 3D environment for surveying underwater structures. We propose to use the navigation strategy that a human diver would execute when circumnavigating a region of interest, in particular when collecting data from a shipwreck. In contrast to previous methods in the literature, we aim to perform coverage in a completely unknown environment, given only some initial prior information. Our proposed method uses convolutional neural networks to learn control commands from visual input. Preliminary results and a detailed overview of the proposed method are discussed.
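The abstract does not specify the network, so the following is only a minimal pure-Python sketch of how a convolutional network can map an image to a discrete control command: one convolution layer, ReLU, global average pooling, a linear layer, and an argmax. The command names, layer sizes, and hand-set weights are hypothetical; a real system would learn the weights from labeled images with a deep-learning framework.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def conv2d_valid(img: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2D cross-correlation, which is what CNN layers call convolution."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[i + u][j + v] * kernel[u][v] for u in range(kh) for v in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu_global_avg_pool(fmap: Matrix) -> float:
    """ReLU followed by global average pooling: one scalar feature per map."""
    values = [max(0.0, x) for row in fmap for x in row]
    return sum(values) / len(values)


def predict_command(
    img: Matrix,
    kernels: Sequence[Matrix],
    weights: Sequence[Sequence[float]],  # one row of weights per command
    biases: Sequence[float],
    commands: Sequence[str],  # hypothetical discrete command set
) -> str:
    """Conv -> ReLU -> global pooling -> linear layer -> argmax over commands."""
    features = [relu_global_avg_pool(conv2d_valid(img, k)) for k in kernels]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    best = max(range(len(logits)), key=logits.__getitem__)
    return commands[best]
```

For example, with `commands = ["forward", "turn_left", "turn_right"]`, a single all-ones 3×3 kernel, weights `[[1.0], [-1.0], [0.0]]`, and biases `[0.0, 0.0, 0.5]`, a uniformly bright 3×3 image yields `"forward"` and an all-black one yields `"turn_right"`.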
Award ID(s): 2024741
NSF-PAR ID: 10296203
Journal Name: Global Oceans 2020: Singapore – U.S. Gulf Coast
Page Range / eLocation ID: 1 to 6
Sponsoring Org: National Science Foundation
More Like this
  1. Abstract

    This work proposes vision-only navigation strategies for an autonomous underwater robot. This approach is a step towards solving the coverage path planning problem in a 3-D environment for surveying underwater structures. Given the challenging conditions of the underwater domain, obtaining accurate state estimates reliably is very difficult. Consequently, it is a great challenge to extend existing path planning or coverage techniques developed for aerial or ground robots. In this work, we investigate a navigation strategy that uses only vision to assist in covering a complex underwater structure. We propose to use a navigation strategy akin to what a human diver would execute when circumnavigating a region of interest, in particular when collecting data from a shipwreck. This article takes a step towards enabling the autonomous operation of lightweight robots near underwater wrecks in order to collect data for creating photo-realistic maps and volumetric 3-D models while avoiding collisions. The proposed method uses convolutional neural networks to learn control commands from visual input. We demonstrate the feasibility of using a vision-only system to learn specific navigation strategies, achieving 80% accuracy in predicting control command changes. Experimental results and a detailed overview of the proposed method are discussed.
  2. Abstract

    Reinforcement learning is a general technique that allows an agent to learn an optimal policy by interacting with an environment in sequential decision-making problems. The quality of a policy is measured by its value function starting from some initial state. The focus of this paper is to construct confidence intervals (CIs) for a policy's value in infinite-horizon settings where the number of decision points diverges to infinity. We propose to model the state-action value function (Q-function) associated with a policy using a series/sieve method to derive its confidence interval. When the target policy also depends on the observed data, we propose a SequentiAl Value Evaluation (SAVE) method to recursively update the estimated policy and its value estimator. As long as either the number of trajectories or the number of decision points diverges to infinity, we show that the proposed CI achieves nominal coverage even in cases where the optimal policy is not unique. Simulation studies are conducted to back up our theoretical findings. We apply the proposed method to a dataset from mobile health studies and find that reinforcement learning algorithms could help improve patients' health status. A Python implementation of the proposed procedure is available at https://github.com/shengzhang37/SAVE.

  3. Summary

    This paper introduces an assumption-lean method that constructs valid and efficient lower predictive bounds for survival times with censored data. We build on recent work by Candès et al. (2023), whose approach first subsets the data to discard any data points with early censoring times and then uses a reweighting technique, namely weighted conformal inference (Tibshirani et al., 2019), to correct for the distribution shift introduced by this subsetting procedure. For our new method, instead of restricting to a fixed threshold for the censoring time when subsetting the data, we allow for a covariate-dependent and data-adaptive subsetting step, which better captures the heterogeneity of the censoring mechanism. As a result, our method can produce lower predictive bounds that are less conservative and more informative. We show that in the Type-I right-censoring setting, if either the censoring mechanism or the conditional quantile of the survival time is well estimated, our proposed procedure achieves nearly exact marginal coverage; in the latter case, it additionally achieves approximate conditional coverage. We evaluate the validity and efficiency of our proposed algorithm in numerical experiments, illustrating its advantage over competing methods. Finally, we apply our method to a real dataset to generate lower predictive bounds for users' active times on a mobile app.

  4. Sensor coverage is the critical multi-robot problem of maximizing the detection of events in an environment through the deployment of multiple robots. Large multi-robot systems are often composed of simple robots that are typically not equipped with a complete set of sensors, so teams with comprehensive sensing abilities are required to properly cover an area. Robots also exhibit multiple forms of relationships (e.g., communication connections or spatial distribution) that need to be considered when assigning robot teams for sensor coverage. To address this problem, in this paper we introduce a novel formulation of sensor coverage by multi-robot systems with heterogeneous relationships as a graph representation learning problem. We propose a principled approach based on the mathematical framework of regularized optimization to learn a unified representation of the multi-robot system from the graphs describing the heterogeneous relationships and to identify the learned representation’s underlying structure in order to assign the robots to teams. To evaluate the proposed approach, we conduct extensive experiments on simulated multi-robot systems and a physical multi-robot system as a case study, demonstrating that our approach is able to effectively assign teams for heterogeneous multi-robot sensor coverage. 
    Offline or batch reinforcement learning seeks to learn a near-optimal policy using historical data without active exploration of the environment. To counter the insufficient coverage and sample scarcity of many offline datasets, the principle of pessimism has recently been introduced to mitigate the high bias of the estimated values. While pessimistic variants of model-based algorithms (e.g., value iteration with lower confidence bounds) have been theoretically investigated, their model-free counterparts, which do not require explicit model estimation, have not been adequately studied, especially in terms of sample efficiency. To address this gap, we study a pessimistic variant of Q-learning in the context of finite-horizon Markov decision processes, and characterize its sample complexity under the single-policy concentrability assumption, which does not require full coverage of the state-action space. In addition, we propose a variance-reduced pessimistic Q-learning algorithm that achieves near-optimal sample complexity. Altogether, this work highlights the efficiency of model-free algorithms in offline RL when used in conjunction with pessimism and variance reduction.
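Item 1 above reports 80% accuracy on predicting control command changes. The sketch below shows one hypothetical way to score such predictions, assuming commands are discrete labels aligned frame by frame. It computes plain per-frame accuracy and, as an assumed reading of "command changes", accuracy restricted to frames where the reference command switches; neither metric is confirmed to be the one the authors used.

```python
from typing import Optional, Sequence


def command_accuracy(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of frames where the predicted command matches the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


def change_point_accuracy(
    predicted: Sequence[str], reference: Sequence[str]
) -> Optional[float]:
    """Accuracy only at frames where the reference command differs from the previous frame."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have equal length")
    changes = [t for t in range(1, len(reference)) if reference[t] != reference[t - 1]]
    if not changes:
        return None  # nothing to score: the reference command never changes
    return sum(predicted[t] == reference[t] for t in changes) / len(changes)
```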
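For item 2, the core of the series/sieve idea is to approximate the Q-function of a fixed target policy as a linear combination of basis functions and solve the sample Bellman estimating equation for the coefficients. The pure-Python sketch below does this for a deterministic target policy and returns only the point estimate of the policy value; the variance estimate behind the confidence interval and the SAVE recursion are omitted, and all names and the tabular toy basis are hypothetical. It is not the code from the linked SAVE repository.

```python
from typing import Callable, List, Sequence, Tuple

# (state, action, reward, next_state); a hypothetical integer encoding
Transition = Tuple[int, int, float, int]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: data do not cover the basis")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def sieve_q_evaluation(
    data: Sequence[Transition],
    phi: Callable[[int, int], List[float]],
    policy: Callable[[int], int],
    gamma: float,
) -> List[float]:
    """Fit Q(s, a) ~ phi(s, a) . beta by solving the estimating equation
    sum_t phi(S_t, A_t) * (R_t + gamma * Q(S'_t, pi(S'_t)) - Q(S_t, A_t)) = 0."""
    k = len(phi(data[0][0], data[0][1]))
    a_mat = [[0.0] * k for _ in range(k)]
    b_vec = [0.0] * k
    for s, a, r, s_next in data:
        f = phi(s, a)
        f_next = phi(s_next, policy(s_next))
        for i in range(k):
            b_vec[i] += f[i] * r
            for j in range(k):
                a_mat[i][j] += f[i] * (f[j] - gamma * f_next[j])
    return solve_linear(a_mat, b_vec)


def policy_value(
    beta: Sequence[float],
    phi: Callable[[int, int], List[float]],
    policy: Callable[[int], int],
    s0: int,
) -> float:
    """Plug-in estimate of V^pi(s0) = Q(s0, pi(s0))."""
    return sum(b * f for b, f in zip(beta, phi(s0, policy(s0))))
```

With one-hot (tabular) features the estimating equation reduces to the empirical Bellman equation; a richer sieve, such as splines or polynomials of a continuous state, would replace `phi` without changing the solver.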
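Item 3 builds on weighted conformal inference. A minimal sketch of a weighted split-conformal lower predictive bound follows, assuming calibration scores of the form (predicted lower quantile minus observed survival time) and nonnegative weights supplied by the caller, for example estimated inverse probabilities of being retained by the subsetting step. The censoring-aware details, the data-adaptive subsetting step, and weight estimation are omitted, and the function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_conformal_quantile(
    scores: Sequence[float],
    weights: Sequence[float],
    test_weight: float,
    alpha: float,
) -> float:
    """Level-(1 - alpha) quantile of sum_i p_i * delta(score_i) + p_test * delta(+inf),
    where p_i and p_test are the weights normalized to sum to one."""
    total = sum(weights) + test_weight
    target = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight
        if cumulative >= target:
            return score
    return math.inf  # the point mass at +inf is needed to reach the level


def lower_predictive_bound(
    pred_test: float,
    cal_preds: Sequence[float],
    cal_times: Sequence[float],
    cal_weights: Sequence[float],
    test_weight: float,
    alpha: float,
) -> float:
    """Bound pred_test - eta, where eta is the weighted (1 - alpha) quantile
    of calibration scores V_i = cal_preds[i] - cal_times[i]."""
    scores = [p - t for p, t in zip(cal_preds, cal_times)]
    eta = weighted_conformal_quantile(scores, cal_weights, test_weight, alpha)
    return pred_test - eta
```

Because the test point's weight sits on +inf, the bound falls back to -inf (trivially valid) whenever the calibration weight alone cannot reach the 1 - alpha level.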
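Item 4 learns a unified representation from several relationship graphs via regularized optimization. As a much simpler stand-in, and explicitly not the paper's method, the sketch below fuses the graphs by a fixed weighted sum of edge weights, groups robots joined by strong fused edges into teams (connected components via union-find), and checks whether each team's combined sensors cover a required set. The fusion weights, threshold, and sensor names are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Edges = Dict[Tuple[int, int], float]


def fuse_graphs(graphs: Sequence[Edges], alphas: Sequence[float]) -> Edges:
    """Weighted sum of undirected edge weights across relationship graphs."""
    fused: Edges = {}
    for graph, alpha in zip(graphs, alphas):
        for (i, j), w in graph.items():
            key = (min(i, j), max(i, j))  # treat every edge as undirected
            fused[key] = fused.get(key, 0.0) + alpha * w
    return fused


def assign_teams(num_robots: int, fused: Edges, threshold: float) -> List[List[int]]:
    """Connected components of the fused graph after dropping weak edges."""
    parent = list(range(num_robots))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), w in fused.items():
        if w >= threshold:
            parent[find(i)] = find(j)
    teams: Dict[int, List[int]] = {}
    for robot in range(num_robots):
        teams.setdefault(find(robot), []).append(robot)
    return sorted(teams.values())


def team_covers(team: Sequence[int], sensors: Sequence[Set[str]], required: Set[str]) -> bool:
    """True if the union of the team's sensors includes every required sensor."""
    available: Set[str] = set()
    for robot in team:
        available |= sensors[robot]
    return required <= available
```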
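For item 5, a minimal tabular sketch of pessimistic Q-learning on an offline dataset of finite-horizon transitions follows. It uses the learning rate (H + 1)/(H + n), a common choice in finite-horizon Q-learning analyses, and subtracts a count-based penalty c * sqrt(H^3 / n) from each target. The penalty form (with logarithmic factors dropped), the constant c, the zero initialization, and clipping values at 0 (rewards assumed to lie in [0, 1]) are illustrative assumptions, and variance reduction is omitted.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Sequence, Tuple

# (step h, state, action, reward, next_state); steps run 0, ..., horizon - 1
Sample = Tuple[int, int, int, float, int]
Key = Tuple[int, int, int]


def pessimistic_q_learning(
    data: Sequence[Sample],
    horizon: int,
    num_actions: int,
    c_penalty: float,
) -> DefaultDict[Key, float]:
    """Offline Q-learning in which each target is lowered by a count-based penalty."""
    q: DefaultDict[Key, float] = defaultdict(float)  # unvisited pairs start at 0
    counts: DefaultDict[Key, int] = defaultdict(int)

    def value(h: int, s: int) -> float:
        if h >= horizon:
            return 0.0  # no reward after the final step
        best = max(q[(h, s, a)] for a in range(num_actions))
        return max(0.0, best)  # values are nonnegative when rewards lie in [0, 1]

    for h, s, a, r, s_next in data:
        key = (h, s, a)
        counts[key] += 1
        n = counts[key]
        lr = (horizon + 1) / (horizon + n)
        penalty = c_penalty * math.sqrt(horizon ** 3 / n)  # shrinks as the pair is revisited
        target = r + value(h + 1, s_next) - penalty
        q[key] = (1.0 - lr) * q[key] + lr * target
    return q
```

With 100 samples of a reward-0.5 action and a single sample of a reward-0.6 action, plain Q-learning (c = 0) prefers the rarely seen action, whereas the pessimistic version (c = 1) prefers the well-covered one.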