Title: Reinforcement Learning-Based Multi-AUV Adaptive Trajectory Planning for Under-Ice Field Estimation
This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) to estimate a water parameter field of interest in the under-ice environment. A centralized system is considered, where several fixed access points on the ice layer are introduced as gateways for communications between the AUVs and a remote data fusion center. We model the water parameter field of interest as a Gaussian process with unknown hyper-parameters. The AUV trajectories for sampling are determined on an epoch-by-epoch basis. At the end of each epoch, the access points relay the observed field samples from all the AUVs to the fusion center, which computes the posterior distribution of the field based on Gaussian process regression and estimates the field hyper-parameters. The optimal trajectories of all the AUVs in the next epoch are determined to maximize a long-term reward that is defined based on the field uncertainty reduction and the AUV mobility cost, subject to the kinematics constraint, the communication constraint, and the sensing area constraint. We formulate the adaptive trajectory planning problem as a Markov decision process (MDP). A reinforcement learning-based online learning algorithm is designed to determine the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm has performance similar to a benchmark method that assumes perfect knowledge of the field hyper-parameters.
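The fusion step described above rests on standard Gaussian process regression. The following is a minimal, self-contained sketch of how a fusion center could compute the posterior mean and variance of the field from AUV samples; it is not the paper's implementation, and the RBF kernel, its hyper-parameter values, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(A, B, sigma_f=1.0, ell=1.0):
    """Squared-exponential covariance between location sets A (n,2) and B (m,2)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X_obs, y_obs, X_grid, noise=1e-2):
    """Posterior mean and variance of the field on X_grid given AUV samples."""
    K = rbf_kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf_kernel(X_grid, X_obs)
    Kss = rbf_kernel(X_grid, X_grid)
    L = np.linalg.cholesky(K)                     # stable solve via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    var = np.diag(Kss) - (v**2).sum(0)            # posterior variance per grid point
    return mean, var
```

The posterior variance returned here is one natural ingredient for a field-uncertainty-reduction reward: sampling near observed locations drives it toward the noise floor.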
Award ID(s):
1651135 1551067
PAR ID:
10084100
Author(s) / Creator(s):
Date Published:
Journal Name:
Sensors
Volume:
18
Issue:
11
ISSN:
1424-8220
Page Range / eLocation ID:
3859
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract This paper explores the use of autonomous underwater vehicles (AUVs) equipped with sensors to construct water quality models to aid in the assessment of important environmental hazards, for instance, those related to point‐source pollutants or localized hypoxic regions. Our focus is on problems requiring the autonomous discovery and dense sampling of critical areas of interest in real time, for which standard (e.g., grid‐based) strategies are not practical due to AUV power and computing constraints that limit mission duration. To this end, we consider adaptive sampling strategies on Gaussian process (GP) stochastic models of the measured scalar field to focus sampling on the most promising and informative regions. Specifically, this study employs the GP upper confidence bound as the optimization criterion to adaptively plan sampling paths that balance a trade‐off between exploration and exploitation. Two informative path planning algorithms based on (i) branch‐and‐bound techniques and (ii) cross‐entropy optimization are presented for choosing future sampling locations while considering the motion constraints of the sampling platform. The effectiveness of the proposed methods is explored in simulated scalar fields for identifying multiple regions of interest within a three‐dimensional environment. Field experiments with an AUV using both virtual measurements on a known scalar field and in situ dissolved oxygen measurements for studying hypoxic zones validate the approach's capability to quickly explore the given area, and then subsequently increase the sampling density around regions of interest without sacrificing model fidelity of the full sampling area.
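The GP upper confidence bound criterion above reduces, at each step, to scoring candidate waypoints by posterior mean plus a scaled posterior standard deviation. The sketch below is an illustrative stand-in, not the paper's branch-and-bound or cross-entropy planner; the candidate set (e.g., waypoints reachable under the platform's motion constraints), the `beta` weight, and the function name are assumptions.

```python
import numpy as np

def ucb_next_waypoint(candidates, mean, std, beta=2.0):
    """Pick the candidate waypoint maximizing the GP upper confidence bound.

    candidates : (m, d) array of reachable sampling locations
    mean, std  : GP posterior mean and std at each candidate
    beta       : trade-off between exploitation (mean) and exploration (std)
    """
    ucb = mean + beta * std
    return candidates[int(np.argmax(ucb))]
```

With a large `beta` the rule favors uncertain regions (exploration); with `beta = 0` it greedily tracks the current mean estimate (exploitation).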
  2. We investigate the problem of simultaneous parameter identification and mapping of a spatially distributed field using a mobile sensor network. We first develop a parametrized model that represents the spatially distributed field. Based on the model, a recursive least squares algorithm is developed to achieve online parameter identification. Next, we design a global state observer, which uses the estimated parameters, together with data collected by the mobile sensor network, to reconstruct the entire spatially and temporally varying field in real time. Since the performance of the parameter identification and map reconstruction algorithms depends on the trajectories of the mobile sensors, we further develop a Lyapunov redesign based online trajectory planning algorithm for the mobile sensor network, so that the mobile sensors can use local real-time information to move along information-rich paths that improve the performance of the parameter identification and map reconstruction. Lastly, a cooperative filtering scheme is developed to provide the state estimates of the spatially distributed field, which enables the recursive least squares method. To test the proposed algorithms in realistic scenarios, we first build a CO2 diffusion field in a lab and construct a sensor network to measure the field concentration over time. We then validate the algorithms in the reconstructed CO2 field in simulation. Simulation results demonstrate the efficiency of the proposed method.
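The online parameter identification step above uses recursive least squares. The class below is a generic RLS update offered as an illustration, not the authors' exact algorithm; the forgetting factor, the initialization constant, and the class name are assumptions chosen for demonstration.

```python
import numpy as np

class RecursiveLeastSquares:
    """Online estimate of theta minimizing sum_t (y_t - phi_t^T theta)^2."""

    def __init__(self, n, lam=1.0, delta=100.0):
        self.theta = np.zeros(n)
        self.P = delta * np.eye(n)   # large initial P ~ weak prior on theta
        self.lam = lam               # forgetting factor (1.0 = no forgetting)

    def update(self, phi, y):
        """Incorporate one regressor/measurement pair (phi, y)."""
        Pphi = self.P @ phi
        g = Pphi / (self.lam + phi @ Pphi)              # gain vector
        self.theta = self.theta + g * (y - phi @ self.theta)
        self.P = (self.P - np.outer(g, Pphi)) / self.lam
        return self.theta
```

Each update costs O(n^2), which is what makes the scheme suitable for online identification on resource-constrained mobile sensors.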
  3. We propose a Bayesian decision making framework for control of Markov Decision Processes (MDPs) with unknown dynamics and large, possibly continuous, state, action, and parameter spaces in data-poor environments. Most of the existing adaptive controllers for MDPs with unknown dynamics are based on the reinforcement learning framework and rely on large data sets acquired by sustained direct interaction with the system or via a simulator. This is not feasible in many applications, due to ethical, economic, and physical constraints. The proposed framework addresses the data poverty issue by decomposing the problem into an offline planning stage that does not rely on sustained direct interaction with the system or simulator and an online execution stage. In the offline process, parallel Gaussian process temporal difference (GPTD) learning techniques are employed for near-optimal Bayesian approximation of the expected discounted reward over a sample drawn from the prior distribution of unknown parameters. In the online stage, the action with the maximum expected return with respect to the posterior distribution of the parameters is selected. This is achieved by an approximation of the posterior distribution using a Markov Chain Monte Carlo (MCMC) algorithm, followed by constructing multiple Gaussian processes over the parameter space for efficient prediction of the means of the expected return at the MCMC samples. The effectiveness of the proposed framework is demonstrated using a simple dynamical system model with continuous state and action spaces, as well as a more complex model for a metastatic melanoma gene regulatory network observed through noisy synthetic gene expression data.
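The online stage described above amounts to averaging each action's expected return over posterior parameter samples and picking the maximizer. The function below sketches only that selection rule; the `expected_return` callable stands in for the GPTD/GP value predictor, and all names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def bayes_optimal_action(actions, posterior_samples, expected_return):
    """Select the action with the highest expected return averaged over
    posterior parameter samples (e.g., drawn by MCMC).

    expected_return(a, theta) is a stand-in for the learned value predictor.
    """
    scores = [np.mean([expected_return(a, th) for th in posterior_samples])
              for a in actions]
    return actions[int(np.argmax(scores))]
```

Because the average is taken over the posterior rather than a point estimate, the rule naturally hedges across parameter values that the data have not yet ruled out.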
  4. A bounded cost path planning method is developed for underwater vehicles assisted by a data-driven flow modeling method. The modeled flow field is partitioned as a set of cells of piece-wise constant flow speed. A flow partition algorithm and a parameter estimation algorithm are proposed to learn the flow field structure and parameters with justified convergence. A bounded cost path planning algorithm is developed taking advantage of the partitioned flow model. An extended potential search method is proposed to determine the sequence of partitions that the optimal path crosses. The optimal path within each partition is then determined by solving a constrained optimization problem. Theoretical justification is provided for the proposed extended potential search method generating the optimal solution. The planned path has the highest probability of satisfying the bounded cost constraint. The performance of the algorithms is demonstrated with experimental and simulation results, which show that the proposed method is more computationally efficient than some of the existing methods.
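A piece-wise constant flow model makes per-cell travel cost easy to evaluate: within one cell the vehicle's ground speed is constant, so the crossing time is just segment length over ground speed. The function below illustrates that cost model in its simplest (along-path, 1-D flow) form; it is a hedged stand-in for, not a reproduction of, the paper's partition-based cost computation, and the representation of cells is an assumption.

```python
def path_cost(cells, vehicle_speed):
    """Travel time along a path crossing cells of piece-wise constant flow.

    cells         : list of (segment_length, along_path_flow_speed) pairs;
                    positive flow aids the vehicle, negative opposes it
    vehicle_speed : speed of the vehicle relative to the water
    """
    total = 0.0
    for length, flow in cells:
        ground_speed = vehicle_speed + flow
        if ground_speed <= 0:
            return float("inf")   # this cell cannot be crossed against the flow
        total += length / ground_speed
    return total
```

Summing such per-cell terms over the partition sequence chosen by the search gives the total path cost checked against the bound.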
  5. Nonlinear optimal control problems are challenging to solve efficiently due to non-convexity. This paper introduces a trajectory optimization approach that achieves real-time performance by combining machine learning to predict optimal trajectories with refinement by quadratic optimization. First, a library of optimal trajectories is calculated offline and used to train a neural network. Online, the neural network predicts a trajectory for a novel initial state and cost function, and this prediction is further optimized by a sparse quadratic programming solver. We apply this approach to a fly-to-target movement problem for an indoor quadrotor. Experiments demonstrate that the technique calculates near-optimal trajectories in a few milliseconds, and generates agile movement that can be tracked more accurately than existing methods. 
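The predict-then-refine pattern above (a learned trajectory polished by a quadratic program) can be illustrated with a small equality-constrained QP solved through its KKT system. The sketch below refines a 1-D trajectory toward smoothness while pinning the endpoints; it is a simplified stand-in for the paper's sparse QP solver, and the objective weights, constraint choice, and function name are assumptions.

```python
import numpy as np

def refine_trajectory(pred, start, goal, smooth_weight=1.0):
    """Polish a predicted 1-D trajectory x by the equality-constrained QP
        minimize ||x - pred||^2 + w * ||D2 x||^2
        subject to x[0] = start, x[-1] = goal,
    where D2 is the second-difference operator (a smoothness penalty).
    Solved directly via the KKT system."""
    n = len(pred)
    D2 = np.diff(np.eye(n), 2, axis=0)          # rows are [1, -2, 1] stencils
    H = 2.0 * (np.eye(n) + smooth_weight * D2.T @ D2)
    A = np.zeros((2, n))
    A[0, 0] = 1.0                                # pins x[0]
    A[1, -1] = 1.0                               # pins x[-1]
    KKT = np.block([[H, A.T], [A, np.zeros((2, 2))]])
    rhs = np.concatenate([2.0 * pred, [start, goal]])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n]                               # discard the two multipliers
```

In the real-time setting the neural network's prediction serves as `pred`, so the QP starts close to the optimum and only a cheap correction remains.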