In the realm of real-time environmental monitoring and hazard detection, multi-robot systems offer a promising way to explore and map dynamic fields, particularly in scenarios where human intervention poses safety risks. This research introduces a path planning and control strategy for a group of mobile sensing robots that efficiently explores and reconstructs a dynamic field consisting of multiple non-overlapping diffusion sources. Our approach integrates a reinforcement learning-based path planning algorithm, which guides the multi-robot formation toward diffusion sources, with a clustering-based destination-selection method invoked whenever a new source is detected, enhancing coverage and accelerating exploration in unknown environments. Simulation results and real-world laboratory experiments demonstrate the effectiveness of our approach in exploring and reconstructing dynamic fields. This study advances the field of multi-robot systems in environmental monitoring and has practical implications for rescue missions and field exploration.
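The clustering-based destination-selection step lends itself to a compact illustration. Below is a minimal sketch assuming k-means over candidate exploration cells near a newly detected source, using scikit-learn; the function name, feature choice, and library are our assumptions for illustration, not details from the paper.

```python
# Minimal sketch of clustering-based destination selection (assumed details:
# k-means over candidate cells; not the authors' exact method).
import numpy as np
from sklearn.cluster import KMeans

def select_destinations(candidate_cells, n_robots):
    """Cluster candidate exploration cells (e.g., unexplored cells around a
    newly detected source) and return one centroid per robot as its
    destination."""
    km = KMeans(n_clusters=n_robots, n_init=10).fit(candidate_cells)
    return km.cluster_centers_

# Usage: 200 random candidate cells in a 50 m x 50 m area, 4 robots.
cells = np.random.rand(200, 2) * 50.0
print(select_destinations(cells, n_robots=4))
```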
Geometric Reinforcement Learning Based Path Planning for Mobile Sensor Networks in Advection-Diffusion Field Reconstruction
We propose a geometric reinforcement learning (GRL) algorithm for real-time path planning of mobile sensor networks (MSNs) in the problem of reconstructing a spatiotemporally varying field governed by the advection-diffusion partial differential equation. A Luenberger state estimator reconstructs the concentration field from the measurements an MSN collects along its trajectory. Since the path of the MSN is critical to reconstructing the field, a novel GRL algorithm is developed for real-time path planning. The basic idea of GRL is to divide the whole area into a lattice of cells and employ a specific time-varying reward matrix that encodes both the length of the path and the mapping error; the proposed GRL can thus balance field-reconstruction performance against path efficiency. By updating the reward matrix, the real-time path planning problem is converted into a shortest-path problem on a weighted graph, which can be solved efficiently using dynamic programming. Convergence of the reward-matrix computation is proven theoretically. Simulation results demonstrate the effectiveness and feasibility of the proposed GRL for an MSN.
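The lattice-and-shortest-path idea can be sketched compactly. The following is a minimal illustration assuming a 4-connected lattice and an edge weight that adds a unit step cost to a penalty for entering well-mapped (low-error) cells; the exact weighting, the use of Dijkstra's algorithm as the shortest-path solver, and all names are illustrative assumptions rather than the paper's formulas.

```python
# Sketch of the GRL core idea: lattice cells, a reward (cost) matrix trading
# off path length against mapping error, and a shortest-path solve.  The
# edge-weight formula below is an assumption for illustration.
import heapq
import numpy as np

def plan_path(mapping_error, start, goal, lam=0.5):
    """Dijkstra over a 4-connected lattice.  Entering a cell costs one step
    plus a penalty for low mapping error, so the path is pulled toward
    poorly mapped cells while long detours are still discouraged."""
    rows, cols = mapping_error.shape
    err_max = mapping_error.max()
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, cell = heapq.heappop(pq)
        if cell == goal:
            break
        if d > dist.get(cell, float("inf")):
            continue  # stale queue entry
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                nd = d + 1.0 + lam * (err_max - mapping_error[nxt])
                if nd < dist.get(nxt, float("inf")):
                    dist[nxt], prev[nxt] = nd, cell
                    heapq.heappush(pq, (nd, nxt))
    path, cell = [goal], goal
    while cell != start:  # walk predecessor links back to the start
        cell = prev[cell]
        path.append(cell)
    return path[::-1]

# Usage: a 20x20 lattice with random mapping error.
err = np.random.rand(20, 20)
print(plan_path(err, start=(0, 0), goal=(19, 19)))
```

In the paper the reward matrix is time-varying, so a plan of this kind would be recomputed as new measurements update the mapping-error estimate.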
- Award ID(s): 1663073
- PAR ID: 10109638
- Date Published:
- Journal Name: Proceedings of the 2018 IEEE Conference on Decision and Control
- Page Range / eLocation ID: 1949-1954
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Path planning is a fundamental and critical task in many robotic applications. For energy-constrained robot platforms, path planning solutions are desired with minimum-time arrivals and minimal energy consumption. Uncertain environments, such as wind conditions, pose challenges to the design of effective minimum time-energy path planning solutions. In this article, we develop a minimum time-energy path planning solution in continuous state and control input spaces using integral reinforcement learning (IRL). To provide a baseline for evaluating the proposed solution, we first develop a theoretical analysis of the minimum time-energy path planning problem in a known environment using Pontryagin's minimum principle. We then provide an online adaptive solution in an unknown environment using IRL, by transforming the minimum time-energy problem into an approximate minimum time-energy problem and then developing an IRL-based optimal control strategy. Convergence of the IRL-based optimal control strategy is proven. Simulation studies compare the theoretical analysis and the proposed IRL-based algorithm. (A sketch of the Pontryagin formulation appears after this list.)
-
Learning optimal policies in real-world domains with delayed rewards is a major challenge in reinforcement learning. We address the credit assignment problem by proposing a Gaussian process (GP)-based immediate reward approximation algorithm and evaluate its effectiveness in 4 contexts where rewards can be delayed over long trajectories. In one GridWorld game and 8 Atari games, where immediate rewards are available, our results showed that on 7 out of 9 games the proposed GP-inferred reward policy performed at least as well as the immediate reward policy and significantly outperformed the corresponding delayed reward policy. In e-learning and healthcare applications, we combined GP-inferred immediate rewards with offline Deep Q-Network (DQN) policy induction and showed that the GP-inferred reward policies outperformed the policies induced using delayed rewards in both real-world contexts. (A minimal GP reward-shaping sketch appears after this list.)
-
This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) to estimate a water parameter field of interest in the under-ice environment. A centralized system is considered, where several fixed access points on the ice layer are introduced as gateways for communications between the AUVs and a remote data fusion center. We model the water parameter field of interest as a Gaussian process with unknown hyper-parameters. The AUV trajectories for sampling are determined on an epoch-by-epoch basis. At the end of each epoch, the access points relay the observed field samples from all the AUVs to the fusion center, which computes the posterior distribution of the field based on Gaussian process regression and estimates the field hyper-parameters. The optimal trajectories of all the AUVs in the next epoch are determined to maximize a long-term reward defined on the field uncertainty reduction and the AUV mobility cost, subject to the kinematics constraint, the communication constraint, and the sensing area constraint. We formulate the adaptive trajectory planning problem as a Markov decision process (MDP). A reinforcement learning-based online learning algorithm is designed to determine the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm performs similarly to a benchmark method that assumes perfect knowledge of the field hyper-parameters. (A one-epoch planning sketch appears after this list.)
-
We study the problem of reinforcement learning for a task encoded by a reward machine. The task is defined over a set of properties in the environment, called atomic propositions, and represented by Boolean variables. One unrealistic assumption commonly used in the literature is that the truth values of these propositions are accurately known. In real situations, however, these truth values are uncertain since they come from sensors that suffer from imperfections. At the same time, reward machines can be difficult to model explicitly, especially when they encode complicated tasks. We develop a reinforcement-learning algorithm that infers a reward machine encoding the underlying task while learning how to execute it, despite the uncertainty of the propositions' truth values. To address this uncertainty, the algorithm maintains a probabilistic estimate of the truth value of each atomic proposition and updates this estimate with new sensory measurements that arrive from exploration of the environment. Additionally, the algorithm maintains a hypothesis reward machine, which acts as an estimate of the reward machine encoding the task to be learned. As the agent explores the environment, the algorithm updates the hypothesis reward machine according to the obtained rewards and the estimate of the atomic propositions' truth values. Finally, the algorithm uses a Q-learning procedure over the states of the hypothesis reward machine to determine an optimal policy that accomplishes the task. We prove that the algorithm successfully infers the reward machine and asymptotically learns a policy that accomplishes the respective task. (A sketch of the two maintained estimates appears below.)
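For the minimum time-energy planner above, the Pontryagin baseline takes roughly the following shape. This is a sketch under assumed control-affine dynamics and a quadratic energy term, with the disturbance w standing in for the unknown wind; the paper's exact formulation may differ.

```latex
% Minimum time-energy cost and assumed dynamics (illustrative form).
J(u) = \int_{t_0}^{t_f} \left( 1 + u^{\top} R\, u \right) dt,
\qquad \dot{x} = f(x) + g(x)\, u + w .

% Hamiltonian and the stationarity condition \partial H / \partial u = 0:
H = 1 + u^{\top} R\, u + \lambda^{\top} \bigl( f(x) + g(x)\, u + w \bigr),
\qquad u^{*} = -\tfrac{1}{2} R^{-1} g(x)^{\top} \lambda .
```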
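For the GP-based reward approximation above, one minimal way to sketch the idea is to spread each episode's delayed return uniformly over its steps and fit a GP from state-action features to those targets. The uniform-credit targets, names, and scikit-learn usage are our simplifying assumptions; the paper's inference scheme is more sophisticated than this.

```python
# Illustrative GP reward shaping: spread each delayed return over its steps,
# fit a GP to the per-step targets, and use the posterior mean as a dense
# surrogate reward.  Uniform credit is a simplification, not the paper's
# inference procedure.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_reward_model(trajectories):
    """trajectories: list of (features, episode_return) pairs, where
    features has one row per step of the episode."""
    X, y = [], []
    for feats, ep_return in trajectories:
        X.append(feats)
        y.append(np.full(len(feats), ep_return / len(feats)))
    gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
    gp.fit(np.vstack(X), np.concatenate(y))
    return gp  # gp.predict(new_feats) yields dense surrogate rewards

# Usage with toy data: 5 episodes of 10 steps, 3-d step features.
rng = np.random.default_rng(0)
trajs = [(rng.normal(size=(10, 3)), rng.normal()) for _ in range(5)]
reward_model = fit_reward_model(trajs)
print(reward_model.predict(rng.normal(size=(4, 3))))
```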
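For the under-ice AUV work above, a single planning epoch can be caricatured as a GP fit followed by a score trading field uncertainty against travel cost. The one-step greedy rule below is a stand-in for the paper's constrained MDP formulation; the kernel, weights, and names are assumptions.

```python
# Sketch of one planning epoch: GP posterior over the field, then a greedy
# waypoint choice scoring uncertainty reduction minus mobility cost.  The
# paper's long-term MDP objective and constraints are not modeled here.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def next_waypoint(sample_xy, sample_z, candidates, auv_xy, beta=0.1):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0))
    gp.fit(sample_xy, sample_z)
    _, std = gp.predict(candidates, return_std=True)      # field uncertainty
    travel = np.linalg.norm(candidates - auv_xy, axis=1)  # mobility cost
    return candidates[np.argmax(std - beta * travel)]

# Usage: 30 past samples and 100 candidate waypoints in a 100 m x 100 m area.
rng = np.random.default_rng(1)
xy, z = rng.uniform(0, 100, (30, 2)), rng.normal(size=30)
cands = rng.uniform(0, 100, (100, 2))
print(next_waypoint(xy, z, cands, auv_xy=np.array([50.0, 50.0])))
```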
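For the reward-machine work above, the two running estimates can be sketched as a Bayes update on a proposition's truth value under a noisy sensor model, plus a Q-table over the product of environment states and hypothesis reward-machine states. The sensor rates and all names below are illustrative, and the reward-machine inference step itself is omitted.

```python
# Sketch of the two estimates the algorithm maintains: a Bayesian belief over
# an atomic proposition's truth value from noisy binary readings, and a
# Q-table indexed by (environment state, hypothesis reward-machine state).
import numpy as np

def update_belief(prior_true, reading, tpr=0.9, fpr=0.2):
    """Bayes update of P(proposition is true) given a binary sensor reading
    with assumed true-positive and false-positive rates."""
    like_true = tpr if reading else 1.0 - tpr
    like_false = fpr if reading else 1.0 - fpr
    post = like_true * prior_true
    return post / (post + like_false * (1.0 - prior_true))

# Q-learning over the product of environment and reward-machine states.
n_env, n_rm, n_act = 25, 3, 4
Q = np.zeros((n_env, n_rm, n_act))

def q_update(s, u, a, r, s2, u2, alpha=0.1, gamma=0.95):
    """One tabular update; u, u2 are hypothesis reward-machine states."""
    Q[s, u, a] += alpha * (r + gamma * Q[s2, u2].max() - Q[s, u, a])

print(update_belief(0.5, reading=True))  # 0.5 -> ~0.82
q_update(s=0, u=0, a=1, r=1.0, s2=1, u2=1)
```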