This paper addresses the Informative Path Planning (IPP) problem for autonomous robots exploring and mapping unknown 2D environments. IPP benefits many applications, such as search and rescue and cave exploration, where mapping an unknown environment is necessary. Because autonomous robots have limited operation time on a finite battery, an efficient IPP algorithm is essential; designing one is challenging, however, because the robot may begin with no information about the environment. In this paper, we formulate a mathematical structure of the IPP problem and derive the optimal control input. We then present a discretized model of the IPP algorithm as a solution for exploring an unknown environment. The proposed approach offers relatively fast computation while remaining applicable to a broad range of robot and sensor platforms. Simulation results demonstrate the performance of the proposed IPP algorithm.
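The discretized formulation lends itself to a simple grid-based illustration. The following minimal sketch is not the paper's algorithm: the occupancy grid, circular sensor footprint, and entropy-based information gain are all assumptions. It shows a one-step greedy choice of the next cell by expected information gain.

```python
import numpy as np

def cell_entropy(p):
    """Shannon entropy (bits) of occupancy probabilities."""
    p = np.clip(p, 1e-9, 1 - 1e-9)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def greedy_ipp_step(occ_grid, robot_pos, sensor_radius=3):
    """Choose the adjacent cell whose circular sensor footprint covers the
    most uncertain (highest total entropy) region of the occupancy grid."""
    rows, cols = occ_grid.shape
    rr, cc = np.ogrid[:rows, :cols]
    best_move, best_gain = robot_pos, -np.inf
    for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        r, c = robot_pos[0] + dr, robot_pos[1] + dc
        if not (0 <= r < rows and 0 <= c < cols):
            continue
        footprint = (rr - r) ** 2 + (cc - c) ** 2 <= sensor_radius ** 2
        gain = cell_entropy(occ_grid[footprint]).sum()  # expected information gain
        if gain > best_gain:
            best_gain, best_move = gain, (r, c)
    return best_move, best_gain

# Example: a 20x20 map where every cell is still fully unknown (p = 0.5).
grid = np.full((20, 20), 0.5)
print(greedy_ipp_step(grid, (10, 10)))
```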
Bayesian reinforcement learning for navigation planning in unknown environments
This study focuses on a rescue mission problem, particularly enabling agents/robots to navigate efficiently in unknown environments. Technological advances, including manufacturing, sensing, and communication systems, have raised interest in using robots or drones for rescue operations. Effective rescue operations require quickly identifying changes in the environment and/or locating victims and injured persons as soon as possible. Several techniques have been developed in recent years for autonomy in rescue missions, including motion planning, adaptive control, and, more recently, reinforcement learning. These techniques rely on full knowledge of the environment or the availability of simulators that can represent real environments during rescue operations. In practice, however, agents might have little or no information about the environment or the number or locations of the injured, preventing or limiting the application of most existing techniques. This study provides a probabilistic/Bayesian representation of the unknown environment, which jointly models the stochasticity in the agent's navigation and the environment uncertainty in a vector called the belief state. This belief state allows offline learning of the optimal Bayesian policy in an unknown environment without the need for any real data/interactions, which guarantees taking actions that are optimal given all available information. To address the large size of the belief space, deep reinforcement learning is developed for computing an approximate Bayesian planning policy. Numerical experiments on different maze problems demonstrate the high performance of the proposed policy.
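A minimal sketch of the belief-update idea, assuming a finite set of candidate environment hypotheses with known observation likelihoods; the discrete-hypothesis representation and the numbers below are illustrative, not the paper's exact belief-state construction.

```python
import numpy as np

def bayes_update(belief, likelihoods):
    """One Bayesian update of the belief over K candidate environments.

    belief      : (K,) prior probability assigned to each hypothesis
    likelihoods : (K,) P(observation | action, hypothesis k)
    returns     : (K,) normalized posterior belief
    """
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0:                      # observation ruled out every hypothesis
        return np.full_like(belief, 1.0 / belief.size)
    return posterior / total

# Example: three candidate maze layouts, a wall is sensed straight ahead.
belief = np.array([1 / 3, 1 / 3, 1 / 3])
likelihoods = np.array([0.9, 0.1, 0.5])   # P(wall reading | layout k), illustrative
print(bayes_update(belief, likelihoods))  # probability mass shifts toward layout 0
```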
- Award ID(s): 2311969
- PAR ID: 10596255
- Publisher / Repository: Frontiers
- Date Published:
- Journal Name: Frontiers in Artificial Intelligence
- Volume: 7
- ISSN: 2624-8212
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. This progress now creates an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Realizing this opportunity requires an efficient and flexible medium through which humans can communicate with collaborative robots. Natural language provides one such medium, and through significant progress in statistical methods for natural-language understanding, robots are now able to interpret a diverse array of free-form navigation, manipulation, and mobile-manipulation commands. However, most contemporary approaches require a detailed, prior spatial-semantic map of the robot’s environment that models the space of possible referents of an utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially-observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural-language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a “sensor”—inferring spatial, topological, and semantic information implicit in natural-language utterances and then exploiting this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic, language grounding model and infer a distribution over a symbolic representation of the robot’s action space, consistent with the utterance. We use imitation learning to identify a belief-space policy that reasons over the environment and behavior distributions. We evaluate our framework through a variety of different navigation and mobile-manipulation experiments involving an unmanned ground vehicle, a robotic wheelchair, and a mobile manipulator, demonstrating that the algorithm can follow natural-language instructions without prior knowledge of the environment.
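The "language as a sensor" idea can be illustrated with a toy reweighting of candidate environment models by the objects an utterance mentions. This is a hypothetical sketch: the hypothesis set, the 0.9/0.1 likelihoods, and the object lists are invented for illustration and do not reproduce the paper's grounding model.

```python
from dataclasses import dataclass, field

@dataclass
class EnvHypothesis:
    name: str
    objects: dict = field(default_factory=dict)  # object label -> (x, y) position
    prior: float = 0.5

def language_conditioned_posterior(hypotheses, mentioned_objects):
    """Reweight environment hypotheses by whether they contain the objects the
    utterance mentions (treated here as a crude 0.9 / 0.1 likelihood)."""
    weights = []
    for h in hypotheses:
        likelihood = 1.0
        for obj in mentioned_objects:
            likelihood *= 0.9 if obj in h.objects else 0.1
        weights.append(h.prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]

hyps = [
    EnvHypothesis("with_pallet", {"pallet": (4.0, 2.0)}, 0.5),
    EnvHypothesis("empty_room", {}, 0.5),
]
# "Go to the pallet" mentions a pallet, so the first hypothesis gains weight.
print(language_conditioned_posterior(hyps, ["pallet"]))
```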
Partially Observable Markov Decision Processes (POMDPs) can model complex sequential decision-making problems under stochastic and uncertain environments. A main reason hindering their broad adoption in real-world applications is the unavailability of a suitable POMDP model or a simulator thereof. Available solution algorithms, such as Reinforcement Learning (RL), typically benefit from the knowledge of the transition dynamics and the observation generating process, which are often unknown and non-trivial to infer. In this work, we propose a combined framework for inference and robust solution of POMDPs via deep RL. First, all transition and observation model parameters are jointly inferred via Markov Chain Monte Carlo sampling of a hidden Markov model, which is conditioned on actions, in order to recover full posterior distributions from the available data. The POMDP with uncertain parameters is then solved via deep RL techniques with the parameter distributions incorporated into the solution via domain randomization, in order to develop solutions that are robust to model uncertainty. As a further contribution, we compare the use of Transformers and long short-term memory networks, which constitute model-free RL solutions and work directly on the observation space, with an approach termed the belief-input method, which works on the belief space by exploiting the learned POMDP model for belief inference. We apply these methods to the real-world problem of optimal maintenance planning for railway assets and compare the results with the current real-life policy. We show that the RL policy learned by the belief-input method is able to outperform the real-life policy by yielding significantly reduced life-cycle costs.
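One core idea here, training against model uncertainty via domain randomization, can be sketched as a training loop that draws a different plausible POMDP from the inferred posterior in each episode. The sampling interface and the `run_episode` callback are assumptions, and the RL update itself (a recurrent policy or the belief-input method) is abstracted away.

```python
import numpy as np

def sample_pomdp(posterior_samples, rng):
    """Draw one (transition, observation) model from MCMC posterior samples,
    so each training episode sees a different plausible POMDP."""
    idx = rng.integers(len(posterior_samples))
    return posterior_samples[idx]

def train_robust_policy(posterior_samples, run_episode, n_episodes=1000, seed=0):
    """Outer loop only: run_episode(T, O) is assumed to roll out the current
    policy in a simulator built from the sampled model and update the policy."""
    rng = np.random.default_rng(seed)
    for _ in range(n_episodes):
        T, O = sample_pomdp(posterior_samples, rng)  # uncertain model parameters
        run_episode(T, O)                            # collect data / update policy
```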
We explore a Federated Reinforcement Learning (FRL) problem where agents collaboratively learn a common policy without sharing their trajectory data. To date, existing FRL work has primarily focused on agents operating in the same or "similar" environments. In contrast, our problem setup allows for arbitrarily large levels of environment heterogeneity. To obtain the optimal policy which maximizes the average performance across all potentially completely different environments, we propose two algorithms: FedSVRPG-M and FedHAPG-M. In contrast to existing results, we demonstrate that both FedSVRPG-M and FedHAPG-M, both of which leverage momentum mechanisms, can exactly converge to a stationary point of the average performance function, regardless of the magnitude of environment heterogeneity. Furthermore, by incorporating the benefits of variance-reduction techniques or Hessian approximation, both algorithms achieve state-of-the-art convergence results, characterized by a sample complexity of $O(\epsilon^{-3/2}/N)$. Notably, our algorithms enjoy linear convergence speedups with respect to the number of agents, highlighting the benefit of collaboration among agents in finding a common policy.
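A rough sketch of the server-side step is given below: average local policy gradients from heterogeneous environments and apply a momentum update to the shared policy parameters. This is only loosely in the spirit of FedSVRPG-M; the variance-reduction and Hessian-approximation corrections of the actual algorithms are not reproduced, and the learning-rate and momentum values are placeholders.

```python
import numpy as np

def federated_pg_round(theta, momentum, agent_grads, lr=0.01, beta=0.9):
    """One illustrative server round of momentum-based federated policy gradient.

    theta       : shared policy parameters
    momentum    : running momentum buffer (same shape as theta)
    agent_grads : list of N local policy-gradient estimates, one per agent
    """
    avg_grad = np.mean(agent_grads, axis=0)   # collaboration: average over N agents
    momentum = beta * momentum + (1 - beta) * avg_grad
    theta = theta + lr * momentum             # ascent on the average performance
    return theta, momentum
```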
This paper addresses incomplete-information dynamic games, where reward parameters of agents are private. Previous studies have shown that online belief update is necessary for deriving equilibrial policies of such games, especially for high-risk games such as vehicle interactions. However, updating beliefs in real time is computationally expensive as it requires continuous computation of Nash equilibria of the sub-games starting from the current states. In this paper, we consider the triggering mechanism of belief update as a policy defined on the agents’ physical and belief states, and propose learning this policy through reinforcement learning (RL). Using a two-vehicle uncontrolled intersection case, we show that intermittent belief update via RL is sufficient for safe interactions, reducing the computation cost of updates by 59% when agents have full observations of physical states. Simulation results also show that the belief update frequency will increase as noise becomes more significant in measurements of the vehicle positions.
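The intermittent-update mechanism can be sketched as a control loop in which a learned trigger decides whether to pay for the costly equilibrium-based belief update at each step. The function names and loop structure below are assumptions for illustration, not the paper's implementation.

```python
def interact_with_intermittent_updates(physical_state, belief, trigger_policy,
                                       update_belief, act, horizon=100):
    """Illustrative interaction loop: an RL-learned trigger policy decides at each
    step whether to run a costly equilibrium-based belief update; otherwise the
    previous belief over the other agent's reward parameters is reused."""
    for _ in range(horizon):
        if trigger_policy(physical_state, belief):         # learned trigger (True/False)
            belief = update_belief(physical_state, belief)  # expensive: re-solve sub-game equilibria
        physical_state = act(physical_state, belief)        # act on the current belief
    return physical_state, belief
```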