Mobile robot navigation is a critical aspect of robotics, with applications spanning from service robots to industrial automation. However, navigating complex and dynamic environments poses many challenges, such as avoiding obstacles, making decisions in real time, and adapting to new situations. Reinforcement Learning (RL) has emerged as a promising approach for enabling robots to learn navigation policies from their interactions with the environment. However, the application of RL methods to real-world tasks such as mobile robot navigation, and the evaluation of their performance under various training–testing settings, have not been sufficiently researched. In this paper, we design an evaluation framework that investigates an RL algorithm's generalization to unseen scenarios, in terms of learning convergence and success rate, by transferring policies learned in simulation to physical environments. To achieve this, we design a simulated environment in Gazebo for training the robot over a large number of episodes. The training environment closely mimics the typical indoor scenarios a mobile robot can encounter, replicating real-world challenges. For evaluation, we design physical environments with and without unforeseen indoor scenarios. The evaluation framework outputs statistical metrics, which we then use to conduct an extensive study of a deep RL method, namely Proximal Policy Optimization (PPO). The results provide valuable insights into the strengths and limitations of the method for mobile robot navigation. Our experiments demonstrate that a model trained in simulation can be deployed to a previously unseen physical world with a success rate of over 88%. The insights gained from our study can assist practitioners and researchers in selecting suitable RL approaches and training–testing settings for their specific robotic navigation tasks.
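A minimal sketch of the train-in-simulation, deploy-to-reality workflow described in the abstract. The `GazeboNavEnv`-style Gymnasium wrapper, the `goal_reached` info key, and the use of Stable-Baselines3's PPO are illustrative assumptions, not the paper's actual implementation:

```python
# Hedged sketch: train a PPO navigation policy in a Gazebo-backed simulator,
# then measure success rate on held-out (e.g., physical or unseen) scenarios.
# Assumes a Gymnasium-compatible wrapper around the simulation environment.
import gymnasium as gym
from stable_baselines3 import PPO

def train_nav_policy(env: gym.Env, timesteps: int = 2_000_000) -> PPO:
    """Train a navigation policy over a large number of simulated episodes."""
    model = PPO("MlpPolicy", env, verbose=1)
    model.learn(total_timesteps=timesteps)
    return model

def success_rate(model: PPO, env: gym.Env, episodes: int = 50) -> float:
    """Deploy the frozen policy and count goal-reaching episodes."""
    successes = 0
    for _ in range(episodes):
        obs, info = env.reset()
        terminated = truncated = False
        while not (terminated or truncated):
            action, _ = model.predict(obs, deterministic=True)
            obs, _, terminated, truncated, info = env.step(action)
        successes += int(info.get("goal_reached", False))  # assumed info key
    return successes / episodes
```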
- Award ID(s):
- 1932529
- PAR ID:
- 10482819
- Editor(s):
- Tamim Asfour, Editor-in-Chief
- Publisher / Repository:
- IEEE
- Date Published:
- Journal Name:
- IEEE Robotics and Automation Letters
- ISSN:
- 2377-3766
- Subject(s) / Keyword(s):
- Reinforcement learning, machine learning for robot control, robust/adaptive control, robot safety
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- State-of-the-art human-in-the-loop robot grasping suffers greatly from Electromyography (EMG) inference robustness issues. As a workaround, researchers have explored integrating EMG with other signals, often in an ad hoc manner. In this paper, we present a method for end-to-end training of a policy for human-in-the-loop robot grasping on real reaching trajectories. For this purpose, we use Reinforcement Learning (RL) and Imitation Learning (IL) in DEXTRON (DEXTerity enviRONment), a stochastic simulation environment with real human trajectories that are augmented and selected using a Monte Carlo (MC) simulation method. We also offer a success model which, once trained on the expert-policy data and the RL policy roll-out transitions, provides transparency into how the deep policy works and when it is likely to fail.
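A hedged sketch of what such a success model might look like: a binary classifier fit on transition features from expert demonstrations and RL roll-outs, predicting whether the episode will end in a successful grasp. The feature layout and the choice of logistic regression are assumptions, not details from the abstract:

```python
# Hedged sketch of a "success model": given (state, action) features gathered
# from expert-policy data and RL policy roll-outs, predict episode success.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_success_model(features: np.ndarray, succeeded: np.ndarray) -> LogisticRegression:
    """features: (N, d) transition features; succeeded: (N,) 0/1 outcome labels."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(features, succeeded)
    return clf

# At run time, a low predicted success probability flags states where the
# deep grasping policy is likely to fail, adding transparency to its behavior.
```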
- Numerous solutions have been proposed for Traffic Signal Control (TSC) tasks, aiming to provide efficient transportation and alleviate traffic congestion. Recently, promising results have been attained by Reinforcement Learning (RL) methods through trial and error in simulators, bringing confidence in solving cities' congestion problems. However, performance gaps still exist when simulator-trained policies are deployed to the real world, mainly due to differences in system dynamics between the training simulators and real-world environments. In this work, we leverage the knowledge of Large Language Models (LLMs) to understand and profile the system dynamics through a prompt-based grounded action transformation that bridges the performance gap. Specifically, this paper exploits a pre-trained LLM's inference ability to understand how traffic dynamics change with weather conditions, traffic states, and road types. Aware of these changes, the agent grounds its actions in realistic dynamics and thus learns a more realistic policy. We conduct experiments on four different scenarios to show the effectiveness of the proposed PromptGAT in mitigating the performance gap of reinforcement learning from simulation to reality (sim-to-real).
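One plausible reduction of the LLM-profiling step is sketched below: the LLM is queried for how real-world dynamics deviate from the simulator under the current conditions, and the inferred factor is used to adjust the training dynamics. The `query_llm` callable, the single-number prompt format, and the multiplicative factor are all illustrative assumptions; the paper's actual PromptGAT procedure is not reproduced here:

```python
# Hedged sketch of prompt-based dynamics profiling: ask an LLM how vehicle
# dynamics shift with weather, traffic state, and road type, and obtain a
# factor that can be applied when grounding simulator transitions.
from typing import Callable

def infer_dynamics_factor(weather: str, traffic_state: str, road_type: str,
                          query_llm: Callable[[str], str]) -> float:
    prompt = (
        f"Under {weather} weather, traffic state '{traffic_state}', and a "
        f"{road_type} road, reply with a single multiplicative factor "
        f"describing how real-world traffic dynamics deviate from simulation."
    )
    return float(query_llm(prompt))  # assumes the LLM replies with a bare number
```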
- We study adaptive video streaming for multiple users in wireless access edge networks with unreliable channels. The key challenge is to jointly optimize the video bitrate adaptation and resource allocation such that the users' cumulative quality of experience is maximized. This problem is a finite-horizon restless multi-armed multi-action bandit problem and is provably hard to solve. To overcome this challenge, we propose a computationally appealing index policy, entitled the Quality Index Policy, which is well defined without the Whittle indexability condition and is provably asymptotically optimal without the global attractor condition. These two conditions are widely needed in the design of most existing index policies but are difficult to establish in general. Since the wireless access edge network environment is highly dynamic, with system parameters unknown and time-varying, we further develop an index-aware reinforcement learning (RL) algorithm dubbed QA-UCB. We show that QA-UCB achieves sub-linear regret with low complexity, since it fully exploits the structure of the Quality Index Policy for making decisions. Extensive simulations using real-world traces demonstrate significant gains of the proposed policies over conventional approaches. We note that the proposed framework for designing the index policy and the index-aware RL algorithm is of independent interest and could be useful for other large-scale multi-user problems.
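As a generic illustration of how an index policy of this kind operates (the index function itself is a placeholder; the paper's Quality Index is not reproduced here), each user is scored by a scalar index computed from its state, and the scheduler serves the highest-index users each slot:

```python
# Hedged sketch of an index-policy scheduler: compute a per-user index and
# activate the k users with the largest values, subject to the resource budget.
import heapq
from typing import Callable, Sequence

def schedule(states: Sequence, index_fn: Callable[[object], float], k: int) -> list[int]:
    """Return the indices of the k users with the largest index values."""
    scored = [(index_fn(s), i) for i, s in enumerate(states)]
    return [i for _, i in heapq.nlargest(k, scored)]
```

An index-aware learner such as QA-UCB would, in this framing, replace a known index function with estimates refined from observed feedback.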
- Flooding in many areas is becoming more prevalent due to factors such as urbanization and climate change, requiring modernization of stormwater infrastructure. Retrofitting standard passive systems with controllable valves/pumps is promising but requires real-time control (RTC). One method of automating RTC is reinforcement learning (RL), a general technique for sequential optimization and control in uncertain environments. The notion is that an RL algorithm can use inputs of real-time flood data and rainfall forecasts to learn a policy for controlling the stormwater infrastructure so as to minimize measures of flooding. In real-world conditions, rainfall forecasts and other state information are subject to noise and uncertainty. To account for these characteristics of the problem data, we implemented Deep Deterministic Policy Gradient (DDPG), an RL algorithm distinguished by its capability to handle noise in the input data. DDPG implementations were trained and tested against a passive flood control policy. Three primary cases were studied: (i) perfect data, (ii) imperfect rainfall forecasts, and (iii) imperfect water level and forecast data. One hundred rainfall episodes that caused flooding in the passive system were selected from 10 years of observations in Norfolk, Virginia, USA; 85 randomly selected episodes were used for training and the remaining 15 unseen episodes served as test cases. Compared to the passive system, all RL implementations reduced flooding volume by 70.5% on average and performed within a 5% range of one another. This suggests that DDPG is robust to noisy input data, which is essential knowledge for advancing the real-world applicability of RL for stormwater RTC.
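A minimal sketch of a DDPG setup in this spirit, assuming a Gymnasium-style stormwater environment that exposes water levels and (possibly noisy) rainfall forecasts as observations and continuous valve/pump settings as actions; the noise scale and hyperparameters are illustrative, and Stable-Baselines3 stands in for the authors' implementation:

```python
# Hedged sketch: DDPG with exploration noise for stormwater real-time control.
import numpy as np
from stable_baselines3 import DDPG
from stable_baselines3.common.noise import NormalActionNoise

def train_rtc_agent(env, timesteps: int = 500_000) -> DDPG:
    n_actions = env.action_space.shape[-1]
    noise = NormalActionNoise(mean=np.zeros(n_actions),
                              sigma=0.1 * np.ones(n_actions))  # assumed scale
    model = DDPG("MlpPolicy", env, action_noise=noise, verbose=1)
    model.learn(total_timesteps=timesteps)  # e.g., over the 85 training episodes
    return model
```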