Semantic cues and statistical regularities in real-world environment layouts can improve navigation efficiency in novel environments. In this paper, we learn and leverage such semantic cues for navigating to objects of interest in novel environments, simply by watching YouTube videos. This is challenging because YouTube videos do not come with labels for actions or goals, and may not even showcase optimal behavior. Our method tackles these challenges through the use of Q-learning on pseudo-labeled transition quadruples (image, action, next image, reward). We show that such off-policy Q-learning from passive data is able to learn meaningful semantic cues for navigation. These cues, when used in a hierarchical navigation policy, lead to improved efficiency at the ObjectGoal task in visually realistic simulations. We observe a relative improvement of 15-83% over end-to-end RL, behavior cloning, and classical methods, while using minimal direct interaction.
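As a rough illustration of off-policy Q-learning over pseudo-labeled (image, action, next image, reward) quadruples, the sketch below performs a standard Bellman backup on a fixed dataset of such transitions. The network sizes, the four-way discrete action set, and the use of precomputed image features are assumptions made for the example, not the paper's implementation.

```python
# Hedged sketch: off-policy Q-learning on pseudo-labeled transition quadruples.
import torch
import torch.nn as nn

NUM_ACTIONS = 4   # assumed discrete navigation actions (forward, left, right, stop)
GAMMA = 0.99

class QNet(nn.Module):
    """Maps a precomputed image feature vector to one Q-value per action."""
    def __init__(self, feat_dim=512):
        super().__init__()
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(),
            nn.Linear(256, NUM_ACTIONS),
        )

    def forward(self, feats):
        return self.head(feats)

def q_learning_step(qnet, target_qnet, optimizer, batch):
    """One Bellman backup over a batch of (feat, action, next_feat, reward) quadruples."""
    feats, actions, next_feats, rewards = batch
    q_pred = qnet(feats).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_qnet(next_feats).max(dim=1).values
        q_target = rewards + GAMMA * q_next
    loss = nn.functional.smooth_l1_loss(q_pred, q_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```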
NavTuner: Learning a Scene-Sensitive Family of Navigation Policies
The advent of deep learning has inspired research into end-to-end learning for a variety of problem domains in robotics. For navigation, the resulting methods may not have the desired generalization properties, let alone match the performance of traditional methods. Instead of learning a navigation policy, we explore learning an adaptive policy in the parameter space of an existing navigation module. Having adaptive parameters provides the navigation module with a family of policies that can be dynamically reconfigured based on the local scene structure, and addresses the common assertion in machine learning that engineered solutions are inflexible. Of the methods tested, reinforcement learning (RL) is shown to provide a significant performance boost to a modern navigation method through reduced sensitivity of its success rate to environmental clutter. The outcomes indicate that RL as a meta-policy learner, or dynamic parameter tuner, effectively robustifies algorithms sensitive to external, measurable nuisance factors.
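The "dynamic parameter tuner" idea can be pictured as a small RL policy that maps a coarse scene descriptor to a parameter of an existing navigation module. The sketch below uses tabular Q-learning over binned clutter levels and a candidate set of obstacle-inflation radii; both the scene feature and the parameter being tuned are illustrative assumptions, not NavTuner's actual interface.

```python
# Hedged sketch: a learned meta-policy picks a planner parameter from local scene clutter.
import numpy as np

PARAM_GRID = [0.2, 0.4, 0.6, 0.8]   # e.g., candidate obstacle-inflation radii (m)

class ParamTuner:
    """Tabular Q-learning over discretized scene clutter -> planner parameter."""
    def __init__(self, n_clutter_bins=5, lr=0.1, gamma=0.95, eps=0.1):
        self.q = np.zeros((n_clutter_bins, len(PARAM_GRID)))
        self.lr, self.gamma, self.eps = lr, gamma, eps

    def select(self, clutter_bin):
        # Epsilon-greedy choice of parameter index for the current scene bin.
        if np.random.rand() < self.eps:
            return np.random.randint(len(PARAM_GRID))
        return int(np.argmax(self.q[clutter_bin]))

    def update(self, s, a, reward, s_next):
        # Standard one-step Q-learning backup on the tuner's own MDP.
        target = reward + self.gamma * self.q[s_next].max()
        self.q[s, a] += self.lr * (target - self.q[s, a])

# Usage idea: measure local clutter, pick a parameter, run the navigation module with it
# for a short horizon, then reward the tuner with the module's progress toward the goal.
```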
- Award ID(s): 1849333
- PAR ID: 10318616
- Journal Name: International Conference on Intelligent Robots and Systems
- Sponsoring Org: National Science Foundation
More Like this
Mobile robot navigation is a critical aspect of robotics, with applications spanning from service robots to industrial automation. However, navigating in complex and dynamic environments poses many challenges, such as avoiding obstacles, making decisions in real-time, and adapting to new situations. Reinforcement Learning (RL) has emerged as a promising approach to enable robots to learn navigation policies from their interactions with the environment. However, the application of RL methods to real-world tasks such as mobile robot navigation, and the evaluation of their performance under various training–testing settings, have not been sufficiently researched. In this paper, we have designed an evaluation framework that investigates the RL algorithm's generalization capability with regard to unseen scenarios, in terms of learning convergence and success rates, by transferring policies learned in simulation to physical environments. To achieve this, we designed a simulated environment in Gazebo for training the robot over a high number of episodes. The training environment closely mimics the typical indoor scenarios that a mobile robot can encounter, replicating real-world challenges. For evaluation, we designed physical environments with and without unforeseen indoor scenarios. This evaluation framework outputs statistical metrics, which we then use to conduct an extensive study on a deep RL method, namely proximal policy optimization (PPO). The results provide valuable insights into the strengths and limitations of the method for mobile robot navigation. Our experiments demonstrate that the trained model from simulation can be deployed to the previously unseen physical world with a success rate of over 88%. The insights gained from our study can assist practitioners and researchers in selecting suitable RL approaches and training–testing settings for their specific robotic navigation tasks.
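A sketch of the kind of evaluation such a framework reports: roll out a trained policy for a fixed number of episodes and count goal-reaching runs. The Gym-style environment interface and the `goal_reached` info flag are assumptions for illustration, not the paper's actual API.

```python
# Hedged sketch: success-rate evaluation of a trained navigation policy.
def evaluate_success_rate(policy, env, n_episodes=100, max_steps=500):
    """Runs n_episodes rollouts and returns the fraction that reach the goal."""
    successes = 0
    for _ in range(n_episodes):
        obs, _ = env.reset()
        info = {}
        for _ in range(max_steps):
            action = policy(obs)                  # e.g., a trained PPO actor
            obs, reward, terminated, truncated, info = env.step(action)
            if terminated or truncated:
                break
        if info.get("goal_reached", False):       # assumed success flag from the env
            successes += 1
    return successes / n_episodes
```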
Dealing with high variance is a significant challenge in model-free reinforcement learning (RL). Existing methods are unreliable, exhibiting high variance in performance from run to run using different initializations/seeds. Focusing on problems arising in continuous control, we propose a functional regularization approach to augmenting model-free RL. In particular, we regularize the behavior of the deep policy to be similar to a control prior, i.e., we regularize in function space. We show that functional regularization yields a bias-variance trade-off, and propose an adaptive tuning strategy to optimize this trade-off. When the prior policy has control-theoretic stability guarantees, we further show that this regularization approximately preserves those stability guarantees throughout learning. We validate our approach empirically on a wide range of settings, and demonstrate significantly reduced variance, guaranteed dynamic stability, and more efficient learning than deep RL alone.
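A minimal sketch of the functional-regularization idea, assuming a linear control prior and a simple blending rule: the deployed action is pulled toward the prior with weight lambda, which is crudely adapted online. These choices are illustrative, not the paper's exact formulation.

```python
# Hedged sketch: blending a deep RL action with a control prior in function space.
import numpy as np

def prior_controller(state, K):
    """A simple linear (e.g., LQR-like) control prior u = -K x."""
    return -K @ state

def regularized_action(rl_action, state, K, lam):
    """Blend the RL action with the prior; larger lam pulls the behavior toward
    the prior (lower variance, more bias)."""
    u_prior = prior_controller(state, K)
    return (rl_action + lam * u_prior) / (1.0 + lam)

def adapt_lambda(lam, td_error_var, target_var, step=0.05):
    """Crude adaptive tuning: increase regularization when value estimates are noisy."""
    return float(np.clip(lam + step * np.sign(td_error_var - target_var), 0.0, 10.0))
```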
Floods are among the most destructive natural hazards, with damages expected to intensify under climate change and socio-economic pressures. Effective reservoir operation remains a critical yet challenging strategy for mitigating downstream impacts, as operators must navigate nonlinear system dynamics, uncertain inflow forecasts, and trade-offs between competing objectives. This study proposes a novel end-to-end data-driven framework that integrates process-based hydraulic simulations, a Transformer-based surrogate model for flood damage prediction, and reinforcement learning (RL) for reservoir gate operation optimization. The framework is demonstrated using the Coralville Reservoir (Iowa, USA) and two major historical flood events (2008 and 2013). Hydraulic and impact simulations with HEC-RAS and HEC-FIA were used to generate training data, enabling the development of a Transformer model that accurately predicts time-varying flood damages. This surrogate is coupled with a Transformer-enhanced Deep Q-Network (DQN) to derive adaptive gate operation strategies. Results show that the RL-derived optimal policy reduces both peak and time-integrated damages compared to expert and zero-opening benchmarks, while maintaining smooth and feasible operations. Comparative analysis with a genetic algorithm (GA) highlights the robustness of the RL framework, particularly its ability to generalize across uncertain inflows and varying initial storage conditions. Importantly, the adaptive RL policy trained on perturbed synthetic inflows transferred effectively to the hydrologically distinct 2013 event, and fine-tuning achieved near-identical performance to the event-specific optimal policy. These findings highlight the capability of the proposed framework to provide adaptive, transferable, and computationally efficient tools for flood-resilient reservoir operation.
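As a hedged sketch of the gate-operation component, a small DQN scores a discrete set of gate-opening levels, and the reward signal during training would come from the negative of the surrogate's predicted damage. The state dimension, gate levels, and exploration scheme are assumptions, not the study's implementation.

```python
# Hedged sketch: DQN-style selection over discrete gate-opening levels.
import torch
import torch.nn as nn

GATE_LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]   # assumed discrete gate-opening fractions

class GateDQN(nn.Module):
    """Scores each candidate gate level given a reservoir/inflow state vector."""
    def __init__(self, state_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, len(GATE_LEVELS)),
        )

    def forward(self, state):
        return self.net(state)

def select_gate_level(dqn, state, eps=0.05):
    """Epsilon-greedy choice over gate levels. During training, the reward for the
    chosen level would be the negative damage predicted by the Transformer surrogate."""
    if torch.rand(1).item() < eps:
        return GATE_LEVELS[torch.randint(len(GATE_LEVELS), (1,)).item()]
    with torch.no_grad():
        return GATE_LEVELS[int(dqn(state).argmax().item())]
```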
In: Tamim Asfour (Ed.). A reinforcement learning (RL) control policy could fail in a new/perturbed environment that is different from the training environment, due to the presence of dynamic variations. For controlling systems with continuous state and action spaces, we propose an add-on approach to robustifying a pre-trained RL policy by augmenting it with an L1 adaptive controller (L1AC). Leveraging the capability of an L1AC for fast estimation and active compensation of dynamic variations, the proposed approach can improve the robustness of an RL policy which is trained either in a simulator or in the real world without consideration of a broad class of dynamic variations. Numerical and real-world experiments empirically demonstrate the efficacy of the proposed approach in robustifying RL policies trained using both model-free and model-based methods.
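The add-on structure can be sketched as: applied command = pre-trained RL action + a compensation term computed from the mismatch between a simple predictor and the measured state. The scalar first-order predictor, adaptation gain, and low-pass filter below are simplified stand-ins for a proper L1AC design, included only to show the augmentation pattern.

```python
# Hedged sketch: augmenting a fixed RL action with an adaptive compensation term.
class AdaptiveAugmentation:
    def __init__(self, dt=0.01, adapt_gain=100.0, filter_bw=5.0):
        self.dt, self.adapt_gain, self.filter_bw = dt, adapt_gain, filter_bw
        self.x_hat = 0.0        # predictor state
        self.sigma_hat = 0.0    # estimated lumped disturbance
        self.u_comp = 0.0       # low-pass-filtered compensation command

    def update(self, x_meas, u_applied, a_nominal=-1.0, b_nominal=1.0):
        # Predictor for assumed scalar dynamics x_dot = a*x + b*(u + sigma).
        x_hat_dot = a_nominal * self.x_hat + b_nominal * (u_applied + self.sigma_hat)
        self.x_hat += self.dt * x_hat_dot
        # Adaptation driven by the prediction error.
        err = self.x_hat - x_meas
        self.sigma_hat += self.dt * (-self.adapt_gain * err)
        # Filtered cancellation of the estimated disturbance.
        self.u_comp += self.dt * self.filter_bw * (-self.sigma_hat - self.u_comp)
        return self.u_comp

def augmented_action(u_rl, aug, x_meas, u_prev):
    """Applied command = pre-trained RL action + adaptive compensation."""
    return u_rl + aug.update(x_meas, u_prev)
```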