This content will become publicly available on July 13, 2026

Title: Robot-Gated Interactive Imitation Learning with Adaptive Intervention Mechanism
Interactive Imitation Learning (IIL) enables agents to acquire desired behaviors through human interventions, but existing methods often place heavy cognitive demands on human supervisors. To address this issue, we introduce the Adaptive Intervention Mechanism (AIM), a novel robot-gated IIL algorithm that learns an adaptive criterion for requesting human demonstrations. AIM leverages a proxy Q-function to model the human intervention rule, dynamically adjusting intervention requests based on the alignment between agent and expert actions. The proxy Q-function assigns high values when the agent deviates from expert behavior and gradually reduces these values as the agent improves, allowing the agent to assess real-time alignment and request assistance only when necessary. Expert-in-the-loop experiments demonstrate that AIM reduces expert monitoring effort by 40% compared to the uncertainty-based baseline Thrifty-DAgger, while improving learning efficiency. Moreover, AIM effectively identifies safety-critical states that warrant expert intervention, leading to higher-quality demonstrations while requiring less expert data and fewer environment interactions overall. Code and demo video are available at https://github.com/metadriverse/AIM.
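To make the gating idea above concrete, here is a minimal sketch of how a learned proxy Q-function could gate intervention requests: the agent asks for a human demonstration only when the proxy value of its own action exceeds a threshold. The network architecture, threshold, and names below are illustrative assumptions, not the released AIM implementation.

```python
# Illustrative sketch only: a proxy Q-network that scores (state, action)
# pairs, and a gate that requests human help when the score is high.
import torch
import torch.nn as nn

class ProxyQ(nn.Module):
    """Tiny MLP scoring how much a (state, action) pair warrants intervention."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def should_request_help(proxy_q: ProxyQ,
                        state: torch.Tensor,
                        agent_action: torch.Tensor,
                        threshold: float) -> bool:
    """Robot-gated check: ask the human only when misalignment is likely."""
    with torch.no_grad():
        score = proxy_q(state.unsqueeze(0), agent_action.unsqueeze(0)).item()
    return score > threshold
```

In this reading, the adaptive behavior comes from retraining the proxy Q-function as the agent improves, so the same threshold triggers fewer requests over time.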
Award ID(s):
2344955, 2339769
PAR ID:
10635719
Author(s) / Creator(s):
; ;
Publisher / Repository:
ICML 2025
Date Published:
Format(s):
Medium: X
Location:
Vancouver, Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Learning from active human involvement enables the human subject to actively intervene and demonstrate to the AI agent during training. The interaction and corrective feedback from humans bring safety and AI alignment to the learning process. In this work, we propose a new reward-free active human involvement method called Proxy Value Propagation for policy optimization. Our key insight is that a proxy value function can be designed to express human intents, wherein state-action pairs in the human demonstration are labeled with high values, while agent actions that are intervened upon receive low values. Through the TD-learning framework, the labeled values of demonstrated state-action pairs are further propagated to other unlabeled data generated from the agents' exploration. The proxy value function thus induces a policy that faithfully emulates human behaviors. Human-in-the-loop experiments show the generality and efficiency of our method. With minimal modification to existing reinforcement learning algorithms, our method can learn to solve continuous and discrete control tasks with various human control devices, including the challenging task of driving in Grand Theft Auto V. Demo video and code are available at: https://metadriverse.github.io/pvp.
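As a rough illustration of the proxy-value labeling described above, the sketch below builds TD targets that pin human-demonstrated pairs to a high value, pin intervened agent actions to a low value, and bootstrap everything else without any environment reward. The constants, buffer fields, and function names are placeholders, not the released PVP code.

```python
# Illustrative sketch: reward-free TD targets driven by human labels.
import torch

HIGH_VALUE, LOW_VALUE = 1.0, -1.0   # assumed proxy labels for demo / intervened pairs

def proxy_value_targets(batch, target_q_net, gamma: float = 0.99) -> torch.Tensor:
    """batch: list of transition dicts with flags set by the intervention logger."""
    targets = []
    for t in batch:
        if t["is_human_demo"]:          # the human took this action
            targets.append(HIGH_VALUE)
        elif t["is_intervened"]:        # the human overrode this agent action
            targets.append(LOW_VALUE)
        else:                           # unlabeled exploration data: bootstrap only
            with torch.no_grad():
                next_v = target_q_net(t["next_state"]).max().item()
            targets.append(gamma * next_v)   # note: no reward term anywhere
    return torch.tensor(targets)
```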
  2. Most existing policy learning solutions require the learning agents to receive high-quality supervision signals, e.g., rewards in reinforcement learning (RL) or high-quality expert demonstrations in behavioral cloning (BC). Such high-quality supervision is either infeasible or prohibitively expensive to obtain in practice. We aim for a unified framework that leverages the available cheap weak supervision to perform policy learning efficiently. To handle this problem, we treat the "weak supervision" as imperfect information coming from a peer agent, and evaluate the learning agent's policy based on a "correlated agreement" with the peer agent's policy (instead of simple agreement). Our approach explicitly punishes a policy for overfitting to the weak supervision. In addition to theoretical guarantees, extensive evaluations on tasks including RL with noisy reward, BC with weak demonstrations, and standard policy co-training (RL + BC) show that our method leads to substantial performance improvements, especially when the complexity or the noise of the learning environments is high.
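The correlated-agreement idea can be pictured as an agreement score corrected for chance agreement, so that a policy which blindly copies the weak supervision gains nothing over random pairing. The scoring function below is a simplified illustration of that intuition, not the paper's exact estimator.

```python
# Illustrative sketch: agreement with the peer minus agreement on shuffled pairs.
import random

def correlated_agreement_score(agent_actions, peer_actions):
    """agent_actions, peer_actions: equal-length lists of discrete actions."""
    n = len(agent_actions)
    matched = sum(a == p for a, p in zip(agent_actions, peer_actions)) / n
    shuffled = peer_actions[:]
    random.shuffle(shuffled)             # break the pairing to estimate chance agreement
    chance = sum(a == p for a, p in zip(agent_actions, shuffled)) / n
    return matched - chance              # high only when agreement exceeds chance level
```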
  3. In this paper, a hybrid shared controller is proposed for assisting human novice users to emulate human expert users within a human-automation interaction framework. This work is motivated by letting human novice users learn the skills of human expert users with automation as a medium. Automation interacts with human users in two ways: it learns how to optimally control the system from the experts' demonstrations via offline computation, and it assists the novice in real time without an excessive amount of intervention, based on an inference of the novice's skill level within our properly designed shared controller. Automation takes more control authority when the novice's skill level is poor, and it grants the novice more control authority when his/her skill level is close to that of the expert, letting the novice learn from his/her own control experience. The proposed scheme is shown to improve system performance while minimizing intervention from the automation, which is demonstrated via an illustrative human-in-the-loop application example.
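One way to picture the authority-sharing rule described above is a convex blend of the novice's command and the automation's expert-like command, weighted by the inferred skill level. The blending law and names below are a simplified placeholder, not the paper's controller.

```python
# Illustrative sketch: automation authority shrinks as the novice's skill grows.
import numpy as np

def blended_command(u_novice: np.ndarray,
                    u_expert_like: np.ndarray,
                    skill_level: float) -> np.ndarray:
    """skill_level in [0, 1]: 1 means the novice already acts like the expert."""
    authority = 1.0 - np.clip(skill_level, 0.0, 1.0)   # automation's share of control
    return authority * u_expert_like + (1.0 - authority) * u_novice
```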
  4. We introduce CREStE, a scalable learning-based mapless navigation framework to address the open-world generalization and robustness challenges of outdoor urban navigation. Key to achieving this is learning perceptual representations that generalize to open-set factors (e.g., novel semantic classes, terrains, dynamic entities) and inferring expert-aligned navigation costs from limited demonstrations. CREStE addresses both of these issues, introducing 1) a visual foundation model (VFM) distillation objective for learning open-set structured bird's-eye-view perceptual representations, and 2) counterfactual inverse reinforcement learning (IRL), a novel active learning formulation that uses counterfactual trajectory demonstrations to reason about the most important cues when inferring navigation costs. We evaluate CREStE on the task of kilometer-scale mapless navigation in a variety of city, offroad, and residential environments and find that it outperforms all state-of-the-art approaches with 70% fewer human interventions, including a 2-kilometer mission in an unseen environment with just one intervention, showcasing its robustness and effectiveness for long-horizon mapless navigation.
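As a loose illustration of the counterfactual IRL idea mentioned above, a margin-ranking loss can prefer the expert trajectory over each counterfactual alternative under the learned cost map. The hinge form and names below are assumptions for illustration, not CREStE's actual objective.

```python
# Illustrative sketch: rank the expert trajectory below counterfactuals by a margin.
import torch

def counterfactual_irl_loss(cost_map, expert_traj, counterfactual_trajs, margin=1.0):
    """cost_map(traj) -> scalar cost tensor; lower is better under the learned costs."""
    expert_cost = cost_map(expert_traj)
    losses = [torch.clamp(margin + expert_cost - cost_map(cf), min=0.0)
              for cf in counterfactual_trajs]
    return torch.stack(losses).mean()
```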