

Title: Talk the talk and walk the walk: Dialogue-driven navigation in unknown indoor environments
Prior work in natural-language-driven navigation demonstrates success in systems deployed in synthetic environments or applied to large datasets, both real and synthetic. However, such frameworks have not been deployed and rigorously tested in real environments that are unknown a priori. In this paper, we present a novel framework that uses spoken dialogue with a real person to interpret a set of navigational instructions into a plan and subsequently execute that plan in a novel, unknown indoor environment. This framework is implemented on a real robot, and its performance is evaluated in 39 trials across 3 novel test-building environments. We also demonstrate that our approach outperforms three prior vision-and-language navigation methods in these same environments.
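As a rough illustration of the dialogue-to-plan idea in the abstract above, the Python sketch below maps an instruction onto a sequence of hops over a toy topological map and falls back to a clarifying dialogue turn when a landmark referent is ambiguous. The map, action handling, and function names here are invented for illustration and are not the paper's interface.

```python
# Minimal sketch of dialogue-driven instruction interpretation, assuming a
# toy topological map. TopoMap and interpret() are hypothetical names.
from dataclasses import dataclass, field

@dataclass
class TopoMap:
    # adjacency: node -> {landmark_label: neighbor_node}
    edges: dict = field(default_factory=dict)

    def neighbors(self, node):
        return self.edges.get(node, {})

def interpret(utterance, topo, node):
    """Greedily map an instruction to a plan of (landmark, node) hops,
    asking a clarifying question when a referent is ambiguous."""
    plan = []
    for token in utterance.lower().split(","):
        token = token.strip()
        options = {lm: nxt for lm, nxt in topo.neighbors(node).items()
                   if lm in token}
        if len(options) > 1:
            # Dialogue turn: in the real system this would be spoken.
            choice = input(f"Which one: {sorted(options)}? ").strip()
            options = {choice: options[choice]} if choice in options else {}
        if options:
            (lm, node), = options.items()  # advance to the chosen node
            plan.append((lm, node))
    return plan

if __name__ == "__main__":
    topo = TopoMap({"lobby": {"hallway": "h1"},
                    "h1": {"door": "office", "stairs": "s1"}})
    print(interpret("go to the hallway, then the door", topo, "lobby"))
```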
Award ID(s):
1734938, 1522954
NSF-PAR ID:
10284074
Author(s) / Creator(s):
Date Published:
Journal Name:
2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Contemporary approaches to perception, planning, estimation, and control have allowed robots to operate robustly as our remote surrogates in uncertain, unstructured environments. This progress now creates an opportunity for robots to operate not only in isolation, but also with and alongside humans in our complex environments. Realizing this opportunity requires an efficient and flexible medium through which humans can communicate with collaborative robots. Natural language provides one such medium, and through significant progress in statistical methods for natural-language understanding, robots are now able to interpret a diverse array of free-form navigation, manipulation, and mobile-manipulation commands. However, most contemporary approaches require a detailed, prior spatial-semantic map of the robot’s environment that models the space of possible referents of an utterance. Consequently, these methods fail when robots are deployed in new, previously unknown, or partially-observed environments, particularly when mental models of the environment differ between the human operator and the robot. This paper provides a comprehensive description of a novel learning framework that allows field and service robots to interpret and correctly execute natural-language instructions in a priori unknown, unstructured environments. Integral to our approach is its use of language as a “sensor”—inferring spatial, topological, and semantic information implicit in natural-language utterances and then exploiting this information to learn a distribution over a latent environment model. We incorporate this distribution in a probabilistic language grounding model and infer a distribution over a symbolic representation of the robot’s action space, consistent with the utterance. We use imitation learning to identify a belief-space policy that reasons over the environment and behavior distributions. We evaluate our framework through a variety of different navigation and mobile-manipulation experiments involving an unmanned ground vehicle, a robotic wheelchair, and a mobile manipulator, demonstrating that the algorithm can follow natural-language instructions without prior knowledge of the environment.
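The "language as a sensor" idea above can be pictured as a particle-style belief update, where hypothesized environment models are reweighted by how well they explain an utterance. In the sketch below the likelihood function is a toy stand-in for the learned model described in the paper.

```python
# A minimal particle-style sketch of "language as a sensor": hypothesized
# worlds (sets of landmarks) are resampled in proportion to how well they
# explain the utterance. The likelihood is illustrative, not the paper's.
import random

def likelihood(utterance, world):
    # A hypothesized world containing a mentioned landmark is more
    # consistent with the utterance than one that does not.
    return 0.9 if any(lm in utterance for lm in world) else 0.1

def update_belief(particles, utterance):
    weights = [likelihood(utterance, w) for w in particles]
    # Resample proportionally to the language-induced weights.
    return random.choices(particles, weights=weights, k=len(particles))

if __name__ == "__main__":
    random.seed(0)
    hypotheses = [{"kitchen", "hallway"}, {"lab", "hallway"}, {"kitchen"}]
    print(update_belief(hypotheses, "go past the kitchen to the hallway"))
```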
  2. Navigation, the ability to relocate from one place to another, is a critical skill for any individual or group, and navigating safely across unknown environments is a key factor in determining the success of a mission. While there is an existing body of applications in the field of land navigation, they primarily rely on GPS-enabled technologies, and there is limited research on Augmented Reality (AR) as a tool for navigation in unknown environments. This research proposes to develop an AR system that provides 3-dimensional (3D) navigational insights in unfamiliar environments. This is accomplished by generating 3D terrestrial maps that leverage Synthetic Aperture Radar (SAR) data, Google Earth imagery, and sparse knowledge of the GPS coordinates of the region. The 3D terrestrial images are then converted to navigation meshes so that path-finding algorithms can operate on them. The proposed method can be used to create an iteratively refined 3D landscape knowledge base that assists personnel in navigating novel environments or supports mission planning for other operations; it can also help plan and assess the best strategic vantage points in the landscape.
     Keywords: navigation, three-dimensional, image processing, mesh, augmented reality, mixed reality, SAR, GPS
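One way to picture the terrain-to-mesh step above is the sketch below, which thresholds local slope on an elevation grid to mark walkable cells and then runs plain A* over them. A grid stands in for a true navigation mesh, and the slope threshold is an assumed parameter, not a value from the paper.

```python
# Sketch: elevation grid -> walkable cells -> A* path. Illustrative only.
import heapq

def walkable(height, max_slope=1.0):
    """Mark cells whose rise/drop to any 4-neighbor is within max_slope."""
    rows, cols = len(height), len(height[0])
    ok = [[True] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    if abs(height[nr][nc] - height[r][c]) > max_slope:
                        ok[r][c] = False
    return ok

def astar(ok, start, goal):
    """Plain A* over walkable cells with a Manhattan heuristic."""
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier, seen = [(h(start), 0, start, [start])], set()
    while frontier:
        _, g, cur, path = heapq.heappop(frontier)
        if cur == goal:
            return path
        if cur in seen:
            continue
        seen.add(cur)
        r, c = cur
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < len(ok) and 0 <= nc < len(ok[0]) and ok[nr][nc]:
                heapq.heappush(frontier, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None  # no walkable route

if __name__ == "__main__":
    height = [[0, 0, 5], [0, 1, 1], [0, 0, 0]]
    print(astar(walkable(height), (0, 0), (2, 2)))
```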
  3. This paper takes the first step towards a reactive, hierarchical multi-robot task-allocation and planning framework given a global Linear Temporal Logic specification. The capabilities of both quadrupedal and wheeled robots are leveraged via a heterogeneous team to accomplish a variety of navigation and delivery tasks. However, when deployed in the real world, all robots are susceptible to different types of disturbances, including but not limited to locomotion failures, human interventions, and obstructions from the environment. To address these disturbances, we propose task-level local and global reallocation strategies that efficiently generate updated action-state sequences online while guaranteeing completion of the original task. These task-reallocation approaches avoid reconstructing the entire plan or resynthesizing a new task. To integrate the task planner with low-level inputs, a Behavior Tree execution layer monitors the different types of disturbances and invokes the reallocation methods to produce the corresponding recovery strategies. To evaluate this planning framework, dynamic simulations are conducted in a realistic hospital environment with a heterogeneous robot team consisting of quadrupeds and wheeled robots performing delivery tasks.
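The Behavior Tree execution layer described above can be sketched minimally as a fallback node: when a step fails because a disturbance is flagged, a reallocation child repairs the assignment locally rather than resynthesizing the whole plan. The node names and the disturbance flag below are illustrative, not the paper's implementation.

```python
# Minimal behavior-tree sketch of disturbance monitoring + reallocation.
SUCCESS, FAILURE = "SUCCESS", "FAILURE"

class Fallback:
    """Tick children in order; succeed on the first child that succeeds."""
    def __init__(self, *children):
        self.children = children
    def tick(self, state):
        for child in self.children:
            if child.tick(state) == SUCCESS:
                return SUCCESS
        return FAILURE

class ExecuteStep:
    def tick(self, state):
        # Fails if a locomotion failure or obstruction has been flagged.
        return FAILURE if state.get("disturbance") else SUCCESS

class Reallocate:
    def tick(self, state):
        # Local repair: hand the step to another capable robot
        # instead of rebuilding the entire plan. (Hypothetical names.)
        state["assignee"] = "wheeled_robot_2"
        state["disturbance"] = False
        return SUCCESS

if __name__ == "__main__":
    tree = Fallback(ExecuteStep(), Reallocate())
    state = {"assignee": "quadruped_1", "disturbance": True}
    print(tree.tick(state), state)
```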
  4. Recent advances in data-driven models for grounded language understanding have enabled robots to interpret increasingly complex instructions. Two fundamental limitations of these methods are that most require a full model of the environment to be known a priori, and they attempt to reason over a world representation that is flat and unnecessarily detailed, which limits scalability. More recent semantic mapping methods address partial observability by exploiting language as a sensor to infer a distribution over topological, metric, and semantic properties of the environment. However, maintaining a distribution over highly detailed maps that can support grounding of diverse instructions is computationally expensive and hinders real-time human-robot collaboration. We propose a novel framework that learns to adapt perception according to the task in order to maintain compact distributions over semantic maps. Experiments with a mobile manipulator demonstrate more efficient instruction following in a priori unknown environments.
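A keyword-matching toy version of the task-adaptive perception idea above is sketched below: detector output is pruned to object classes the instruction could refer to, which keeps the semantic map distribution compact. The relevance test here is a simple stand-in for the learned model in the paper.

```python
# Sketch: prune detections to task-relevant classes. Illustrative only.
ALL_CLASSES = {"door", "table", "chair", "mug", "person", "window"}

def relevant_classes(instruction, classes=ALL_CLASSES):
    words = set(instruction.lower().split())
    # Keep classes the instruction mentions; a learned model would also
    # keep classes that are merely *implied* by the task.
    return {c for c in classes if c in words}

def filter_detections(detections, instruction):
    keep = relevant_classes(instruction)
    return [d for d in detections if d["class"] in keep]

if __name__ == "__main__":
    dets = [{"class": "mug", "pose": (1.0, 2.0)},
            {"class": "person", "pose": (0.5, 3.0)},
            {"class": "table", "pose": (2.0, 2.5)}]
    print(filter_detections(dets, "bring the mug to the table"))
```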
  5. Algorithms for motion planning in unknown environments are generally limited in their ability to reason about the structure of the unobserved environment. As such, current methods generally navigate unknown environments by relying on heuristics to choose intermediate objectives along frontiers. We present a unified method that combines map prediction and motion planning for safe, time-efficient autonomous navigation of unknown environments by dynamically constrained robots. We propose a data-driven method for predicting the map of the unobserved environment, using the robot's observations of its surroundings as context. These map predictions are then used to plan trajectories from the robot's position to the goal without requiring frontier selection. We demonstrate that our map-predictive motion planning strategy yields a substantial improvement in trajectory time over a naive frontier-pursuit method and matches the performance of methods that use more sophisticated frontier-selection heuristics while requiring significantly less computation time.
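The map-prediction step above can be caricatured as inpainting unknown cells with a naive rule before planning, as in the sketch below; the actual method uses a learned, data-driven predictor, and trajectories would then be planned on the completed grid with a standard planner such as the A* shown earlier.

```python
# Toy map predictor: fill UNKNOWN cells before planning. The rule here
# ("wall if next to a known wall, else free") is a naive stand-in for
# the paper's learned predictor.
UNKNOWN, FREE, WALL = -1, 0, 1

def predict(grid):
    """Return a copy of grid with every UNKNOWN cell filled in."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == UNKNOWN:
                near_wall = any(
                    0 <= r + dr < rows and 0 <= c + dc < cols
                    and grid[r + dr][c + dc] == WALL
                    for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)))
                out[r][c] = WALL if near_wall else FREE
    return out

if __name__ == "__main__":
    observed = [[FREE, FREE, UNKNOWN],
                [FREE, WALL, UNKNOWN],
                [FREE, UNKNOWN, UNKNOWN]]
    for row in predict(observed):
        print(row)
```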