This content will become publicly available on May 23, 2026

Title: BoxMap: Efficient Structural Mapping and Navigation
While humans can successfully navigate using abstractions, ignoring details that are irrelevant to the task at hand, most existing approaches in robotics require detailed environment representations which consume a significant amount of sensing, computing, and storage; these issues become particularly important in resource-constrained settings with limited power budgets. Deep learning methods can learn from prior experience to abstract knowledge from novel environments, and use it to more efficiently execute tasks such as frontier exploration, object search, or scene understanding. We propose BoxMap, a Detection-Transformer-based architecture that takes advantage of the structure of the sensed partial environment to update a topological graph of the environment as a set of semantic entities (rooms and doors) and their relations (connectivity). The predictions from low-level measurements can be leveraged to achieve high-level goals with lower computational costs than methods based on detailed representations. As an example application, we consider a robot equipped with a 2-D laser scanner tasked with exploring a residential building. Our BoxMap representation scales quadratically with the number of rooms (with a small constant), resulting in significant savings over a full geometric map. Moreover, our high-level topological representation results in 30.9% shorter trajectories in the exploration task with respect to a standard method. Code is available at: bit.ly/3F6w2Yl.
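The room-and-door topological representation described in the abstract can be sketched as a small graph structure: rooms are nodes, doors are edges. The class and method names below are illustrative assumptions, not the released BoxMap code (see the linked repository for that).

```python
# Minimal sketch of a room-door topological graph (illustrative only,
# not the authors' BoxMap implementation).
from dataclasses import dataclass, field

@dataclass
class TopoMap:
    rooms: dict = field(default_factory=dict)  # room_id -> semantic label
    doors: set = field(default_factory=set)    # frozenset({room_a, room_b})

    def add_room(self, room_id, label="room"):
        self.rooms[room_id] = label

    def add_door(self, room_a, room_b):
        # Connectivity is symmetric, so store an unordered pair.
        self.doors.add(frozenset((room_a, room_b)))

    def neighbors(self, room_id):
        return {r for d in self.doors if room_id in d for r in d if r != room_id}

# Storage is O(rooms + doors); doors are at most quadratic in the number
# of rooms, far smaller than a dense occupancy grid of the same building.
m = TopoMap()
for r in ("kitchen", "hall", "bedroom"):
    m.add_room(r)
m.add_door("kitchen", "hall")
m.add_door("hall", "bedroom")
```

Such a graph supports high-level planning (e.g., shortest room-to-room routes) without any metric detail beyond connectivity.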
Award ID(s):
2409733
PAR ID:
10628223
Author(s) / Creator(s):
Publisher / Repository:
IEEE
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep learning methods are widely used in robotic applications. By learning from prior experience, the robot can abstract knowledge of the environment and use it to accomplish different goals, such as object search, frontier exploration, or scene understanding, with fewer resources than would otherwise be needed. Most existing methods, however, require a significant amount of sensing, which carries substantial power costs for acquisition and processing, and rely on models tuned for each specific goal, so that each must be trained, stored, and run separately. These issues are particularly important in resource-constrained settings, such as small-scale robots or long-duration missions. We propose a single, multi-task deep learning architecture that takes advantage of the structure of the partial environment to predict different abstractions of the environment (thus reducing the need for rich sensing), and leverages these predictions to simultaneously achieve different high-level goals (thus sharing computation between goals). As an example application of the proposed architecture, we consider a robot equipped with a 2-D laser scanner and an object detector, tasked with searching for an object (such as an exit) in a residential building while constructing a topological map that can be used for future missions. The prior knowledge of the environment is encoded using a U-Net deep network architecture. In this context, our work leads to an object search algorithm that is complete and that outperforms a more traditional frontier-based approach. The topological map we produce uses scene trees to qualitatively represent the environment as a graph at a fraction of the cost of existing SLAM-based solutions.
Our results demonstrate that it is possible to extract multi-task semantic information that is useful for navigation and mapping directly from bare-bone, non-semantic measurements. 
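The frontier-based baseline mentioned above can be sketched on an occupancy grid; this is a generic illustration of frontier detection, not the authors' learned U-Net pipeline. Cell values and the grid itself are assumptions for the example.

```python
# Generic frontier detection on an occupancy grid (illustrative baseline).
# Cells: 0 = free, 1 = occupied, -1 = unknown.
# A frontier cell is a free cell adjacent to unknown space; a frontier-based
# explorer repeatedly drives to the nearest such cell until none remain.

def find_frontiers(grid):
    rows, cols = len(grid), len(grid[0])
    frontiers = []
    for i in range(rows):
        for j in range(cols):
            if grid[i][j] != 0:          # only free cells can be frontiers
                continue
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols and grid[ni][nj] == -1:
                    frontiers.append((i, j))
                    break

    return frontiers

grid = [
    [0, 0, -1],
    [0, 1, -1],
    [0, 0,  0],
]
```

On this toy grid the free cells at (0, 1) and (2, 2) border unknown space and are returned as frontiers.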
  2. We propose a method for autonomously learning an object-centric representation of a continuous and high-dimensional environment that is suitable for planning. Such representations can immediately be transferred between tasks that share the same types of objects, resulting in agents that require fewer samples to learn a model of a new task. We first demonstrate our approach on a 2D crafting domain consisting of numerous objects where the agent learns a compact, lifted representation that generalises across objects. We then apply it to a series of Minecraft tasks to learn object-centric representations and object types---directly from pixel data---that can be leveraged to solve new tasks quickly. The resulting learned representations enable the use of a task-level planner, resulting in an agent capable of transferring learned representations to form complex, long-term plans. 
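The core idea of a "lifted" object-centric representation can be sketched very simply: the agent keys its model on object types rather than object identities, so knowledge transfers to new tasks that contain the same types. The function and task names below are assumptions for illustration, not the paper's code.

```python
# Illustrative sketch of a lifted object-centric state abstraction.
from collections import Counter

def lift(state):
    # state: list of (object_id, object_type) pairs -> multiset of types.
    # Dropping identities is what makes the representation "lifted".
    return Counter(obj_type for _, obj_type in state)

# Two tasks with different concrete objects but the same object types.
task_a = [("tree_7", "tree"), ("tree_9", "tree"), ("axe_1", "axe")]
task_b = [("tree_2", "tree"), ("tree_5", "tree"), ("axe_3", "axe")]

# Distinct groundings, identical lifted state: a model or plan keyed on the
# lifted representation applies unchanged to both tasks.
assert lift(task_a) == lift(task_b)
```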
  3. Abstract: In this paper, we address the problem of autonomous multi-robot mapping, exploration and navigation in unknown, GPS-denied indoor or urban environments using a team of robots equipped with directional sensors with limited sensing capabilities and limited computational resources. The robots have no a priori knowledge of the environment and need to rapidly explore and construct a map in a distributed manner using existing landmarks, the presence of which can be detected using onboard sensors, although little to no metric information (distance or bearing to the landmarks) is available. In order to correctly and effectively achieve this, the presence of a necessary density/distribution of landmarks is ensured by design of the urban/indoor environment. We thus address this problem in two phases: (1) During the design/construction of the urban/indoor environment we can ensure that sufficient landmarks are placed within the environment. To that end we develop a filtration-based approach for designing strategic placement of landmarks in an environment. (2) We develop a distributed algorithm which a team of robots, with no a priori knowledge of the environment, can use to explore such an environment, construct a topological map requiring no metric/distance information, and use that map to navigate within the environment. This is achieved using a topological representation of the environment (called a Landmark Complex), instead of constructing a complete metric/pixel map. The representation is built by the robots as well as used by them for navigation through a balanced strategy involving exploration and exploitation. We use tools from homology theory for identifying "holes" in the coverage/exploration of the unknown environment and hence guide the robots towards achieving a complete exploration and mapping of the environment.
Our simulation results demonstrate the effectiveness of the proposed metric-free topological (simplicial complex) representation in achieving exploration, localization and navigation within the environment. 
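The Landmark Complex idea can be sketched as follows: each observation is the set of landmarks co-visible from one robot pose, and that set (with all its subsets) is recorded as a simplex. The function and variable names are assumptions for illustration, not the paper's implementation.

```python
# Illustrative sketch of building a landmark-complex-style representation
# from co-visibility observations alone (no distances or bearings).
from itertools import combinations

def landmark_complex(observations):
    # observations: iterable of sets of landmark ids seen from one pose.
    simplices = set()
    for seen in observations:
        # Every subset of a co-visible set is a face of that simplex.
        for k in range(1, len(seen) + 1):
            for s in combinations(sorted(seen), k):
                simplices.add(s)
    return simplices

# Two poses: one sees landmarks A and B, the other sees B and C.
obs = [{"A", "B"}, {"B", "C"}]
cx = landmark_complex(obs)
# The edge ("A", "C") is absent: the complex records only observed
# co-visibility, and "holes" in it indicate unexplored regions.
```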
  4. Abstract: Purpose: Specialized robotic and surgical tools are increasing the complexity of operating rooms (ORs), requiring elaborate preparation, especially when techniques or devices are to be used for the first time. Spatial planning can improve efficiency and identify procedural obstacles ahead of time, but real ORs offer little availability for optimizing space utilization. Methods for creating reconstructions of physical setups, i.e., digital twins, are needed to enable immersive spatial planning of such complex environments in virtual reality. Methods: We present a neural rendering-based method to create immersive digital twins of complex medical environments and devices from casual video capture, enabling spatial planning of surgical scenarios. To evaluate our approach, we recreate two operating rooms and ten objects through neural reconstruction, then conduct a user study with 21 graduate students carrying out planning tasks in the resulting virtual environment. We analyze task load, presence, perceived utility, and exploration and interaction behavior compared to low-visual-complexity versions of the same environments. Results: Results show significantly increased perceived utility and presence in the neural reconstruction-based environments, combined with higher perceived workload and more exploratory behavior. There is no significant difference in interactivity. Conclusion: We explore the feasibility of using modern reconstruction techniques to create digital twins of complex medical environments and objects. Without requiring expert knowledge or specialized hardware, users can create, explore, and interact with objects in virtual environments. Results indicate benefits such as high perceived utility while remaining technically approachable, suggesting promise for spatial planning and beyond.
  5. Representation learning in high-dimensional spaces faces significant robustness challenges with noisy inputs, particularly with heavy-tailed noise. Arguing that topological data analysis (TDA) offers a solution, we leverage TDA to enhance representation stability in neural networks. Our theoretical analysis establishes conditions under which incorporating topological summaries improves robustness to input noise, especially for heavy-tailed distributions. Extending these results to representation-balancing methods used in causal inference, we propose the Topology-Aware Treatment Effect Estimation (TATEE) framework, through which we demonstrate how topological awareness can lead to learning more robust representations. A key advantage of this approach is that it requires no ground-truth or validation data, making it suitable for observational settings common in causal inference. The method remains computationally efficient, with overhead that scales linearly with data size and is constant in the input dimension. Through extensive experiments with α-stable noise distributions, we validate our theoretical results, demonstrating that TATEE consistently outperforms existing methods across noise regimes. This work extends stability properties of topological summaries to representation learning via a tractable framework scalable to high-dimensional inputs, providing insights into how it can enhance robustness, with applications extending to domains facing challenges with noisy data, such as causal inference.
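A standard example of the topological summaries this abstract relies on is 0-dimensional sublevel-set persistence, which records when connected components of a signal are born (at local minima) and die (at merging saddles). The sketch below is a generic union-find implementation of that summary, not the TATEE framework itself.

```python
# Minimal 0-dimensional sublevel-set persistence of a 1-D signal
# (an illustrative topological summary; not the paper's method).

def persistence_0d(values):
    order = sorted(range(len(values)), key=lambda i: values[i])
    parent = {}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    birth = {}  # component root -> birth value (its local minimum)
    for i in order:  # sweep the sublevel sets from low to high
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if j in parent:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the younger component dies at the merge height.
                young, old = (ri, rj) if birth[ri] >= birth[rj] else (rj, ri)
                if birth[young] < values[i]:  # skip zero-persistence pairs
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    # The oldest component never dies (infinite persistence, omitted here).
    return sorted(pairs)

pairs = persistence_0d([0, 3, 1])  # -> [(1, 3)]
```

Here the component born at the minimum 1 dies at the separating maximum 3; small input perturbations move these birth/death values only slightly, which is the stability property the abstract builds on.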