skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Real-time Semantic 3D Reconstruction for High-Touch Surface Recognition for Robotic Disinfection
Disinfection robots have applications in promoting public health and reducing hospital acquired infections and have drawn considerable interest due to the COVID-19 pandemic. To disinfect a room quickly, motion planning can be used to plan robot disinfection trajectories on a reconstructed 3D map of the room’s surfaces. However, existing approaches discard semantic information of the room and, thus, take a long time to perform thorough disinfection. Human cleaners, on the other hand, disinfect rooms more efficiently by prioritizing the cleaning of high-touch surfaces. To address this gap, we present a novel GPU-based volumetric semantic TSDF (Truncated Signed Distance Function) integration system for semantic 3D reconstruction. Our system produces 3D reconstructions that distinguish high-touch surfaces from non-high-touch surfaces at approximately 50 frames per second on a consumer-grade GPU, which is approximately 5 times faster than existing CPU-based TSDF semantic reconstruction methods. In addition, we extend a UV disinfection motion planning algorithm to incorporate semantic awareness for optimizing coverage of disinfection trajectories. Experiments show that our semantic-aware planning outperforms geometry-only planning by disinfecting up to 20% more high-touch surfaces under the same time budget. Further, the real-time nature of our semantic reconstruction pipeline enables future work on simultaneous disinfection and mapping. Code is available at: https://github.com/uiuc-iml/ RA-SLAM  more » « less
Award ID(s):
2025782
PAR ID:
10486599
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE/RSJ
Date Published:
Journal Name:
Proceedings of the IEEERSJ International Conference on Intelligent Robots and Systems
ISSN:
2153-0858
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Underwater perception and 3D surface reconstruction are challenging problems with broad applications in construction, security, marine archaeology, and environmental monitoring. Treacherous operating conditions, fragile surroundings, and limited navigation control often dictate that submersibles restrict their range of motion and, thus, the baseline over which they can capture measurements. In the context of 3D scene reconstruction, it is well-known that smaller baselines make reconstruction more challenging. Our work develops a physics-based multimodal acoustic-optical neural surface reconstruction framework (AONeuS) capable of effectively integrating high-resolution RGB measurements with low-resolution depth-resolved imaging sonar measurements. By fusing these complementary modalities, our framework can reconstruct accurate high-resolution 3D surfaces from measurements captured over heavily-restricted baselines. Through extensive simulations and in-lab experiments, we demonstrate that AONeuS dramatically outperforms recent RGB-only and sonar-only inverse-differentiable-rendering--based surface reconstruction methods. 
    more » « less
  2. Existing methods for pedestrian motion trajectory prediction are learning and predicting the trajectories in the 2D image space. In this work, we observe that it is much more efficient to learn and predict pedestrian trajectories in the 3D space since the human motion occurs in the 3D physical world and and their behavior patterns are better represented in the 3D space. To this end, we use a stereo camera system to detect and track the human pose with deep neural networks. During pose estimation, these twin deep neural networks satisfy the stereo consistence constraint. We adapt the existing SocialGAN method to perform pedestrian motion trajectory prediction from the 2D to the 3D space. Our extensive experimental results demonstrate that our proposed method significantly improves the pedestrian trajectory prediction performance, outperforming existing state-of-the-art methods. 
    more » « less
  3. We propose a framework for active next best view and touch selection for robotic manipulators using 3D Gaussian Splatting (3DGS). 3DGS is emerging as a useful explicit 3D scene representation for robotics, as it has the ability to represent scenes in a both photorealistic and geometrically accurate manner. However, in real-world, online robotic scenes where the number of views is limited given efficiency requirements, random view selection for 3DGS becomes impractical as views are often overlapping and redundant. We address this issue by proposing an end-to-end online training and active view selection pipeline, which enhances the performance of 3DGS in few-view robotics settings. We first elevate the performance of few-shot 3DGS with a novel semantic depth alignment method using Segment Anything Model 2 (SAM2) that we supplement with Pearson depth and surface normal loss to improve color and depth reconstruction of real-world scenes. We then extend FisherRF, a next-best-view selection method for 3DGS, to select views and touch poses based on depth uncertainty. We perform online view selection on a real robot system during live 3DGS training. We motivate our improvements to few-shot GS scenes, and extend depth-based FisherRF to them, where we demonstrate both qualitative and quantitative improvements on challenging robot scenes. For more information, please see our project page at arm.stanford.edu/next-best-sense. 
    more » « less
  4. Point cloud computation has become an increasingly more important workload thanks to its applications in autonomous driving. Unlike dense 2D computation, point cloud convolution has sparse and irregular computation patterns and thus requires dedicated inference system support with specialized high-performance kernels. While existing point cloud deep learning libraries have developed different dataflows for convolution on point clouds, they assume a single dataflow throughout the execution of the entire model. In this work, we systematically analyze and improve existing dataflows. Our resulting system, TorchSparse++, achieves 2.9x, 3.3x, 2.2x and 1.8x measured end-to-end speedup on an NVIDIA A100 GPU over the state-of-the-art MinkowskiEngine, SpConv 1.2, TorchSparse and SpConv v2 in inference respectively. Furthermore, TorchSparse++ is the only system to date that supports all necessary primitives for 3D segmentation, detection, and reconstruction workloads in autonomous driving. Code is publicly released at https://github.com/mit-han-lab/torchsparse. 
    more » « less
  5. Safety-guaranteed motion planning is critical for self-driving cars to generate collision-free trajectories. A layered motion planning approach with decoupled path and speed planning is widely used for this purpose. This approach is prone to be suboptimal in the presence of dynamic obstacles. Spatial-temporal approaches deal with path planning and speed planning simultaneously; however, the existing methods only support simple-shaped corridors like cuboids, which restrict the search space for optimization in complex scenarios. We propose to use trapezoidal prism-shaped corridors for optimization, which significantly enlarges the solution space compared to the existing cuboidal corridors-based method. Finally, a piecewise Bezier curve optimization is conducted in our proposed ´ corridors. This formulation theoretically guarantees the safety of the continuous-time trajectory. We validate the efficiency and effectiveness of the proposed approach in numerical and CommonRoad simulations 
    more » « less