UCTNet: Uncertainty-Aware Cross-Modal Transformer Network for Indoor RGB-D Semantic Segmentation

Ying, Xiaowen; Chuah, Mooi Choo

Citation Details

In this paper, we tackle the problem of RGB-D Semantic Segmentation. The key challenges in solving this problem lie in 1) how to extract features from depth sensor data and 2) how to effectively fuse the features extracted from the two modalities. For the first challenge, we found that the depth information obtained from the sensor is not always reliable (e.g. objects with reflective or dark surfaces typically have inaccurate or void sensor readings), and existing methods that extract depth features using ConvNets did not explicitly consider the reliability of depth value at different pixel locations. To tackle this challenge, we propose a novel mechanism, namely Uncertainty-Aware Self-Attention that explicitly controls the information flow from unreliable depth pixels to confident depth pixels during feature extraction. For the second challenge, we propose an effective and scalable fusion module based on Cross-Attention that can adaptively fuse and exchange information between the RGB encoder and depth encoder. Our proposed framework, namely UCTNet, is an encoder-decoder network that naturally incorporates these two key designs for robust and accurate RGB-D Segmentation. Experimental results show that UCTNet outperforms existing works and achieves state-of-the-art performances on two RGB-D Semantic Segmentation benchmarks. more »

Award ID(s):: 1931867

NSF-PAR ID:: 10467720

Author(s) / Creator(s):: Ying, Xiaowen; Chuah, Mooi Choo

Editor(s):: Avidan, S.

Publisher / Repository:: Springer

Date Published:: 2022-11-03

Volume:: 13690

Page Range / eLocation ID:: 20-37

Subject(s) / Keyword(s):: ["RGBD semantic segmentation","uncertainty aware","cross modal","transformer"]

Format(s):: Medium: X

Location:: Israel

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this