NSF PAR Search | NSF Public Access Repository


UCTNet: Uncertainty-Aware Cross-Modal Transformer Network for Indoor RGB-D Semantic Segmentation

https://doi.org/10.1007/978-3-031-20056-4_2

Ying, Xiaowen; Chuah, Mooi Choo (November 2022, Springer)
Avidan, S. (Ed.)
In this paper, we tackle the problem of RGB-D semantic segmentation. The key challenges in solving this problem lie in 1) how to extract features from depth sensor data and 2) how to effectively fuse the features extracted from the two modalities. For the first challenge, we found that the depth information obtained from the sensor is not always reliable (e.g., objects with reflective or dark surfaces typically have inaccurate or void sensor readings), and existing methods that extract depth features using ConvNets do not explicitly consider the reliability of depth values at different pixel locations. To tackle this challenge, we propose a novel mechanism, Uncertainty-Aware Self-Attention, that explicitly controls the information flow from unreliable depth pixels to confident depth pixels during feature extraction. For the second challenge, we propose an effective and scalable fusion module based on Cross-Attention that can adaptively fuse and exchange information between the RGB encoder and the depth encoder. Our proposed framework, UCTNet, is an encoder-decoder network that naturally incorporates these two key designs for robust and accurate RGB-D segmentation. Experimental results show that UCTNet outperforms existing works and achieves state-of-the-art performance on two RGB-D semantic segmentation benchmarks.
Full Text Available
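
The idea of limiting how much unreliable depth pixels influence other pixels can be illustrated with a masked self-attention step. This is a minimal sketch, not the UCTNet implementation: it assumes a hypothetical boolean reliability flag per depth token, uses identity query/key/value projections instead of learned ones, and simply blocks unreliable tokens from acting as keys. The function name and arguments are illustrative only.

```python
import math


def masked_self_attention(tokens, reliable):
    """Self-attention in which unreliable tokens are never attended to.

    tokens   -- list of feature vectors (lists of floats), one per depth pixel
    reliable -- list of bools; False marks a void or inaccurate reading
                (hypothetical flag, assumed to be given)
    """
    dim = len(tokens[0])
    scale = 1.0 / math.sqrt(dim)
    outputs = []
    for query in tokens:
        # Scaled dot-product scores; unreliable keys get -inf so that
        # their softmax weight becomes exactly zero.
        scores = [
            sum(a * b for a, b in zip(query, key)) * scale if ok else float("-inf")
            for key, ok in zip(tokens, reliable)
        ]
        top = max(scores)
        if top == float("-inf"):
            # No reliable key exists: pass the query through unchanged.
            outputs.append(list(query))
            continue
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * key[d] for w, key in zip(weights, tokens)) / total
            for d in range(dim)
        ])
    return outputs
```

With this masking, every output is a weighted average of reliable tokens only, so a token with a void reading is filled in from its confident neighbours rather than spreading its own value.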
Robustness of Trajectory Prediction Models Under Map-Based Attacks

https://doi.org/10.1109/WACV56688.2023.00452

Zheng, Zhihao; Ying, Xiaowen; Yao, Zhen; Chuah, Mooi Choo (January 2023, IEEE)
Delving into Light-Dark Semantic Segmentation for Indoor Scenes Understanding

https://doi.org/10.1145/3552482.3556556

Ying, Xiaowen; Lang, Bo; Zheng, Zhihao; Chuah, Mooi Choo (October 2022, ACM)
SRNet: Spatial Relation Network for Efficient Single Stage Instance Segmentation in Videos

Ying, Xiaowen; Li, Xin; Chuah, Mooi Choo (October 2021, ACM Multimedia 2021)
The task of instance segmentation in videos aims to consistently identify objects at pixel level throughout the entire video sequence. Existing state-of-the-art methods either follow the tracking-by-detection paradigm and employ multi-stage pipelines or directly train a complex deep model to process entire video clips as 3D volumes. However, these methods are typically slow and resource-consuming, so they are often limited to offline processing. In this paper, we propose SRNet, a simple and efficient framework for joint segmentation and tracking of object instances in videos. The key to achieving both high efficiency and accuracy in our framework is to formulate the instance segmentation and tracking problem as a unified spatial-relation learning task where each pixel in the current frame relates to its object center, and each object center relates to its location in the previous frame. This unified formulation allows our framework to perform joint instance segmentation and tracking in a single stage while maintaining low overhead among the different learning tasks. Our proposed framework can handle two different task settings and demonstrates comparable performance with state-of-the-art methods on two different benchmarks while running significantly faster.
Full Text Available
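
The two spatial relations described in the SRNet abstract (pixel to object center, and object center to its previous-frame location) can be sketched as two nearest-neighbour lookups. This is an illustrative sketch only: it assumes the offsets have already been predicted by some network, and the function names, data layout, and nearest-center rule are assumptions rather than the authors' method.

```python
def _nearest(point, candidates):
    """Index of the candidate closest to point (squared Euclidean distance)."""
    return min(
        range(len(candidates)),
        key=lambda i: (candidates[i][0] - point[0]) ** 2
        + (candidates[i][1] - point[1]) ** 2,
    )


def assign_pixels_to_centers(pixel_offsets, centers):
    """Segmentation step: each pixel votes for a center via its offset.

    pixel_offsets -- {(x, y): (dx, dy)}, assumed predicted offset to the center
    centers       -- list of (cx, cy) object centers in the current frame
    Returns {(x, y): index into centers}.
    """
    return {
        (x, y): _nearest((x + dx, y + dy), centers)
        for (x, y), (dx, dy) in pixel_offsets.items()
    }


def link_to_previous(centers, center_offsets, prev_centers):
    """Tracking step: each center points back to its previous-frame location.

    center_offsets -- list of (dx, dy), one per current center (assumed predicted)
    prev_centers   -- list of (cx, cy) object centers in the previous frame
    Returns a list mapping each current center to an index into prev_centers.
    """
    return [
        _nearest((cx + dx, cy + dy), prev_centers)
        for (cx, cy), (dx, dy) in zip(centers, center_offsets)
    ]
```

Because both steps reduce to the same offset-then-match operation, segmentation and tracking can share one prediction pass, which is the intuition behind handling them in a single stage.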
Weakly-Supervised Object Representation Learning for Few-Shot Semantic Segmentation

Ying, Xiaowen; Li, Xin; Chuah, Mooi Choo (January 2021, IEEE Winter Conference on Applications of Computer Vision)
Training a semantic segmentation model requires large densely-annotated image datasets that are costly to obtain. Once the training is done, it is also difficult to add new object categories to such segmentation models. In this paper, we tackle the few-shot semantic segmentation problem, which aims to perform image segmentation task on unseen object categories merely based on one or a few support example(s). The key to solving this few-shot segmentation problem lies in effectively utilizing object information from support examples to separate target objects from the background in a query image. While existing methods typically generate object-level representations by averaging local features in support images, we demonstrate that such object representations are typically noisy and less distinguishing. To solve this problem, we design an object representation generator (ORG) module which can effectively aggregate local object features from support image(s) and produce better object-level representation. The ORG module can be embedded into the network and trained end-to-end in a weakly-supervised fashion without extra human annotation. We incorporate this design into a modified encoder-decoder network to present a powerful and efficient framework for few-shot semantic segmentation. Experimental results on the Pascal-VOC and MS-COCO datasets show that our approach achieves better performance compared to existing methods under both one-shot and five-shot settings.
Full Text Available
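
The baseline that the abstract contrasts against, an object-level representation obtained by averaging local support features, can be sketched as follows. This illustrates only that averaging baseline, not the ORG module; the function names, the cosine-similarity matching, and the threshold value are assumptions chosen for illustration.

```python
import math


def masked_average_prototype(features, mask):
    """Average the local support features that fall inside the object mask.

    features -- list of feature vectors, one per support-image location
    mask     -- list of 0/1 flags, 1 where the location belongs to the object
    """
    selected = [f for f, m in zip(features, mask) if m]
    if not selected:
        raise ValueError("mask selects no object locations")
    dim = len(selected[0])
    return [sum(f[d] for f in selected) / len(selected) for d in range(dim)]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def segment_query(query_features, prototype, threshold=0.5):
    """Label a query location as object (1) when it resembles the prototype.

    threshold is a hypothetical cut-off, not a value from the paper.
    """
    return [
        1 if cosine_similarity(f, prototype) > threshold else 0
        for f in query_features
    ]
```

Plain averaging gives every masked location the same weight, so background clutter or atypical object parts inside the mask pull the prototype away from the object, which is one way such a representation can end up noisy.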
A Strawberry Detection System Using Convolutional Neural Networks

https://doi.org/10.1109/BigData.2018.8622466

Lamb, Nikolas; Chuah, Mooi Choo (December 2018, 5th National Symposium for NSF REU Research in Data Science, Systems, and Security)

In recent years, robotic technologies, e.g., drones or autonomous cars, have been applied to the agricultural sector to improve the efficiency of typical agricultural operations. Some agricultural tasks that are ideal for robotic automation are yield estimation and robotic harvesting. For these applications, an accurate and reliable image-based detection system is critically important. In this work, we present a low-cost strawberry detection system based on convolutional neural networks. Ablation studies are presented to validate the choice of hyperparameters, framework, and network structure. Additional modifications to both the training data and network structure that improve precision and execution speed, e.g., input compression, image tiling, color masking, and network compression, are discussed. Finally, we present a final network implementation on a Raspberry Pi 3B that demonstrates a detection speed of 1.63 frames per second and an average precision of 0.842.
Full Text Available
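
Image tiling, one of the modifications named in the abstract above, splits a large frame into smaller crops so that small objects occupy more of each network input. The sketch below only computes tile coordinates; the tile size, the overlap parameter, and the function name are hypothetical and are not taken from the paper.

```python
def tile_boxes(width, height, tile, overlap=0):
    """Return (left, top, right, bottom) boxes that cover a width x height image.

    tile    -- side length of each square tile in pixels (hypothetical value)
    overlap -- pixels shared by neighbouring tiles, so that an object cut by
               one tile border appears whole in the adjacent tile
    """
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("require tile > 0 and 0 <= overlap < tile")
    step = tile - overlap
    boxes = []
    for top in range(0, max(height - overlap, 1), step):
        for left in range(0, max(width - overlap, 1), step):
            # Clip tiles on the right and bottom edges to the image bounds.
            boxes.append(
                (left, top, min(left + tile, width), min(top + tile, height))
            )
    return boxes
```

Each box can then be cropped and passed to the detector separately, with detections shifted back by the tile's (left, top) offset to recover full-image coordinates.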
WiFi-Enabled Smart Human Dynamics Monitoring

https://doi.org/10.1145/3131672.3131692

Guo, Xiaonan; Liu, Bo; Shi, Cong; Liu, Hongbo; Chen, Yingying; Chuah, Mooi Choo (January 2017, ACM Conference on Embedded Network Sensor Systems)

The rapid pace of urbanization and socioeconomic development encourages people to spend more time together, and monitoring human dynamics is therefore of great importance, especially for facilities that provide elder care and involve multiple activities. Traditional approaches (e.g., camera-based surveillance or sensor-attachment-based solutions) are limited by their high deployment costs and privacy concerns. In this work, we propose to provide a fine-grained, comprehensive view of human dynamics using the existing WiFi infrastructure often available in indoor venues. Our approach is low-cost and device-free, and does not require any active human participation. Our system aims to provide smart human dynamics monitoring through participant number estimation, human density estimation, and walking speed and direction derivation. A semi-supervised learning approach leveraging a non-linear regression model is developed to significantly reduce training effort and accommodate different monitoring environments. We further derive participant number and density estimates based on the statistical distribution of Channel State Information (CSI) measurements. In addition, people's walking speed and direction are estimated using a frequency-based mechanism. Extensive experiments over 12 months demonstrate that our system can perform fine-grained, effective human dynamics monitoring with over 90% accuracy in estimating participant number, density, and walking speed and direction in various indoor environments.
Full Text Available
