NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

https://doi.org/10.1145/3664647.3681009

Wu, Wenhan; Zheng, Ce; Yang, Zihao; Chen, Chen; Das, Srijan; Lu, Aidong (October 2024, ACM)

Full Text Available
Frequency Guidance Matters: Skeletal Action Recognition by Frequency-Aware Mixed Transformer

Wu, Wenhan; Zheng, Ce; Yang, Zihao; Chen, Chen; Das, Srijan; Lu, Aidong (July 2024, ACM Multimedia)

Recently, transformers have demonstrated great potential for modeling long-term dependencies in skeleton sequences and have thereby gained increasing attention in skeleton action recognition. However, existing transformer-based approaches rely heavily on the naive attention mechanism for capturing spatiotemporal features, which falls short in learning discriminative representations for actions that exhibit similar motion patterns. To address this challenge, we introduce the Frequency-aware Mixed Transformer (FreqMixFormer), specifically designed for recognizing similar skeletal actions with subtle discriminative motions. First, we introduce a frequency-aware attention module that disentangles skeleton frequency representations by embedding joint features into frequency attention maps, aiming to distinguish discriminative movements based on their frequency coefficients. Subsequently, we develop a mixed transformer architecture that combines spatial features with frequency features to model comprehensive frequency-spatial patterns. Additionally, a temporal transformer is proposed to extract global correlations across frames. Extensive experiments show that FreqMixFormer outperforms state-of-the-art methods on three popular skeleton action recognition datasets: NTU RGB+D, NTU RGB+D 120, and NW-UCLA. Our project is publicly available at: https://github.com/wenhanwu95/FreqMixFormer.
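The abstract does not say how the frequency coefficients are computed. As a rough illustration only, the sketch below assumes a per-joint orthonormal DCT-II along the time axis (a common choice for skeleton trajectories). It shows how a subtle, rapid motion appears as energy in the high-frequency coefficients while a slow drift stays in the low ones. The function names and the half-length cutoff are hypothetical. This is not FreqMixFormer's implementation, which is in the linked repository.

```python
import math


def dct_ii(signal):
    """Orthonormal DCT-II of a 1-D sequence (direct O(T^2) form)."""
    n = len(signal)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (t + 0.5) * k / n)
                    for t, x in enumerate(signal))
        coeffs.append(scale * total)
    return coeffs


def idct_ii(coeffs):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(coeffs)
    signal = []
    for t in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos(math.pi * (t + 0.5) * k / n)
        signal.append(total)
    return signal


def high_freq_ratio(signal, cutoff=None):
    """Share of non-DC energy held by coefficients k >= cutoff.

    The default cutoff (half the sequence length) is a hypothetical choice.
    """
    coeffs = dct_ii(signal)
    n = len(coeffs)
    cutoff = n // 2 if cutoff is None else cutoff
    ac_energy = sum(c * c for c in coeffs[1:])  # skip k = 0 (mean position)
    if ac_energy < 1e-12:                       # static joint: no motion energy
        return 0.0
    return sum(c * c for c in coeffs[max(cutoff, 1):]) / ac_energy


# One joint's x-coordinate over 16 frames: a slow drift, and the same
# drift with a small frame-to-frame tremor added.
smooth = [0.1 * t for t in range(16)]
jittery = [0.1 * t + 0.05 * (-1) ** t for t in range(16)]
print(high_freq_ratio(smooth) < high_freq_ratio(jittery))  # True
```

The two trajectories differ by only 0.05 per frame in position, yet the tremor moves a clearly larger share of the motion energy into the upper half of the spectrum. That is the kind of separation a frequency-aware representation can exploit.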
Full Text Available
Multi-View Attentive Contextualization for Multi-View 3D Object Detection

Liu, Xianpeng; Zheng, Ce; Qian, Ming; Xue, Nan; Chen, Chen; Zhang, Zhebin; Li, Chen; Wu, Tianfu (August 2024, IEEE CVPR)

This paper presents Multi-View Attentive Contextualization (MvACon), a simple yet effective method for improving 2D-to-3D feature lifting in query-based multi-view 3D (MV3D) object detection. Despite remarkable progress in query-based MV3D object detection, prior art often suffers either from a failure to exploit high-resolution 2D features in dense attention-based lifting, due to high computational costs, or from insufficiently dense grounding of 3D queries to multi-scale 2D features in sparse attention-based lifting. MvACon kills two birds with one stone using a representationally dense yet computationally sparse attentive feature contextualization scheme that is agnostic to specific 2D-to-3D feature lifting approaches. In experiments, MvACon is thoroughly tested on the nuScenes benchmark using BEVFormer, its recent 3D deformable attention (DFA3D) variant, and PETR, showing consistent improvements in detection performance, especially in location, orientation, and velocity prediction. It is also tested on the Waymo-mini benchmark using BEVFormer, with similar improvements. We qualitatively and quantitatively show that global cluster-based contexts effectively encode dense scene-level contexts for MV3D object detection. The promising results of MvACon reinforce the adage in computer vision: “(contextualized) feature matters”.
Full Text Available
Monocular 3D Object Detection with Bounding Box Denoising in 3D by Perceiver

Liu, Xianpeng; Zheng, Ce; Cheng, Kelvin; Xue, Nan; Qi, Goo-Jun; Wu, Tianfu (October 2023, The Computer Vision Foundation (CVF))

The main challenge of monocular 3D object detection is the accurate localization of the 3D center. Motivated by a new and strong observation that this challenge can be remedied by a 3D-space local-grid search scheme in an ideal case, we propose a stage-wise approach that combines the information flow from 2D-to-3D (3D bounding box proposal generation from a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling from the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, this paper presents a method that uses the Perceiver I/O model [20] to fuse the 3D-to-2D geometric information and the 2D appearance information. With the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector. Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently improves detection with limited computational overhead.
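The local-grid sampling step can be pictured with a short sketch. It assumes a proposal is a 7-parameter box (center, size, yaw), that only the center is perturbed on a grid along the x and z axes, and that the grid has 3 x 3 points with 0.5 m spacing. The box layout, axes, grid size, spacing, and function names are all hypothetical rather than MonoXiver's actual settings.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Box3D:
    """Hypothetical 3D box: center (x, y, z), size, and heading yaw."""
    x: float
    y: float
    z: float
    width: float
    height: float
    length: float
    yaw: float


def local_grid_anchors(box, half_extent=1, step=0.5, axes=("x", "z")):
    """Shift the box center over a (2*half_extent+1)^len(axes) grid.

    Size and yaw are copied unchanged; only the listed center axes move.
    """
    offsets = [i * step for i in range(-half_extent, half_extent + 1)]
    anchors = []
    for deltas in product(offsets, repeat=len(axes)):
        shifted = {ax: getattr(box, ax) + d for ax, d in zip(axes, deltas)}
        anchors.append(replace(box, **shifted))
    return anchors


def anchor_space(proposals, **grid_kwargs):
    """Concatenate the local grids of all initial proposals."""
    return [a for p in proposals for a in local_grid_anchors(p, **grid_kwargs)]


proposal = Box3D(x=1.0, y=1.5, z=10.0, width=1.6, height=1.5, length=3.9, yaw=0.0)
anchors = local_grid_anchors(proposal)
print(len(anchors))         # 9
print(proposal in anchors)  # True (the zero offset keeps the original box)
```

Keeping the original proposal inside its own grid means the later verification stage can always fall back to the backbone detector's box if no shifted anchor scores higher.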
Full Text Available
A Lightweight Graph Transformer Network for Human Mesh Reconstruction from 2D Human Pose

https://doi.org/10.1145/3503161.3547844

Zheng, Ce; Mendieta, Matias; Wang, Pu; Lu, Aidong; Chen, Chen (October 2022, ACM)

Full Text Available
3D Human Pose Estimation with Spatial and Temporal Transformers

https://doi.org/10.1109/ICCV48922.2021.01145

Zheng, Ce; Zhu, Sijie; Mendieta, Matias; Yang, Taojiannan; Chen, Chen; Ding, Zhengming (October 2021, 2021 IEEE/CVF International Conference on Computer Vision (ICCV))

Full Text Available
Exploiting Multi-view Part-wise Correlation via an Efficient Transformer for Vehicle Re-Identification

https://doi.org/10.1109/TMM.2021.3134839

Li, Ming; Liu, Jun; Zheng, Ce; Huang, Xinming; Zhang, Ziming (January 2021, IEEE Transactions on Multimedia)

Full Text Available
In Situ Micropillar Compression Tests of 304 Stainless Steels After Ion Irradiation and Helium Implantation

https://doi.org/10.1007/s11837-020-04127-2

Schoell, Ryan; Frazer, David; Zheng, Ce; Hosemann, Peter; Kaoumi, Djamel (July 2020, JOM)

Full Text Available
