Temporal Feature Enhancement Dilated Convolution Network for Weakly-supervised Temporal Action Localization

Zhou, Jianxiong; Wu, Ying

doi:10.1109/WACV56688.2023.00597

Citation Details

Weakly-supervised Temporal Action Localization (WTAL) aims to classify and localize action instances in untrimmed videos using only video-level labels. Existing methods typically use snippet-level RGB and optical flow features taken directly from pre-trained extractors. This approach has two limitations: the snippets span only a short temporal range, and the initial features are not well suited to the task. As a result, these WTAL methods make ineffective use of temporal information and achieve limited performance. In this paper, we propose the Temporal Feature Enhancement Dilated Convolution Network (TFE-DCN) to address these two limitations. TFE-DCN has an enlarged receptive field that covers a long temporal span, allowing it to observe the full dynamics of action instances and to capture temporal dependencies between snippets. Furthermore, we propose the Modality Enhancement Module, which enhances RGB features with the help of enhanced optical flow features, making the overall features appropriate for the WTAL task. Experiments on the THUMOS’14 and ActivityNet v1.3 datasets show that the proposed approach far outperforms state-of-the-art WTAL methods.
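The abstract's point about an enlarged receptive field can be illustrated with simple arithmetic. The sketch below is not taken from the paper: the function name `receptive_field`, the kernel size of 3, and the dilation rates 1, 2, 4 are assumptions chosen only for illustration. It uses the standard relation for stacked stride-1 convolutions, where each layer adds (kernel_size - 1) × dilation positions to the receptive field.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field, in snippets, of stacked stride-1 dilated 1D convolutions.

    kernel_size and dilations are illustrative assumptions, not values
    reported by the paper.
    """
    rf = 1  # a single output position sees itself
    for d in dilations:
        # Each layer widens the view by (kernel_size - 1) * dilation snippets.
        rf += (kernel_size - 1) * d
    return rf


# Three ordinary convolutions versus three layers with growing dilation.
plain = receptive_field(3, [1, 1, 1])
dilated = receptive_field(3, [1, 2, 4])
print(plain, dilated)  # prints "7 15"
```

With the same depth and the same number of weights per layer, the dilated stack in this toy setting covers about twice as many snippets, which is the general mechanism by which dilation lengthens the temporal span a network observes.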

Award ID(s): 1815561; 2007613

PAR ID: 10464214

Author(s) / Creator(s): Zhou, Jianxiong; Wu, Ying

Date Published: 2023-01-01

Journal Name: IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)

Page Range / eLocation ID: 6017 to 6026

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1109/WACV56688.2023.00597