A dynamic predictive transformer with temporal relevance regression for action detection

Korban, Matthew; Youngs, Peter; Acton, Scott T

doi:10.1016/j.patcog.2025.111644

This content will become publicly available on October 1, 2026

This paper introduces a novel transformer network tailored to skeleton-based action detection in long, untrimmed video streams. Our approach centers on three mechanisms that collectively enhance the network’s temporal analysis capabilities. First, a new predictive attention mechanism incorporates future-frame data into the sequence analysis during the training phase. This mechanism addresses a key limitation of current action detection models: incomplete temporal modeling of long action sequences, particularly for boundary frames that lie outside the network’s immediate temporal receptive field, while maintaining computational efficiency. Second, we integrate a new adaptive weighted temporal attention system that dynamically evaluates the importance of each frame within an action sequence. In contrast to existing approaches, the proposed weighting strategy is both adaptive and interpretable, making it highly effective in handling long sequences with numerous non-informative frames. Third, the network incorporates a regression technique that independently identifies the start and end frames of an action based on their relevance to different frames. Unlike existing homogeneous regression methods, the proposed regression method is heterogeneous and draws on various temporal relationships, including those involving future frames within actions, making it more effective for action detection. Extensive experiments on prominent untrimmed skeleton-based action datasets, namely PKU-MMD, OAD, and Charades, demonstrate the effectiveness of this network.
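
The abstract gives no implementation details, so the following is only an illustrative sketch of the general idea behind adaptive per-frame weighting, not the authors' method. It assumes each frame already has a scalar relevance score (hand-picked here; in a real network it would be learned) and normalizes those scores with a softmax so that informative frames dominate a pooled sequence feature. The function names `frame_weights` and `weighted_pool`, the scores, and the toy features are all hypothetical.

```python
import math

def frame_weights(scores):
    """Softmax over per-frame relevance scores (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_pool(frames, weights):
    """Weighted sum of per-frame feature vectors."""
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]

# Hypothetical data: 4 frames with 2-D features and hand-picked scores.
# Frame 2 is treated as the informative one; the rest are near-irrelevant.
frames = [[0.0, 1.0], [1.0, 0.0], [2.0, 2.0], [0.5, 0.5]]
scores = [0.1, 0.1, 3.0, 0.1]

weights = frame_weights(scores)        # one weight per frame, summing to 1
pooled = weighted_pool(frames, weights)  # sequence-level feature
```

Because the weights form a probability distribution over frames, they can be inspected directly to see which frames contribute most, which is one simple sense in which such a weighting is interpretable.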

Award ID(s): 2322993; 2000487

PAR ID: 10627496

Author(s) / Creator(s): Korban, Matthew; Youngs, Peter; Acton, Scott T

Publisher / Repository: Elsevier

Date Published: 2025-10-01

Journal Name: Pattern Recognition

Volume: 166

Issue: C

ISSN: 0031-3203

Page Range / eLocation ID: 111644

Subject(s) / Keyword(s): Action Detection, Transformer Network, Attention, Skeleton Pose, Regression

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1016/j.patcog.2025.111644
