A multi-modal transformer network for action detection.

Korban, M.; Youngs, P.; Acton, S.

doi:10.1016/j.patcog.2023.109713

This paper proposes a multi-modal transformer network for detecting actions in untrimmed videos. To enrich the action features, our transformer network utilizes a novel multi-modal attention mechanism that captures the correlations between different combinations of spatial and motion modalities. Such correlations have not previously been explored effectively for action detection. We also propose an algorithm to correct the motion distortion caused by camera movement, which severely reduces the expressive power of motion features represented by optical flow vectors. In addition, we introduce a new instructional activity dataset of classroom videos from K-12 schools, and we conduct comprehensive experiments to evaluate the performance of different approaches on it. Our proposed algorithm outperforms state-of-the-art methods on two public benchmarks, THUMOS14 and ActivityNet, and on our instructional activity dataset.

Award ID(s): 2000487

PAR ID: 10448473

Author(s) / Creator(s): Korban, M.; Youngs, P.; Acton, S.

Editor(s): Hancock, E.

Date Published: 2023-10-01

Journal Name: Pattern Recognition

Volume: 142

ISSN: 0031-3203

Page Range / eLocation ID: 1-28

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1016/j.patcog.2023.109713