Actor-Centered Representations for Action Localization in Streaming Videos.

Aakur, S.; Sarkar, S.

doi:10.1007/978-3-031-19839-7_5


Event perception tasks such as recognizing and localizing actions in streaming videos are essential for scaling to real-world application contexts. We tackle the problem of learning actor-centered representations through the notion of continual hierarchical predictive learning to localize actions in streaming videos without the need for training labels and outlines for the objects in the video. We propose a framework driven by the notion of hierarchical predictive learning to construct actor-centered features by attention-based contextualization. The key idea is that predictable features or objects do not attract attention and hence do not contribute to the action of interest. Experiments on three benchmark datasets show that the approach can learn robust representations for localizing actions using only one epoch of training, i.e., a single pass through the streaming video. We show that the proposed approach outperforms unsupervised and weakly supervised baselines while offering competitive performance to fully supervised approaches. Additionally, we extend the model to multi-actor settings to recognize group activities while localizing the multiple, plausible actors. We also show that it generalizes to out-of-domain data with limited performance degradation.

Award ID(s): 1956050

PAR ID: 10433567

Author(s) / Creator(s): Aakur, S.; Sarkar, S.

Editor(s): Avidan, S.; Brostow, G.; Cissé, M.; Farinella, G.M.; Hassner, T.

Date Published: 2022-10-23

Journal Name: European Conference on Computer Vision

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1007/978-3-031-19839-7_5
