Egocentric and exocentric perspectives of human action differ significantly, yet overcoming this extreme viewpoint gap is critical in augmented reality and robotics. We propose VIEWPOINTROSETTA, an approach that unlocks large-scale unpaired ego and exo video data to learn clip-level viewpoint-invariant video representations. Our framework introduces (1) a diffusion-based Rosetta Stone Translator (RST), which, leveraging a moderate amount of synchronized multi-view videos, serves as a translator in feature space to decipher the alignment between unpaired ego and exo data, and (2) a dual encoder that aligns unpaired data representations through contrastive learning with RST-based synthetic feature augmentation and soft alignment. To evaluate the learned features in a standardized setting, we construct a new cross-view benchmark using Ego-Exo4D, covering cross-view retrieval, action recognition, and skill assessment tasks. Our framework demonstrates superior cross-view understanding compared to previous view-invariant learning and ego video representation learning approaches, and opens the door to bringing vast amounts of traditional third-person video to bear on the more nascent first-person setting. 
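No code accompanies the abstracts on this page; the snippet below is a minimal PyTorch sketch of the style of dual-encoder contrastive alignment described above, with a soft-target hook standing in for the paper's soft alignment. It is an illustration under stated assumptions, not the authors' implementation: all class, function, and parameter names (DualEncoder, soft_contrastive_loss, feat_dim, etc.) are hypothetical, and the RST-based synthetic feature augmentation is not reproduced.

```python
# Minimal sketch of dual-encoder contrastive alignment between ego and exo
# clip features, with optional soft (non one-hot) targets. Hypothetical names;
# not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualEncoder(nn.Module):
    def __init__(self, feat_dim=768, embed_dim=256):
        super().__init__()
        # One projection head per viewpoint; a video backbone is assumed to
        # have already produced clip-level features of size feat_dim.
        self.ego_head = nn.Sequential(nn.Linear(feat_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))
        self.exo_head = nn.Sequential(nn.Linear(feat_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim))

    def forward(self, ego_feats, exo_feats):
        z_ego = F.normalize(self.ego_head(ego_feats), dim=-1)
        z_exo = F.normalize(self.exo_head(exo_feats), dim=-1)
        return z_ego, z_exo

def soft_contrastive_loss(z_ego, z_exo, soft_targets, temperature=0.07):
    """Cross-entropy between ego->exo similarity logits and soft alignment targets.

    soft_targets: (B, B) rows summing to 1; the identity matrix recovers
    standard InfoNCE with in-batch negatives.
    """
    logits = z_ego @ z_exo.t() / temperature          # (B, B) cosine similarities
    log_probs = F.log_softmax(logits, dim=-1)
    return -(soft_targets * log_probs).sum(dim=-1).mean()

# Toy usage with random tensors standing in for precomputed ego/exo clip features.
B, D = 8, 768
model = DualEncoder(feat_dim=D)
ego, exo = torch.randn(B, D), torch.randn(B, D)
z_ego, z_exo = model(ego, exo)
targets = torch.eye(B)                                # hard one-to-one alignment
loss = soft_contrastive_loss(z_ego, z_exo, targets)
loss.backward()
```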
                            VIVAR: learning view-invariant embedding for video action recognition
                        
                    
    
Deep learning has achieved state-of-the-art video action recognition (VAR) performance by learning action-related features from raw video. However, these models often jointly encode auxiliary view information (viewpoint and sensor properties) with the primary action features, leading to performance degradation under novel views and to security concerns, since sensor types and locations can be revealed. Here, we systematically study these shortcomings of VAR models and develop a novel approach, VIVAR, that learns view-invariant spatiotemporal action features by removing view information. In particular, we leverage contrastive learning to separate actions and jointly optimize an adversarial loss that aligns view distributions, removing auxiliary view information from the deep embedding space; unlabeled synchronous multi-view (MV) video is used to learn the view-invariant VAR system. We evaluate VIVAR on our in-house, large-scale, time-synchronized MV video dataset containing 10 actions with three angular viewpoints and sensors in diverse environments. VIVAR successfully captures view-invariant action features, improves the quality of inter- and intra-action clusters, and consistently outperforms SoTA models by 8% in accuracy. We additionally perform extensive studies with our datasets, model architectures, multiple contrastive learning formulations, and view distribution alignments to provide insights into VIVAR. We open-source our code and dataset to facilitate further research on view-invariant systems.
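The abstract above pairs a contrastive objective over actions with an adversarial objective that aligns view distributions. As a rough sketch of that style of training (not the released VIVAR code), the snippet below combines a supervised contrastive loss with a view discriminator trained through a gradient-reversal layer; all names, dimensions, and the choice of gradient reversal as the adversarial mechanism are assumptions.

```python
# Sketch: action-contrastive embedding plus adversarial view-distribution
# alignment via a gradient-reversal layer (GRL). Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Reverse (and scale) the gradient so the encoder learns to fool the
        # view discriminator, aligning view distributions in embedding space.
        return -ctx.lam * grad_out, None

class ViewInvariantModel(nn.Module):
    def __init__(self, feat_dim=512, embed_dim=128, num_views=3):
        super().__init__()
        self.proj = nn.Linear(feat_dim, embed_dim)          # action embedding head
        self.view_disc = nn.Linear(embed_dim, num_views)    # adversarial view classifier

    def forward(self, feats, lam=1.0):
        z = F.normalize(self.proj(feats), dim=-1)
        view_logits = self.view_disc(GradReverse.apply(z, lam))
        return z, view_logits

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss: clips of the same action attract each other."""
    n = len(z)
    mask_self = torch.eye(n, dtype=torch.bool)
    sim = (z @ z.t() / temperature).masked_fill(mask_self, float('-inf'))
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels[:, None] == labels[None, :]) & ~mask_self
    denom = pos.sum(1).clamp(min=1)
    return -(log_prob.masked_fill(~pos, 0.0).sum(1) / denom).mean()

# Toy step: the same action recorded from different views shares an action label.
feats = torch.randn(12, 512)
action_labels = torch.randint(0, 4, (12,))
view_labels = torch.randint(0, 3, (12,))
model = ViewInvariantModel()
z, view_logits = model(feats, lam=0.5)
loss = supcon_loss(z, action_labels) + F.cross_entropy(view_logits, view_labels)
loss.backward()
```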
- Award ID(s): 1750936
- PAR ID: 10587414
- Editor(s): Liang, Xuefeng
- Publisher / Repository: SPIE
- Date Published:
- ISBN: 9781510689237
- Page Range / eLocation ID: 11
- Format(s): Medium: X
- Location: Kuala Lumpur, Malaysia
- Sponsoring Org: National Science Foundation
More Like this
- Vedaldi, Andrea; Bischof, Horst; Brox, Thomas; Frahm, Jan-Michael (Ed.) Novel view video synthesis aims to synthesize videos from novel viewpoints given input captures of a human performance taken from multiple reference viewpoints over consecutive time steps. Despite great advances in model-free novel view synthesis, existing methods present three limitations when applied to complex and time-varying human performances. First, these methods (and related datasets) mainly consider simple and symmetric objects. Second, they do not enforce explicit consistency across generated views. Third, they focus on static, non-moving objects. The fine-grained details of a human subject can therefore suffer from inconsistencies when synthesized across different viewpoints or time steps. To tackle these challenges, we introduce a human-specific framework that employs a learned 3D-aware representation. Specifically, we first introduce a novel siamese network that employs a gating layer for better reconstruction of the latent volumetric representation and, consequently, of the final visual results; moreover, features from consecutive time steps are shared inside the network to improve temporal consistency. Second, we introduce a novel loss to explicitly enforce consistency across generated views both in space and in time. Third, we present the Multi-View Human Action (MVHA) dataset, consisting of nearly 1,200 synthetic human performances captured from 54 viewpoints. Experiments on the MVHA, Pose-Varying Human Model, and ShapeNet datasets show that our method outperforms the state-of-the-art baselines in both view generation quality and spatio-temporal consistency.
- Human novel view synthesis aims to synthesize target views of a human subject given input images taken from one or more reference viewpoints. Despite significant advances in model-free novel view synthesis, existing methods present two major limitations when applied to complex shapes like humans. First, these methods mainly focus on simple and symmetric objects, e.g., cars and chairs, limiting their performance on fine-grained and asymmetric shapes. Second, existing methods cannot guarantee visual consistency across different adjacent views of the same object. To solve these problems, we present in this paper a learning framework for the novel view synthesis of human subjects, which explicitly enforces consistency across different generated views of the subject. Specifically, we introduce a novel multi-view supervision and an explicit rotational loss during the learning process, enabling the model to preserve detailed body parts and to achieve consistency between adjacent synthesized views. To show the superior performance of our approach, we present qualitative and quantitative results on the Multi-View Human Action (MVHA) dataset we collected (consisting of 3D human models animated with different MoCap sequences and captured from 54 different viewpoints), the Pose-Varying Human Model (PVHM) dataset, and ShapeNet. The qualitative and quantitative results demonstrate that our approach outperforms the state-of-the-art baselines both in per-view synthesis quality and in preserving rotational consistency and complex shapes (e.g., fine-grained details, challenging poses) across multiple adjacent views in a variety of scenarios, for both humans and rigid objects.
- 3D CT point clouds reconstructed from the original CT images are naturally represented in real-world coordinates. Compared with CT images, 3D CT point clouds contain geometric features that are invariant across multiple viewpoints, with irregular spatial distributions. This paper rethinks pulmonary nodule detection using CT point cloud representations. We first extract multi-view features with a sparse convolutional (SparseConv) encoder by rotating the point clouds to different angles in the world coordinate system. Then, to simultaneously learn discriminative and robust spatial features from various viewpoints, a nodule proposal optimization scheme is proposed that obtains coarse nodule regions by aggregating consistent nodule proposal predictions from the multi-view features. Last, multi-level features and semantic segmentation features extracted from a SparseConv decoder are concatenated with the multi-view features for final nodule region regression. Experiments on the benchmark dataset (LUNA16) demonstrate the feasibility of applying CT point clouds to the lung nodule detection task. Furthermore, we observe that combining multi-view predictions greatly improves the performance of the proposed framework compared to single-view prediction, while the interior texture features of nodules available in images are better suited to detecting small nodules.
- Motivation: Accurately predicting the likelihood of interaction between two objects (compound–protein sequence, user–item, author–paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multi-views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects. Results: We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound–protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug–protein interaction prediction), metabolic engineering, and synthetic biology (compound–enzyme interaction prediction) applications. Comparing CSI with a baseline model that does not utilize data stratification and contrastive learning, we show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug–target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets. Availability and implementation: Code and dataset available at https://github.com/HassounLab/CSI.
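For the last item above (CSI), the core idea is to treat congruent views of the same interaction as positives in a contrastive multiview coding objective. The snippet below is a loose, illustrative sketch of that objective over two precomputed feature views; it does not reproduce CSI's key-based stratification, and all encoder names and dimensions are hypothetical.

```python
# Loose sketch of contrastive multiview coding over two congruent views of the
# same interaction (e.g., a compound-derived view and a sequence-derived view).
# Hypothetical encoders and dimensions; this is not the CSI codebase.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoViewEncoder(nn.Module):
    def __init__(self, dim_a=1024, dim_b=1280, embed_dim=256):
        super().__init__()
        self.enc_a = nn.Linear(dim_a, embed_dim)   # e.g., compound feature view
        self.enc_b = nn.Linear(dim_b, embed_dim)   # e.g., protein sequence feature view

    def forward(self, a, b):
        return F.normalize(self.enc_a(a), dim=-1), F.normalize(self.enc_b(b), dim=-1)

def cmc_infonce(z_a, z_b, temperature=0.07):
    """Symmetric InfoNCE: congruent views of the same data point are positives,
    all other in-batch pairs are negatives."""
    logits = z_a @ z_b.t() / temperature
    targets = torch.arange(len(z_a))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Toy usage: 16 interactions, each with two precomputed feature views.
model = TwoViewEncoder()
view_a, view_b = torch.randn(16, 1024), torch.randn(16, 1280)
z_a, z_b = model(view_a, view_b)
loss = cmc_infonce(z_a, z_b)
loss.backward()
```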