Self-Supervised Audio-Visual Representation Learning for in-the-wild Videos

Feng, Z; Tu, M; Xia, R; Wang, Y; Krishnamurthy, A

Citation Details

Humans understand videos from both the visual and audio aspects of the data. In this work, we present a self supervised cross modal representation approach for learning audio visual correspondence (AVC) for videos in the wild. After the learning stage, we explore retrieval in both cross modal and intra modal manner with the learned representations. We verify our experimental results on the VGGSound dataset and our approach achieves promising results. more »

Award ID(s):: 1633295

PAR ID:: 10212652

Author(s) / Creator(s):: Feng, Z; Tu, M; Xia, R; Wang, Y; Krishnamurthy, A

Date Published:: 2020-12-01

Journal Name:: IEEE International Conference on Big Data

ISSN:: 2639-1589

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
The DOI is not currently available.

More Like this