Increasing Importance of Joint Analysis of Audio and Video in Computer Vision: A Survey

Shahabaz, Ahmed; Sarkar, Sudeep

doi:10.1109/ACCESS.2024.3391817

The joint analysis of audio and video is a powerful tool that can be applied to various contexts, including action, speech, and sound recognition, audio-visual video parsing, emotion recognition in affective computing, and self-supervised training of deep learning models. Solving these problems often involves tackling core audio-visual tasks, such as audio-visual source localization, audio-visual correspondence, and audio-visual source separation, which can be combined in various ways to achieve the desired results. This paper provides a review of the literature in this area, discussing the advancements, history, and datasets of audio-visual learning methods for various application domains. It also presents an overview of the reported performances on standard datasets and suggests promising directions for future research.

Award ID(s): 1956050

PAR ID: 10545662

Author(s) / Creator(s): Shahabaz, Ahmed; Sarkar, Sudeep

Publisher / Repository: IEEE

Date Published: 2024-01-01

Journal Name: IEEE Access

Volume: 12

ISSN: 2169-3536

Page Range / eLocation ID: 59399 to 59430

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1109/ACCESS.2024.3391817
