Efficient Fusion of Computationally Diverse Modalities Using Chunking and Cross-Attention

Flores, Christian; Goncalves, Lucas; Busso, Carlos

doi:10.1109/ICASSP49660.2025.10890415

This content will become publicly available on April 6, 2026

Emotion recognition is inherently a multimodal problem. Humans use both audible and visual cues to determine a person's emotions. Methods for fusing audio and visual representations from two unimodal deep-learning models have improved considerably. However, existing methods rarely accommodate modalities that differ in the computational resources needed to provide the same amount of temporal information. As the sequence length increases, current methods often make simplifications such as discarding frames or cropping the sequence. This paper introduces a chunking methodology designed for cross-attention-based multimodal transformer architectures. The approach involves segmenting the visual input, the more computationally demanding modality, into chunks. Cross-attention is then performed between the encoded audio and visual features rather than over the original sequence lengths of the unimodal backbones. Our method achieves significant improvements over conventional cross-attention techniques in the audio-visual domain for a six-class emotion recognition problem, demonstrating better F1 score, precision, and recall on the CREMA-D database while reducing computational overhead.
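The chunking idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the chunk encoder, chunk size, or attention configuration, so the code below assumes mean pooling as a stand-in chunk encoder, uses plain scaled dot-product attention without learned projections, and runs on toy data. All function names (`chunk_sequence`, `mean_pool`, `cross_attention`) are hypothetical.

```python
import math

def chunk_sequence(frames, chunk_size):
    """Split a list of frame vectors into consecutive chunks (the last may be shorter)."""
    return [frames[i:i + chunk_size] for i in range(0, len(frames), chunk_size)]

def mean_pool(chunk):
    """Stand-in chunk encoder (an assumption): average the frame vectors in a chunk."""
    dim = len(chunk[0])
    return [sum(frame[d] for frame in chunk) / len(chunk) for d in range(dim)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all keys/values."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(dim)])
    return outputs

# Toy data (hypothetical): 12 visual frames and 3 audio steps, feature size 2.
visual_frames = [[float(t), 1.0] for t in range(12)]
audio_features = [[0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]

# Encode the visual stream chunk by chunk, then let audio queries attend
# over the chunk-level features instead of every individual frame.
visual_chunks = [mean_pool(c) for c in chunk_sequence(visual_frames, 4)]
fused = cross_attention(audio_features, visual_chunks, visual_chunks)

# Attention scores computed: audio steps x visual positions.
scores_without_chunking = len(audio_features) * len(visual_frames)
scores_with_chunking = len(audio_features) * len(visual_chunks)
```

In this toy setting, chunking reduces the number of attention scores from 36 to 9 without dropping or cropping any frames. The accuracy gains reported in the abstract depend on the paper's actual encoders and training setup, which this sketch does not attempt to reproduce.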

Award ID(s): 2016719

PAR ID: 10655466

Author(s) / Creator(s): Flores, Christian; Goncalves, Lucas; Busso, Carlos

Publisher / Repository: IEEE

Date Published: 2025-04-06

Page Range / eLocation ID: 1 to 5

Format(s): Medium: X

Location: Hyderabad, India

Sponsoring Org: National Science Foundation

Conference Paper:
https://doi.org/10.1109/ICASSP49660.2025.10890415
