Beyond Additive Fusion: Learning Non-Additive Multimodal Interactions

Wörtwein, Torsten; Sheeber, Lisa; Allen, Nicholas; Cohn, Jeffrey; Morency, Louis-Philippe

Citation Details

Multimodal fusion addresses the problem of analyzing spoken words in the multimodal context, including visual expressions and prosodic cues. Even when multimodal models lead to performance improvements, it is often unclear whether bimodal and trimodal interactions are learned or whether modalities are processed independently of each other. We propose Multimodal Residual Optimization (MRO) to separate unimodal, bimodal, and trimodal interactions in a multimodal model. This improves interpretability as the multimodal interaction can be quantified. Inspired by Occam’s razor, the main intuition of MRO is that (simpler) unimodal contributions should be learned before learning (more complex) bimodal and trimodal interactions. For example, bimodal predictions should learn to correct the mistakes (residuals) of unimodal predictions, thereby letting the bimodal predictions focus on the remaining bimodal interactions. Empirically, we observe that MRO successfully separates unimodal, bimodal, and trimodal interactions while not degrading predictive performance. We complement our empirical results with a human perception study and observe that MRO learns multimodal interactions that align with human judgments.
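The residual intuition in the abstract can be illustrated with a toy two-stage fit. This is only a sketch under stated assumptions, not the authors' MRO implementation: it uses two synthetic scalar "modalities", plain least squares instead of neural networks, and sequential fitting rather than a joint training objective. The function name `fit_residual_stages`, the synthetic data, and the variance-based `share` measure are all hypothetical choices made for illustration.

```python
import numpy as np


def fit_residual_stages(x_a, x_b, y):
    """Fit an additive (unimodal) stage first, then a bimodal stage on its residuals.

    Hypothetical helper for illustration; not the method from the paper.
    """
    ones = np.ones_like(x_a)

    # Stage 1: additive model in which each modality contributes on its own.
    unimodal_features = np.column_stack([ones, x_a, x_b])
    w_uni, *_ = np.linalg.lstsq(unimodal_features, y, rcond=None)
    unimodal_pred = unimodal_features @ w_uni

    # Stage 2: the interaction term only sees what stage 1 could not explain.
    residual = y - unimodal_pred
    bimodal_features = np.column_stack([x_a * x_b])
    w_bi, *_ = np.linalg.lstsq(bimodal_features, residual, rcond=None)
    bimodal_pred = bimodal_features @ w_bi

    return unimodal_pred, bimodal_pred


# Synthetic data (assumption): mostly additive signal plus a small interaction.
rng = np.random.default_rng(0)
x_a = rng.normal(size=500)
x_b = rng.normal(size=500)
y = 2.0 * x_a - 1.0 * x_b + 0.5 * x_a * x_b

unimodal_pred, bimodal_pred = fit_residual_stages(x_a, x_b, y)
total_pred = unimodal_pred + bimodal_pred

# One possible (hypothetical) way to quantify the bimodal contribution.
share = np.var(bimodal_pred) / np.var(total_pred)
print(f"bimodal share of predicted variance: {share:.3f}")
```

Because the interaction term is fit only to the residuals, it cannot absorb signal that the simpler additive stage already explains, which is the property that makes the separate contributions interpretable in this toy setting.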

Award ID(s): 1750439

PAR ID: 10404844

Author(s) / Creator(s): Wörtwein, Torsten; Sheeber, Lisa; Allen, Nicholas; Cohn, Jeffrey; Morency, Louis-Philippe

Date Published: 2022-01-01

Journal Name: Findings of the Association for Computational Linguistics: EMNLP 2022

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Conference Paper: The DOI is not currently available.