MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering

Gokhale, Tejas; Banerjee, Pratyay; Baral, Chitta; Yang, Yezhou

doi:10.18653/v1/2020.emnlp-main.63

While progress has been made on visual question answering leaderboards, models often exploit spurious correlations and priors in datasets under the i.i.d. setting. As a result, evaluation on out-of-distribution (OOD) test samples has emerged as a proxy for generalization. In this paper, we present MUTANT, a training paradigm that exposes the model to perceptually similar yet semantically distinct mutations of the input to improve OOD generalization, for example on the VQA-CP challenge. Under this paradigm, models use a consistency-constrained training objective to learn how semantic changes in the input (question-image pair) affect the output (answer). Unlike existing methods on VQA-CP, MUTANT does not rely on knowledge about the nature of the train and test answer distributions. MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a 10.57% improvement. Our work opens up avenues for the use of semantic input mutations for OOD generalization in question answering.
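To make the idea of a consistency-constrained objective concrete, here is a minimal sketch under stated assumptions. It assumes answers are represented as embedding vectors and that the consistency term penalizes a mismatch between how much the model's predicted answer changes under a mutation and how much the ground-truth answer changes. The function names (`pairwise_consistency_loss`, `total_loss`) and the `weight` parameter are hypothetical and illustrative; this is one plausible form of such a term, not necessarily the loss defined in the paper.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def pairwise_consistency_loss(
    pred_orig: Sequence[float],
    pred_mut: Sequence[float],
    gt_orig: Sequence[float],
    gt_mut: Sequence[float],
) -> float:
    """Hypothetical consistency term (an assumption, not the paper's exact loss).

    The similarity between the model's answers for the original and mutated
    inputs should match the similarity between their ground-truth answers.
    """
    return abs(cosine(pred_orig, pred_mut) - cosine(gt_orig, gt_mut))


def total_loss(task_loss: float, consistency: float, weight: float = 1.0) -> float:
    """Combine the usual answer-prediction loss with the weighted consistency term."""
    return task_loss + weight * consistency


# A mutation that flips the ground-truth answer (orthogonal answer vectors):
yes, no = [1.0, 0.0], [0.0, 1.0]
# A model that ignores the mutation and predicts "yes" twice is penalized...
print(pairwise_consistency_loss(yes, yes, yes, no))
# ...while a model whose answer flips along with the ground truth is not.
print(pairwise_consistency_loss(yes, no, yes, no))
```

The first call yields 1.0 and the second 0.0. This illustrates why such a term can discourage answering from priors alone: a model that returns the same answer regardless of a semantic mutation incurs a penalty whenever the mutation changes the correct answer.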

Award ID(s): 1816039

PAR ID: 10276935

Date Published: 2020-01-01

Journal Name: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Page Range / eLocation ID: 878 to 892

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text (Accepted Manuscript, Conference Paper): https://doi.org/10.18653/v1/2020.emnlp-main.63