Combining Multiple Cues for Visual Madlibs Question Answering

Tommasi, Tatiana; Mallya, Arun; Plummer, Bryan; Lazebnik, Svetlana; Berg, Alexander C.; Berg, Tamara L.

doi:10.1007/s11263-018-1096-0

Citation Details

Combining Multiple Cues for Visual Madlibs Question Answering

This paper presents an approach for answering ﬁll-in-the-blank multiple choice questions from the Visual Madlibs dataset.Instead of generic and commonly used representations trained on the ImageNet classiﬁcation task, our approach employs acombination of networks trained for specialized tasks such as scene recognition, person activity classiﬁcation, and attributeprediction. We also present a method for localizing phrases from candidate answers in order to provide spatial support forfeature extraction. We map each of these features, together with candidate answers, to a joint embedding space throughnormalized canonical correlation analysis (nCCA). Finally, we solve an optimization problem to learn to combine scoresfrom nCCA models trained on multiple cues to select the best answer. Extensive experimental results show a signiﬁcantimprovement over the previous state of the art and conﬁrm that answering questions from a wide range of types beneﬁts fromexamining a variety of image cues and carefully choosing the spatial support for feature extraction. more »

Award ID(s):: 1633295

PAR ID:: 10066891

Author(s) / Creator(s):: Tommasi, Tatiana; Mallya, Arun; Plummer, Bryan; Lazebnik, Svetlana; Berg, Alexander C.; Berg, Tamara L.

Date Published:: 2018-04-18

Journal Name:: International Journal of Computer Vision

ISSN:: 0920-5691

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
https://doi.org/10.1007/s11263-018-1096-0

More Like this