Quantifying the visual concreteness of words and topics in multimodal datasets

Hessel, Jack; Lee, Lillian; Mimno, David

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for automatically computing the visual concreteness of words and topics within multimodal datasets. We apply the approach in four settings, ranging from image captions to images/text scraped from historical books. In addition to enabling explorations of concepts in multimodal datasets, our concreteness scores predict the capacity of machine learning algorithms to learn textual/visual relationships. We find that 1) concrete concepts are indeed easier to learn; 2) the large number of algorithms we consider have similar failure cases; 3) the precise positive relationship between concreteness and performance varies between datasets. We conclude with recommendations for using concreteness scores to facilitate future multimodal research.
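As a rough illustration of what a dataset-level concreteness score can look like, here is a minimal Python sketch of one nearest-neighbor formulation. It assumes each item is an image feature vector paired with a set of text tokens, and it scores a word as more visually concrete when the visual nearest neighbors of its images also tend to contain that word, relative to the rate expected by chance. The function names, the Euclidean distance, the choice of k, and the chance baseline are illustrative assumptions, not necessarily the authors' exact definition.

```python
import heapq
import math
from typing import List, Sequence, Set


def nearest_neighbors(features: Sequence[Sequence[float]], i: int, k: int) -> List[int]:
    """Indices of the k items closest to item i (Euclidean), excluding i itself."""
    others = (j for j in range(len(features)) if j != i)
    # Ties on distance are broken by index so results are deterministic.
    return heapq.nsmallest(k, others, key=lambda j: (math.dist(features[i], features[j]), j))


def concreteness(word: str, features: Sequence[Sequence[float]],
                 texts: Sequence[Set[str]], k: int) -> float:
    """Observed / chance rate at which visual neighbors of `word`'s items also contain `word`.

    Hypothetical formulation for illustration only; returns NaN when the word
    occurs in fewer than two items (no meaningful baseline).
    """
    n = len(texts)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of items")
    has_word = [word in t for t in texts]
    items = [i for i in range(n) if has_word[i]]
    m = len(items)
    if m < 2:
        return float("nan")
    # Count neighbors (in image-feature space) whose text also contains the word.
    hits = sum(has_word[j] for i in items for j in nearest_neighbors(features, i, k))
    observed = hits / (m * k)
    # Under random placement, each of the other n - 1 items contains the word
    # with probability (m - 1) / (n - 1).
    expected = (m - 1) / (n - 1)
    return observed / expected
```

For example, with two tight clusters of feature vectors, one cluster tagged "dog" and one item in each cluster tagged "red", "dog" scores 2.5 while "red" scores 0.0 at k = 2. Dividing by the chance rate is the key design choice: it keeps frequent words from looking concrete merely because they appear everywhere.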

Award ID(s): 1652536

PAR ID: 10057831

Author(s) / Creator(s): Hessel, Jack; Lee, Lillian; Mimno, David

Date Published: 2018-01-01

Journal Name: North American Chapter of the Association for Computational Linguistics (NAACL)

Sponsoring Org: National Science Foundation

Full Text: Accepted Manuscript 1.0 (free, publicly accessible)

Type: Conference Paper

DOI: not currently available