From image to language and back again

BELZ, A.; BERG, T.L.; YU, L.

doi:10.1017/S1351324918000086

Work in computer vision and natural language processing involving images and text has been experiencing explosive growth over the past decade, with a particular boost coming from the neural network revolution. The present volume brings together five research articles from several different corners of the area: multilingual multimodal image description (Frank et al.), multimodal machine translation (Madhyastha et al., Frank et al.), image caption generation (Madhyastha et al., Tanti et al.), visual scene understanding (Silberer et al.), and multimodal learning of high-level attributes (Sorodoc et al.). In this article, we touch upon all of these topics as we review work involving images and text under the three main headings of image description (Section 2), visually grounded referring expression generation (REG) and comprehension (Section 3), and visual question answering (VQA) (Section 4).

Award ID(s): 1633295

PAR ID: 10066888

Author(s) / Creator(s): BELZ, A.; BERG, T.L.; YU, L.

Date Published: 2018-05-01

Journal Name: Natural Language Engineering

Volume: 24

Issue: 03

ISSN: 1351-3249

Page Range / eLocation ID: 325 to 362

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1017/S1351324918000086
