A Joint Speaker-Listener-Reinforcer Model for Referring Expressions

Licheng Yu, Hao Tan

Citation Details

Referring expressions are natural language construc- tions used to identify particular objects within a scene. In this paper, we propose a unified framework for the tasks of referring expression comprehension and generation. Our model is composed of three modules: speaker, listener, and reinforcer. The speaker generates referring expressions, the listener comprehends referring expressions, and the rein- forcer introduces a reward function to guide sampling of more discriminative expressions. The listener-speaker mod- ules are trained jointly in an end-to-end learning frame- work, allowing the modules to be aware of one another during learning while also benefiting from the discrimina- tive reinforcer’s feedback. We demonstrate that this unified framework and training achieves state-of-the-art results for both comprehension and generation on three referring ex- pression datasets. more »

Award ID(s):: 1633295

PAR ID:: 10038499

Author(s) / Creator(s):: Licheng Yu, Hao Tan

Date Published:: 2017-01-01

Journal Name:: IEEE Conference on Computer Vision and Pattern Recognition

Volume:: 1

Issue:: 1

Page Range / eLocation ID:: 1

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
The DOI is not currently available.

More Like this