NSF PAR Search | NSF Public Access Repository


What is Learned in Visually Grounded Neural Syntax Acquisition

https://doi.org/10.18653/v1/2020.acl-main.234

Kojima, Noriyuki; Averbuch-Elor, Hadar; Rush, Alexander; Artzi, Yoav (January 2020, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics)

Visual features are a promising signal for learning bootstrap textual models. However, blackbox learning models make it difficult to isolate the specific contribution of visual components. In this analysis, we consider the case study of the Visually Grounded Neural Syntax Learner (Shi et al., 2019), a recent approach for learning syntax from a visual training signal. By constructing simplified versions of the model, we isolate the core factors that yield the model’s strong performance. Contrary to what the model might be capable of learning, we find significantly less expressive versions produce similar predictions and perform just as well, or even better. We also find that a simple lexical signal of noun concreteness plays the main role in the model’s predictions as opposed to more complex syntactic reasoning.
Full Text Available
A Corpus for Reasoning About Natural Language Grounded in Photographs

Suhr, Alane; Zhou, Stephanie; Zhang, Ally; Zhang, Iris; Bai, Huajun; Artzi, Yoav (January 2019, Proceedings of the Annual Meeting of the Association for Computational Linguistics)

We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language. Qualitative analysis shows the data requires compositional joint reasoning, including about quantities, comparisons, and relations. Evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge.
Full Text Available
Learning to Map Context-Dependent Sentences to Executable Formal Queries

https://doi.org/10.18653/v1/N18-1203

Suhr, Alane; Iyer, Srinivasan; Artzi, Yoav (July 2018, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers))

We propose a context-dependent model to map utterances within an interaction to executable formal queries. To incorporate interaction history, the model maintains an interaction-level encoder that updates after each turn, and can copy sub-sequences of previously predicted queries during generation. Our approach combines implicit and explicit modeling of references between utterances. We evaluate our model on the ATIS flight planning interactions, and demonstrate the benefits of modeling context and explicit references.
Full Text Available
Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

Suhr, Alane; Artzi, Yoav (January 2018, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

We propose a learning approach for mapping context-dependent sequential instructions to actions. We address the problem of discourse and state dependencies with an attention-based model that considers both the history of the interaction and the state of the world. To train from start and goal states without access to demonstrations, we propose SESTRA, a learning algorithm that takes advantage of single-step reward observations and immediate expected reward maximization. We evaluate on the SCONE domains, and show absolute accuracy improvements of 9.8%-25.3% across the domains over approaches that use high-level logical representations.
Full Text Available
