NSF PAR Search | NSF Public Access Repository

Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting

Sclar, Melanie; Choi, Yejin; Tsvetkov, Yulia; Suhr, Alane (May 2024, International Conference on Learning Representations)

As large language models (LLMs) are adopted as a fundamental component of language technologies, it is crucial to accurately characterize their performance. Because choices in prompt design can strongly influence model behavior, this design process is critical in effectively using any modern pre-trained generative language model. In this work, we focus on LLM sensitivity to a quintessential class of meaning-preserving design choices: prompt formatting. We find that several widely used open-source LLMs are extremely sensitive to subtle changes in prompt formatting in few-shot settings, with performance differences of up to 76 accuracy points when evaluated using LLaMA-2-13B. Sensitivity remains even when increasing model size, the number of few-shot examples, or performing instruction tuning. Our analysis suggests that work evaluating LLMs with prompting-based methods would benefit from reporting a range of performance across plausible prompt formats, instead of the currently-standard practice of reporting performance on a single format. We also show that format performance only weakly correlates between models, which puts into question the methodological validity of comparing models with an arbitrarily chosen, fixed prompt format. To facilitate systematic analysis we propose FormatSpread, an algorithm that rapidly evaluates a sampled set of plausible prompt formats for a given task, and reports the interval of expected performance without accessing model weights. Furthermore, we present a suite of analyses that characterize the nature of this sensitivity, including exploring the influence of particular atomic perturbations and the internal representation of particular formats.
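The reporting practice the abstract recommends, a range of performance across plausible prompt formats rather than a single number, can be illustrated with a minimal sketch. This is not the FormatSpread algorithm itself, whose search procedure the abstract does not detail; it is a naive version that samples formats and scores every one of them. The function name `performance_spread`, the `evaluate` callback, and the toy accuracy values are all hypothetical, chosen only for illustration.

```python
import random
from typing import Callable, Sequence, Tuple


def performance_spread(
    formats: Sequence[str],
    evaluate: Callable[[str], float],
    sample_size: int,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Score a random sample of prompt formats and return (min, max, spread).

    `evaluate` is a hypothetical callback mapping a prompt format to task
    accuracy; in practice it would query the model through its text
    interface, so no access to model weights is assumed.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(formats))
    sampled = rng.sample(list(formats), k)
    scores = [evaluate(fmt) for fmt in sampled]
    lowest, highest = min(scores), max(scores)
    return lowest, highest, highest - lowest


# Toy usage with made-up accuracies (illustrative values, not results from the paper).
toy_accuracy = {
    "Question: {}\nAnswer: {}": 0.71,
    "Q: {} A: {}": 0.62,
    "q:{} a:{}": 0.40,
}
lo, hi, spread = performance_spread(
    list(toy_accuracy), toy_accuracy.__getitem__, sample_size=3
)
print(f"accuracy range: [{lo:.2f}, {hi:.2f}], spread: {spread:.2f}")
```

Reporting the interval `[lo, hi]` rather than the score of one arbitrarily chosen format is the design choice the abstract argues for; a budgeted search over a larger format space would replace the exhaustive scoring loop used here.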
Full Text Available
Continual Learning for Instruction Following from Realtime Feedback

Suhr, Alane; Artzi, Yoav (December 2023, Neural Information Processing Systems)
What is in my big data?

Elazar, Yanai; Bhagia, Akshita; Magnusson, Ian; Ravichander, Abhilasha; Schwenk, Dustin; Suhr, Alane; Walsh, Pete; Groeneveld, Dirk; Soldaini, Luca; Singh, Sameer; et al. (May 2024, ICLR)

Full Text Available
Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior

https://doi.org/10.1162/tacl_a_00428

Kojima, Noriyuki; Suhr, Alane; Artzi, Yoav (January 2021, Transactions of the Association for Computational Linguistics)

We study continual learning for natural language instruction generation, by observing human users’ instruction execution. We focus on a collaborative scenario, where the system both acts and delegates tasks to human users using natural language. We compare user execution of generated instructions to the original system intent as an indication to the system’s success communicating its intent. We show how to use this signal to improve the system’s ability to generate instructions via contextual bandit learning. In interaction with real users, our system demonstrates dramatic improvements in its ability to generate language over time.
Full Text Available
Analysis of Language Change in Collaborative Instruction Following

https://doi.org/10.18653/v1/2021.findings-emnlp.239

Effenberger, Anna; Singh, Rhia; Yan, Eva; Suhr, Alane; Artzi, Yoav (January 2021, Findings of the Association for Computational Linguistics: EMNLP 2021)

Full Text Available
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

https://doi.org/10.1109/CVPR.2019.01282

Chen, Howard; Suhr, Alane; Misra, Dipendra; Snavely, Noah; Artzi, Yoav (June 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR))

Full Text Available
Learning to Map Context-Dependent Sentences to Executable Formal Queries

https://doi.org/10.18653/v1/N18-1203

Suhr, Alane; Iyer, Srinivasan; Artzi, Yoav (July 2018, Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers))

We propose a context-dependent model to map utterances within an interaction to executable formal queries. To incorporate interaction history, the model maintains an interaction-level encoder that updates after each turn, and can copy sub-sequences of previously predicted queries during generation. Our approach combines implicit and explicit modeling of references between utterances. We evaluate our model on the ATIS flight planning interactions, and demonstrate the benefits of modeling context and explicit references.
Full Text Available
A Corpus for Reasoning About Natural Language Grounded in Photographs

Suhr, Alane; Zhou, Stephanie; Zhang, Ally; Zhang, Iris; Bai, Huajun; Artzi, Yoav (January 2019, Proceedings of the Annual Meeting of the Association for Computational Linguistics)

We introduce a new dataset for joint reasoning about natural language and images, with a focus on semantic diversity, compositionality, and visual reasoning challenges. The data contains 107,292 examples of English sentences paired with web photographs. The task is to determine whether a natural language caption is true about a pair of photographs. We crowdsource the data using sets of visually rich images and a compare-and-contrast task to elicit linguistically diverse language. Qualitative analysis shows the data requires compositional joint reasoning, including about quantities, comparisons, and relations. Evaluation using state-of-the-art visual reasoning methods shows the data presents a strong challenge.
Full Text Available
Situated Mapping of Sequential Instructions to Actions with Single-step Reward Observation

Suhr, Alane; Artzi, Yoav (January 2018, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers))

We propose a learning approach for mapping context-dependent sequential instructions to actions. We address the problem of discourse and state dependencies with an attention-based model that considers both the history of the interaction and the state of the world. To train from start and goal states without access to demonstrations, we propose SESTRA, a learning algorithm that takes advantage of single-step reward observations and immediate expected reward maximization. We evaluate on the SCONE domains, and show absolute accuracy improvements of 9.8%-25.3% across the domains over approaches that use high-level logical representations.
Full Text Available
Executing Instructions in Situated Collaborative Interactions

https://doi.org/10.18653/v1/D19-1218

Suhr, Alane; Yan, Claudia; Schluger, Jack; Yu, Stanley; Khader, Hadi; Mouallem, Marwa; Zhang, Iris; Artzi, Yoav (January 2019, Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP))

Full Text Available