NSF PAR Search | NSF Public Access Repository

Long-Form Answers to Visual Questions from Blind and Low Vision People

Huh, Mina; Xu, Fangyuan; Peng, Yi-Hao; Chen, Congyan; Murugu, Hansika; Gurari, Danna; Choi, Eunsol; Pavel, Amy (October 2024, Conference on Language Modeling)

Vision language models can now generate long-form answers to questions about images, i.e., long-form visual question answers (LFVQA). We contribute VizWiz-LF, a dataset of long-form answers to visual questions posed by blind and low vision (BLV) users. VizWiz-LF contains 4.2k long-form answers to 600 visual questions, collected from human expert describers and six VQA models. We develop and annotate functional roles of sentences of LFVQA and demonstrate that long-form answers contain information beyond the question answer, such as explanations and suggestions. We further conduct automatic and human evaluations with BLV and sighted people to evaluate long-form answers. BLV people perceive both human-written and generated long-form answers to be plausible, but generated answers often hallucinate incorrect visual details, especially for unanswerable visual questions (e.g., blurry or irrelevant images). To reduce hallucinations, we evaluate the ability of VQA models to abstain from answering unanswerable questions across multiple prompting strategies.
Full Text Available
A Critical Evaluation of Evaluations for Long-form Question Answering

https://doi.org/10.18653/v1/2023.acl-long.181

Xu, Fangyuan; Song, Yixiao; Iyyer, Mohit; Choi, Eunsol (January 2023, Association for Computational Linguistics)

Full Text Available
How Do We Answer Complex Questions: Discourse Structure of Long-form Answers

https://doi.org/10.18653/v1/2022.acl-long.249

Xu, Fangyuan; Li, Junyi Jessy; Choi, Eunsol (January 2022, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics)

Long-form answers, consisting of multiple sentences, can provide nuanced and comprehensive answers to a broader set of questions. To better understand this complex and understudied task, we study the functional structure of long-form answers collected from three datasets: ELI5, WebGPT, and Natural Questions. Our main goal is to understand how humans organize information to craft complex answers. We develop an ontology of six sentence-level functional roles for long-form answers, and annotate 3.9k sentences in 640 answer paragraphs. Different answer collection methods manifest in different discourse structures. We further analyze model-generated answers, finding that annotators agree less with each other when annotating model-generated answers than when annotating human-written answers. Our annotated data enables training a strong classifier that can be used for automatic analysis. We hope our work can inspire future research on discourse-level modeling and evaluation of long-form QA systems.
Full Text Available