NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers

Pillutla, Krishna; Swayamdipta, Swabha; Zellers, Rowan; Thickstun, John; Welleck, Sean; hoi, Yejin; Harchaoui, Zaid (January 2022, Advances in neural information processing systems)

As major progress is made in open-ended text generation, measuring how close machine-generated text is to human language remains a critical open problem. We introduce MAUVE, a comparison measure for open-ended text generation, which directly compares the learnt distribution from a text generation model to the distribution of human-written text using divergence frontiers. MAUVE scales up to modern text generation models by computing information divergences in a quantized embedding space. Through an extensive empirical study on three open-ended generation tasks, we find that MAUVE identifies known properties of generated text, scales naturally with model size, and correlates with human judgments, with fewer restrictions than existing distributional evaluation metrics.
more » « less
Full Text Available
TuringAdvice: A Generative and Dynamic Evaluation of Language Use

https://doi.org/10.18653/v1/2021.naacl-main.386

Zellers, Rowan; Holtzman, Ari; Clark, Elizabeth; Qin, Lianhui; Farhadi, Ali; Choi, Yejin (June 2021, Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume)

We propose TuringAdvice, a new challenge task and dataset for language understanding models. Given a written situation that a real person is currently facing, a model must generate helpful advice in natural language. Our evaluation framework tests a fundamental aspect of human language understanding: our ability to use language to resolve open-ended situations by communicating with each other. Empirical results show that today’s models struggle at TuringAdvice, even multibillion parameter models finetuned on 600k in-domain training examples. The best model, T5, writes advice that is at least as helpful as human-written advice in only 14% of cases; a much larger non-finetunable GPT3 model does even worse at 4%. This low performance reveals language understanding errors that are hard to spot outside of a generative setting, showing much room for progress.
more » « less
Full Text Available
Defending Against Neural Fake News

Zellers, Rowan; Holtzman, Ari; Rashkin, Hannah; Bisk, Yonatan; Farhadi, Ali; Roesner, Franziska; Choi, Yejin (December 2020, Neurips)

Recent progress in natural language generation has raised dual-use concerns. While applications like summarization and translation are positive, the underlying technology also might enable adversaries to generate neural fake news: targeted propaganda that closely mimics the style of real news. Modern computer security relies on careful threat modeling: identifying potential threats and vulnerabilities from an adversary's point of view, and exploring potential mitigations to these threats. Likewise, developing robust defenses against neural fake news requires us first to carefully investigate and characterize the risks of these models. We thus present a model for controllable text generation called Grover. Given a headline like `Link Found Between Vaccines and Autism,' Grover can generate the rest of the article; humans find these generations to be more trustworthy than human-written disinformation. Developing robust verification techniques against generators like Grover is critical. We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data. Counterintuitively, the best defense against Grover turns out to be Grover itself, with 92% accuracy, demonstrating the importance of public release of strong generators. We investigate these results further, showing that exposure bias -- and sampling strategies that alleviate its effects -- both leave artifacts that similar discriminators can pick up on. We conclude by discussing ethical issues regarding the technology, and plan to release Grover publicly, helping pave the way for better detection of neural fake news.
more » « less
Full Text Available
Edited Media Understanding: Reasoning About Implications of Manipulated Images

Da, Jeff; Forbes, Maxwell; Zellers, Rowan; Zheng, Anthony; Hwang, Jena D.; Bosselut, Antoine; Choi, Yejin (December 2020, Association for Computational Linguistics)

Multimodal disinformation, from `deepfakes' to simple edits that deceive, is an important societal problem. Yet at the same time, the vast majority of media edits are harmless -- such as a filtered vacation photo. The difference between this example, and harmful edits that spread disinformation, is one of intent. Recognizing and describing this intent is a major challenge for today's AI systems. We present the task of Edited Media Understanding, requiring models to answer open-ended questions that capture the intent and implications of an image edit. We introduce a dataset for our task, EMU, with 48k question-answer pairs written in rich natural language. We evaluate a wide variety of vision-and-language models for our task, and introduce a new model PELICAN, which builds upon recent progress in pretrained multimodal representations. Our model obtains promising results on our dataset, with humans rating its answers as accurate 40.35% of the time. At the same time, there is still much work to be done -- humans prefer human-annotated captions 93.56% of the time -- and we provide analysis that highlights areas for further progress.
more » « less
Full Text Available
Adversarial Filters of Dataset Biases

Le Bras, Ronan; Swayamdipta, Swabha; Bhagavatula, Chandra; Zellers, Rowan; Peters, Matthew E.; Sabharwal, Ashish; Choi, Yejin (April 2020, ICML)

Full Text Available

Search for: All records