Title: Linguistically-Informed Specificity and Semantic Plausibility for Dialogue Generation
Sequence-to-sequence models for open-domain dialogue generation tend to favor generic, uninformative responses. Past work has focused on word frequency-based approaches to improving specificity, such as penalizing responses that contain only common words. In this work, we examine whether specificity is solely a frequency-related notion and find that more linguistically driven specificity measures are better suited to improving response informativeness. However, we find that forcing a sequence-to-sequence model to be more specific can expose a host of other problems in the responses, including flawed discourse and implausible semantics. We rerank our model's outputs using externally trained classifiers targeting each of these identified factors. Experiments show that our final model, using linguistically motivated specificity and plausibility reranking, improves the informativeness, reasonableness, and grammaticality of responses.
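The reranking step described above combines the generator's own score with externally trained classifiers. The sketch below is a minimal illustration of that idea, not the paper's implementation: the `specificity` and `plausibility` scorers are placeholder stubs standing in for the separately trained classifiers, and the weights are arbitrary.

```python
# Minimal sketch of classifier-based reranking of candidate responses.
# Scorer names and weights are illustrative placeholders, not the paper's setup.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Candidate:
    text: str
    gen_logprob: float  # log-probability under the seq2seq model

def rerank(candidates: List[Candidate],
           scorers: List[Callable[[str], float]],
           weights: List[float],
           lm_weight: float = 1.0) -> List[Candidate]:
    """Order candidates by a weighted sum of the generation log-probability
    and externally trained classifier scores (e.g., specificity, plausibility)."""
    def combined(c: Candidate) -> float:
        external = sum(w * s(c.text) for w, s in zip(weights, scorers))
        return lm_weight * c.gen_logprob + external
    return sorted(candidates, key=combined, reverse=True)

# Usage with toy scorers (real classifiers would be trained separately):
specificity = lambda t: min(len(set(t.split())) / 20.0, 1.0)   # toy proxy
plausibility = lambda t: 0.5                                    # stub
best = rerank([Candidate("i do not know", -1.2),
               Candidate("the 7pm train to boston is delayed", -3.4)],
              scorers=[specificity, plausibility],
              weights=[1.0, 1.0])[0]
print(best.text)
```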
Award ID(s):
1850153
PAR ID:
10165657
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
Page Range / eLocation ID:
3456 to 3466
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sentence specificity quantifies the level of detail in a sentence, characterizing the organization of information in discourse. While this information is useful for many downstream applications, specificity prediction systems predict very coarse labels (binary or ternary) and are trained on and tailored toward specific domains (e.g., news). The goal of this work is to generalize specificity prediction to domains where no labeled data is available and to output more nuanced, real-valued specificity ratings. We present an unsupervised domain adaptation system for sentence specificity prediction, specifically designed to output real-valued estimates from binary training labels. To calibrate the values of these predictions appropriately, we regularize the posterior distribution of the labels towards a reference distribution. We show that our framework generalizes well to three different domains, with a 50%-68% reduction in mean absolute error compared to the current state-of-the-art system trained for news sentence specificity. We also demonstrate the potential of our work in improving the quality and informativeness of dialogue generation systems.
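One way to picture the calibration idea above is as a penalty that pushes the distribution of real-valued predictions toward a reference distribution. The sketch below is illustrative only (it is not the paper's implementation): the histogram-based KL penalty and the reference counts are assumptions made for the example.

```python
# Illustrative sketch: penalize divergence between the batch distribution of
# predicted specificity scores and a reference distribution, so real-valued
# outputs stay calibrated even though the training labels are binary.
import numpy as np

def posterior_regularizer(preds: np.ndarray,
                          ref_hist: np.ndarray,
                          bins: int = 10,
                          eps: float = 1e-8) -> float:
    """KL(batch prediction histogram || reference histogram) over [0, 1]."""
    hist, _ = np.histogram(preds, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    q = ref_hist / ref_hist.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Example: reference histogram assumed to come from unlabeled target-domain text.
ref = np.array([1, 2, 4, 7, 9, 9, 7, 4, 2, 1], dtype=float)
batch_preds = np.random.rand(64)   # stand-in for model outputs in [0, 1]
penalty = posterior_regularizer(batch_preds, ref)
# total_loss = supervised_loss + lambda_reg * penalty   (lambda_reg: hyperparameter)
```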
  2. Understanding memory in the context of data visualizations is paramount for effective design. While immediate clarity in a visualization is crucial, retention of its information determines its long-term impact. Although extensive research has underscored the elements that enhance visualization memorability, a limited body of work has delved into modeling the recall process. This study investigates the temporal dynamics of visualization recall, focusing on factors influencing recollection, shifts in recall veracity, and the role of participant demographics. Using data from an empirical study (n = 104), we propose a novel approach combining temporal clustering and handcrafted features to model recall over time. A long short-term memory (LSTM) model with attention mechanisms predicts recall patterns, revealing alignment with informativeness scores and participant characteristics. Our findings show that perceived informativeness dictates recall focus, with more informative visualizations eliciting narrative-driven insights and less informative ones prompting aesthetic-driven responses. Recall accuracy diminishes over time, particularly for unfamiliar visualizations, with age and education significantly shaping recall emphases. These insights advance our understanding of visualization recall, offering practical guidance for designing visualizations that enhance retention and comprehension. All data and materials are available at: https://osf.io/ghe2j/.
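For readers unfamiliar with the model family named above, the following is a minimal sketch of an LSTM with attention pooling over time steps. It is written in the spirit of the recall-prediction model but is not the study's design: the feature dimension, hidden size, and regression head are placeholder assumptions.

```python
# Minimal sketch of an LSTM with attention over time steps (PyTorch).
# Layer sizes and the handcrafted-feature input are placeholders.
import torch
import torch.nn as nn

class RecallLSTM(nn.Module):
    def __init__(self, feat_dim: int = 16, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)   # scores each time step
        self.head = nn.Linear(hidden, 1)   # predicts a recall score

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, feat_dim) handcrafted features per recall interval
        h, _ = self.lstm(x)                           # (batch, time, hidden)
        weights = torch.softmax(self.attn(h), dim=1)  # (batch, time, 1)
        context = (weights * h).sum(dim=1)            # attention-pooled summary
        return self.head(context).squeeze(-1)         # (batch,)

model = RecallLSTM()
dummy = torch.randn(8, 5, 16)   # 8 participants, 5 time points, 16 features
print(model(dummy).shape)       # torch.Size([8])
```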
  3. In this paper, we propose a deep learning approach to automatic summarization that incorporates topic information into the convolutional sequence-to-sequence (ConvS2S) model and uses self-critical sequence training (SCST) for optimization. By jointly attending to topics and word-level alignment, our approach can improve the coherence, diversity, and informativeness of generated summaries via a biased probability generation mechanism. In addition, reinforcement training such as SCST directly optimizes the proposed model with respect to the non-differentiable metric ROUGE, which also avoids exposure bias during inference. We carry out experimental evaluation against state-of-the-art methods on the Gigaword, DUC-2004, and LCSTS datasets. The empirical results demonstrate the superiority of our proposed method in abstractive summarization.
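The SCST objective mentioned above can be summarized in a few lines: the reward of a sampled summary is baselined by the reward of the greedy decode, and the difference scales the sampled sequence's log-probability. The sketch below is a generic illustration of that loss, with `reward` standing in for a non-differentiable metric such as ROUGE; it is not the paper's code.

```python
# Hedged sketch of the self-critical sequence training (SCST) loss.
import torch

def scst_loss(sample_logprobs: torch.Tensor,
              sample_reward: torch.Tensor,
              greedy_reward: torch.Tensor) -> torch.Tensor:
    """
    sample_logprobs: (batch,) summed log-probs of sampled summaries
    sample_reward / greedy_reward: (batch,) metric scores (e.g., ROUGE-L)
    """
    advantage = sample_reward - greedy_reward           # self-critical baseline
    return -(advantage.detach() * sample_logprobs).mean()

# Toy usage: a positive advantage pushes probability of the sampled summary up.
logp = torch.tensor([-12.3, -9.8], requires_grad=True)
loss = scst_loss(logp, torch.tensor([0.42, 0.31]), torch.tensor([0.35, 0.33]))
loss.backward()
```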
  4. Text Simplification improves the readability of sentences through several rewriting transformations, such as lexical paraphrasing, deletion, and splitting. Current simplification systems are predominantly sequence-to-sequence models that are trained end-to-end to perform all these operations simultaneously. However, such systems limit themselves to mostly deleting words and cannot easily adapt to the requirements of different target audiences. In this paper, we propose a novel hybrid approach that leverages linguistically-motivated rules for splitting and deletion, and couples them with a neural paraphrasing model to produce varied rewriting styles. We introduce a new data augmentation method to improve the paraphrasing capability of our model. Through automatic and manual evaluations, we show that our proposed model establishes a new state-of-the-art for the task, paraphrasing more often than the existing systems, and can control the degree of each simplification operation applied to the input texts.
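The pipeline shape of the hybrid approach (rules first, neural paraphrasing second) can be sketched as below. This is a toy illustration under stated assumptions: the single splitting rule and the `paraphrase` stub are simplified placeholders, not the paper's rule set or model, and a real system would also repair capitalization and pronouns after splitting.

```python
# Illustrative sketch of a hybrid simplification pipeline: rule-based
# splitting followed by a (stubbed) neural paraphraser applied per piece.
import re
from typing import Callable, List

def rule_based_split(sentence: str) -> List[str]:
    """Split on a coordinating conjunction or relative clause marker (toy rule)."""
    parts = re.split(r",\s*(?:and|but|which)\s+", sentence, maxsplit=1)
    return [p.strip().rstrip(".") + "." for p in parts if p.strip()]

def simplify(sentence: str, paraphrase: Callable[[str], str]) -> List[str]:
    return [paraphrase(piece) for piece in rule_based_split(sentence)]

# Usage with an identity "paraphraser" standing in for the neural model:
print(simplify("The plan was revised twice, and the committee approved it.",
               paraphrase=lambda s: s))
```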
Large language models (LLMs) that do not give consistent answers across contexts are problematic when used for tasks with expectations of consistency, e.g., question answering and explanation. Our work presents an evaluation benchmark for self-consistency in cases of under-specification where two or more answers can be correct. We conduct a series of behavioral experiments on the OpenAI model suite using an ambiguous integer sequence completion task. We find that average consistency ranges from 67% to 82%, far higher than would be predicted if a model's consistency were random, and increases as model capability improves. Furthermore, we show that models tend to maintain self-consistency across a series of robustness checks, including prompting speaker changes and sequence length changes. These results suggest that self-consistency arises as an emergent capability without specifically training for it. Despite this, we find that models are uncalibrated when judging their own consistency, with models displaying both over- and under-confidence. We also propose a nonparametric test for determining from the token output distribution whether a model assigns non-trivial probability to alternative answers. Using this test, we find that despite increases in self-consistency, models usually place significant weight on alternative, inconsistent answers. This distribution of probability mass provides evidence that even highly self-consistent models internally compute multiple possible responses.
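A simple way to operationalize the consistency score discussed above is to sample the model repeatedly across prompt variants and measure how often it returns its own modal answer. The sketch below is an assumed, simplified protocol for illustration only; `complete` is a placeholder for the actual model call and this is not the benchmark's exact procedure.

```python
# Hedged sketch of a self-consistency measurement over an ambiguous
# sequence-completion task: modal-answer agreement rate across samples.
from collections import Counter
from typing import Callable, List

def self_consistency(prompts: List[str],
                     complete: Callable[[str], str],
                     samples_per_prompt: int = 5) -> float:
    answers = [complete(p) for p in prompts for _ in range(samples_per_prompt)]
    counts = Counter(answers)
    return counts.most_common(1)[0][1] / len(answers)

# Toy usage: a deterministic stub is perfectly consistent (score 1.0).
variants = ["2, 4, 8, 16, ?", "Complete the sequence: 2, 4, 8, 16"]
print(self_consistency(variants, complete=lambda p: "32"))
```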