Natural language generators for task-oriented dialogue must effectively realize system dialogue actions and their associated semantics. In many applications, it is also desirable for generators to control the style of an utterance. To date, work on task-oriented neural generation has primarily focused on semantic fidelity rather than achieving stylistic goals, while work on style has been done in contexts where it is difficult to measure content preservation. Here we present three different sequence-to-sequence models and carefully test how well they disentangle content and style. We use a statistical generator, PERSONAGE, to synthesize a new corpus of over 88,000 restaurant-domain utterances whose style varies according to models of personality, giving us total control over both the semantic content and the stylistic variation in the training data. We then vary the amount of explicit stylistic supervision given to the three models. We show that our most explicit model can simultaneously achieve high fidelity to both semantic and stylistic goals: this model adds a context vector of 36 stylistic parameters as input to the hidden state of the encoder at each time step, showing the benefits of explicit stylistic supervision even when the amount of training data is large.
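As a rough illustration of the most explicit model's conditioning mechanism, here is a minimal sketch assuming a PyTorch LSTM encoder, reading "input to the hidden state at each time step" as concatenating the 36-dimensional style vector to every token embedding; all names and dimensions besides the 36 stylistic parameters are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' code): an encoder that conditions on a
# fixed 36-dim vector of stylistic parameters at every time step by
# concatenating it to each token embedding before the recurrent update.
import torch
import torch.nn as nn

class StyleConditionedEncoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512, n_style=36):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Input at each step = token embedding + style context vector.
        self.rnn = nn.LSTM(emb_dim + n_style, hid_dim, batch_first=True)

    def forward(self, tokens, style):
        # tokens: (batch, seq_len) token ids of the meaning representation
        # style:  (batch, 36) stylistic parameters (e.g., a personality model)
        emb = self.embed(tokens)                              # (B, T, emb_dim)
        ctx = style.unsqueeze(1).expand(-1, emb.size(1), -1)  # (B, T, 36)
        outputs, state = self.rnn(torch.cat([emb, ctx], dim=-1))
        return outputs, state                                 # fed to the decoder
```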
Can Neural Generators for Dialogue Learn Sentence Planning and Discourse Structuring?
Responses in task-oriented dialogue systems often realize multiple propositions whose ultimate form depends on the use of sentence planning and discourse structuring operations. For example, a recommendation may consist of an explicitly evaluative utterance, e.g., Chanpen Thai is the best option, along with content related by the justification discourse relation, e.g., It has great food and service, that combines multiple propositions into a single phrase. While neural generation methods integrate sentence planning and surface realization in one end-to-end learning framework, previous work has not shown that neural generators can: (1) perform common sentence planning and discourse structuring operations; (2) make decisions as to whether to realize content in a single sentence or over multiple sentences; (3) generalize sentence planning and discourse relation operations beyond what was seen in training. We systematically create large training corpora that exhibit particular sentence planning operations and then test neural models to see what they learn. We compare models without explicit latent variables for sentence planning with ones that provide explicit supervision during training. We show that only the models with additional supervision can reproduce sentence planning and discourse operations and generalize to situations unseen in training.
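One simple way to provide the kind of explicit sentence-planning supervision described above is to prepend supervision tokens to the linearized input. The sketch below is illustrative only; the token scheme and slot names are assumptions, not the paper's exact encoding.

```python
# Minimal sketch (illustrative, not the paper's exact scheme): adding
# explicit sentence-planning supervision to a seq2seq input by prepending
# tokens that name the discourse relation and the sentence segmentation
# used by the reference realization.
def linearize_mr(mr, relation=None, n_sentences=None):
    """mr: dict of slot -> value, e.g. {"name": "Chanpen Thai", ...}"""
    tokens = []
    if relation is not None:          # e.g. "JUSTIFY", "CONTRAST"
        tokens.append(f"<rel={relation}>")
    if n_sentences is not None:       # distribute content over N sentences
        tokens.append(f"<sents={n_sentences}>")
    for slot, value in sorted(mr.items()):
        tokens += [f"<{slot}>", str(value)]
    return " ".join(tokens)

# Example: supervise a one-sentence recommendation justified by two facts.
src = linearize_mr({"name": "Chanpen Thai", "food": "great", "service": "great"},
                   relation="JUSTIFY", n_sentences=1)
```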
- Award ID(s): 1748056
- PAR ID: 10079403
- Date Published:
- Journal Name: Proceedings of the 11th International Conference on Natural Language Generation
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Summarization of long sequences into a concise statement is a core problem in natural language processing, requiring a non-trivial understanding of weakly structured text. Integrating crowdsourced comments from multiple users into a concise summary is even harder because (1) it requires transferring the weakly structured comments into structured knowledge and (2) the users' comments are informal and noisy. To capture the long-distance relationships in staggered long sentences, we propose a neural multi-comment summarization (MCS) system that incorporates sentence relationships via graph heuristics that utilize relation knowledge graphs, i.e., sentence relation graphs (SRG) and approximate discourse graphs (ADG). Motivated by the promising results of gated graph neural networks (GG-NNs) on highly structured data, we develop a GG-NN with a sequence encoder that incorporates the SRG or ADG in order to capture the sentence relationships. Specifically, we employ the GG-NNs on both relation knowledge graphs, with the sentence embeddings as the input node features and the graph heuristics as the edges' weights. Through multiple layer-wise propagations, the GG-NNs generate the salience for each sentence from high-level hidden sentence features. Consequently, we use a greedy heuristic to extract salient users' comments while avoiding the noise in comments. The experimental results show that the proposed MCS improves the summarization performance both quantitatively and qualitatively. (A code sketch of the propagation step follows this list.)
- Several works have aimed to explain why overparameterized neural networks generalize well when trained by Stochastic Gradient Descent (SGD). The consensus explanation that has emerged credits the randomized nature of SGD for the bias of the training process towards low-complexity models and, thus, for implicit regularization. We take a careful look at this explanation in the context of image classification with common deep neural network architectures. We find that if we do not regularize explicitly, then SGD can easily be made to converge to poorly generalizing, high-complexity models: all it takes is to first train on a random labeling of the data before switching to properly training with the correct labels. In contrast, we find that in the presence of explicit regularization, pretraining with random labels has no detrimental effect on SGD. We believe that our results give evidence that explicit regularization plays a far more important role in the success of overparameterized neural networks than has been understood until now. Specifically, by penalizing complicated models independently of their fit to the data, regularization affects training dynamics even far away from optima, making simple models that fit the data well discoverable by local methods such as SGD. (A sketch of the random-label-then-correct-label schedule follows this list.)
- Sentence specificity quantifies the level of detail in a sentence, characterizing the organization of information in discourse. While this information is useful for many downstream applications, specificity prediction systems predict very coarse labels (binary or ternary) and are trained on and tailored toward specific domains (e.g., news). The goal of this work is to generalize specificity prediction to domains where no labeled data is available and to output more nuanced, real-valued specificity ratings. We present an unsupervised domain adaptation system for sentence specificity prediction, specifically designed to output real-valued estimates from binary training labels. To calibrate the values of these predictions appropriately, we regularize the posterior distribution of the labels towards a reference distribution. We show that our framework generalizes well to three different domains, with a 50%-68% reduction in mean absolute error relative to the current state-of-the-art system trained for news sentence specificity. We also demonstrate the potential of our work for improving the quality and informativeness of dialogue generation systems. (A sketch of the posterior regularizer follows this list.)
- Relation extraction (RE) models have been challenged by their reliance on training data with expensive annotations. Considering that summarization tasks aim at acquiring concise expressions of synoptical information from a longer context, these tasks naturally align with the objective of RE, i.e., extracting a kind of synoptical information that describes the relation of entity mentions. We present SuRE, which converts RE into a summarization formulation. SuRE leads to more precise and resource-efficient RE based on indirect supervision from summarization tasks. To achieve this goal, we develop sentence and relation conversion techniques that bridge the formulations of summarization and RE tasks. We also incorporate constraint decoding techniques with Trie scoring to further enhance summarization-based RE with robust inference. Experiments on three RE datasets demonstrate the effectiveness of SuRE in both full-dataset and low-resource settings, showing that summarization is a promising source of indirect supervision signals for improving RE models. (A sketch of Trie-constrained scoring follows this list.)
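For the multi-comment summarization item above, here is a minimal sketch of one gated graph propagation step, assuming PyTorch; the layer structure, dimensions, and greedy extractor are illustrative assumptions, not the MCS implementation.

```python
# Minimal sketch (assumptions, not the MCS implementation): a gated graph
# propagation layer over a sentence-relation graph, with sentence embeddings
# as node features and heuristic edge weights, followed by a salience score
# per sentence and greedy extraction of the top-scoring comments.
import torch
import torch.nn as nn

class GatedGraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.message = nn.Linear(dim, dim)
        self.gru = nn.GRUCell(dim, dim)

    def forward(self, h, adj):
        # h: (n_sent, dim) sentence embeddings; adj: (n_sent, n_sent)
        # edge weights from the SRG or ADG heuristics.
        m = adj @ self.message(h)      # aggregate weighted neighbor messages
        return self.gru(m, h)          # gated update, as in GG-NNs

def extract_greedy(h, scorer, k=3):
    salience = scorer(h).squeeze(-1)   # (n_sent,) salience per sentence
    return torch.topk(salience, k).indices.tolist()

dim = 128
layer = GatedGraphLayer(dim)
scorer = nn.Linear(dim, 1)
h = torch.randn(10, dim)               # 10 comment sentences, embedded
adj = torch.rand(10, 10)               # heuristic edge weights
for _ in range(4):                     # multiple layer-wise propagations
    h = layer(h, adj)
picked = extract_greedy(h, scorer)
```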
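For the SGD-regularization item above, a minimal sketch of the described training schedule, assuming PyTorch; the loaders, learning rate, and epoch counts are placeholders, not the paper's experimental setup.

```python
# Minimal sketch (illustrative) of the experimental schedule described above:
# first fit a random labeling of the data, then switch to the correct labels,
# with or without explicit regularization (here, weight decay).
import torch

def train(model, loader_random, loader_true, weight_decay=0.0, epochs=(50, 50)):
    opt = torch.optim.SGD(model.parameters(), lr=0.1,
                          momentum=0.9, weight_decay=weight_decay)
    loss_fn = torch.nn.CrossEntropyLoss()
    for loader, n_epochs in [(loader_random, epochs[0]),  # random labels first
                             (loader_true, epochs[1])]:   # then correct labels
        for _ in range(n_epochs):
            for x, y in loader:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
```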
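For the sentence-specificity item above, one plausible reading of regularizing the label posterior toward a reference distribution, sketched as a KL penalty on the batch-averaged sigmoid output; this is an assumption about the mechanism, not the paper's code.

```python
# Minimal sketch (an assumed mechanism, not the paper's implementation):
# train on binary specificity labels while penalizing the divergence of the
# batch posterior from a reference distribution, so that the sigmoid outputs
# remain calibrated as real-valued specificity ratings.
import torch
import torch.nn.functional as F

def specificity_loss(logits, labels, ref=torch.tensor([0.5, 0.5]), lam=0.1):
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    p1 = torch.sigmoid(logits).mean()          # batch-averaged posterior
    post = torch.stack([1 - p1, p1])
    kl = (post * (post / ref).log()).sum()     # KL(posterior || reference)
    return bce + lam * kl
```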
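For the SuRE item above, a minimal sketch of scoring candidate relation verbalizations under a trie constraint; the Trie class and scoring interface are hypothetical, not SuRE's implementation.

```python
# Minimal sketch (illustrative of Trie-constrained scoring, not SuRE itself):
# score each candidate relation verbalization with a seq2seq model while a
# trie over the candidates restricts decoding to valid continuations.
class Trie:
    def __init__(self):
        self.children = {}

    def add(self, token_ids):
        node = self.children
        for t in token_ids:
            node = node.setdefault(t, {})

    def allowed(self, prefix):
        node = self.children
        for t in prefix:
            node = node.get(t, {})
        return list(node.keys())      # tokens that extend some candidate

def score_candidates(log_prob_fn, candidates):
    """log_prob_fn(prefix, token) -> model log-prob; candidates: token-id lists."""
    trie = Trie()
    for c in candidates:
        trie.add(c)
    scores = []
    for c in candidates:
        s, prefix = 0.0, []
        for t in c:
            assert t in trie.allowed(prefix)   # trie-constrained step
            s += log_prob_fn(prefix, t)
            prefix.append(t)
        scores.append(s)
    return scores                              # argmax gives the predicted relation
```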