Automatic evaluation metrics are a crucial component of dialog systems research. Standard language evaluation metrics are known to be ineffective for evaluating dialog. As such, recent research has proposed a number of novel, dialog-specific metrics that correlate better with human judgements. Due to the fast pace of research, many of these metrics have been assessed on different datasets and there has as yet been no time for a systematic comparison between them. To this end, this paper provides a comprehensive assessment of recently proposed dialog evaluation metrics on a number of datasets. In this paper, 23 different automatic evaluation metrics are evaluated on 10 different datasets. Furthermore, the metrics are assessed in different settings, to better qualify their respective strengths and weaknesses. Metrics are assessed (1) on both the turn level and the dialog level, (2) for different dialog lengths, (3) for different dialog qualities (e.g., coherence, engaging), (4) for different types of response generation models (i.e., generative, retrieval, simple models and stateof-the-art models), (5) taking into account the similarity of different metrics and (6) exploring combinations of different metrics. This comprehensive assessment offers several takeaways pertaining to dialog evaluation metrics in general. It also suggests how to best assess evaluation metrics and indicates promising directions for future work.
more »
« less
Improving Computer Generated Dialog with Auxiliary Loss Functions and Custom Evaluation Metrics
Although people have the ability to en- gage in vapid dialogue without effort, this may not be a uniquely human trait. Since the 1960’s researchers have been trying to create agents that can generate artificial conversation. These programs are com- monly known as chatbots. With increasing use of neural networks for dialog genera- tion, some conclude that this goal has been achieved. This research joins the quest by creating a dialog generating Recurrent Neural Network (RNN) and by enhancing the ability of this network with auxiliary loss functions and a beam search. Our cus- tom loss functions achieve better cohesion and coherence by including calculations of Maximum Mutual Information (MMI) and entropy. We demonstrate the effectiveness of this system by using a set of custom evaluation metrics inspired by an abun- dance of previous research and based on tried-and-true principles of Natural Lan- guage Processing.
more »
« less
- Award ID(s):
- 1659788
- PAR ID:
- 10098862
- Date Published:
- Journal Name:
- International Conference on Natural Language Processing
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Deep neural networks have been increasingly used in real-world applications, making it critical to ensure their ability to adapt to new, unseen data. In this paper, we study the generalization capability of neural networks trained with (stochastic) gradient flow. We establish a new connection between the loss dynamics of gradient flow and general kernel machines by proposing a new kernel, called loss path kernel. This kernel measures the similarity between two data points by evaluating the agreement between loss gradients along the path determined by the gradient flow. Based on this connection, we derive a new generalization upper bound that applies to general neural network architectures. This new bound is tight and strongly correlated with the true generalization error. We apply our results to guide the design of neural architecture search (NAS) and demonstrate favorable performance compared with state-of-the-art NAS algorithms through numerical experiments.more » « less
-
Effectively integrating knowledge into end-to-end task-oriented dialog systems remains a challenge. It typically requires incorporation of an external knowledge base (KB) and capture of the intrinsic semantics of the dialog history. Recent research shows promising results by using Sequence-to-Sequence models, Memory Networks, and even Graph Convolutional Networks. However, current state-of-the-art models are less effective at integrating dialog history and KB into task-oriented dialog systems in the following ways: 1. The KB representation is not fully context-aware. The dynamic interaction between the dialog history and KB is seldom explored. 2. Both the sequential and structural information in the dialog history can contribute to capturing the dialog semantics, but they are not studied concurrently. In this paper, we propose a novel Graph Memory Network (GMN) based Seq2Seq model, GraphMemDialog, to effectively learn the inherent structural information hidden in dialog history, and to model the dynamic interaction between dialog history and KBs. We adopt a modified graph attention network to learn the rich structural representation of the dialog history, whereas the context-aware representation of KB entities are learnt by our novel GMN. To fully exploit this dynamic interaction, we design a learnable memory controller coupled with external KB entity memories to recurrently incorporate dialog history context into KB entities through a multi-hop reasoning mechanism. Experiments on three public datasets show that our GraphMemDialog model achieves state-of-the-art performance and outperforms strong baselines by a large margin, especially on datatests with more complicated KB information.more » « less
-
Significant advances have been made recently on training neural networks, where the main challenge is in solving an optimization problem with abundant critical points. However, existing approaches to address this issue crucially rely on a restrictive assumption: the training data is drawn from a Gaussian distribution. In this paper, we provide a novel unified framework to design loss functions with desirable landscape properties for a wide range of general input distributions. On these loss functions, remarkably, stochastic gradient descent theoretically recovers the true parameters with global initializations and empirically outperforms the existing approaches. Our loss function design bridges the notion of score functions with the topic of neural network optimization. Central to our approach is the task of estimating the score function from samples, which is of basic and independent interest to theoretical statistics. Traditional estimation methods (example: kernel based) fail right at the outset; we bring statistical methods of local likelihood to design a novel estimator of score functions, that provably adapts to the local geometry of the unknown density.more » « less
-
Dialog systems (e.g., chatbots) have been widely studied, yet related research that leverages artificial intelligence (AI) and natural language processing (NLP) is constantly evolving. These systems have typically been developed to interact with humans in the form of speech, visual, or text conversation. As humans continue to adopt dialog systems for various objectives, there is a need to involve humans in every facet of the dialog development life cycle for synergistic augmentation of both the humans and the dialog system actors in real-world settings. We provide a holistic literature survey on the recent advancements inhuman-centered dialog systems(HCDS). Specifically, we provide background context surrounding the recent advancements in machine learning-based dialog systems and human-centered AI. We then bridge the gap between the two AI sub-fields and organize the research works on HCDS under three major categories (i.e., Human-Chatbot Collaboration, Human-Chatbot Alignment, Human-Centered Chatbot Design & Governance). In addition, we discuss the applicability and accessibility of the HCDS implementations through benchmark datasets, application scenarios, and downstream NLP tasks.more » « less
An official website of the United States government

