skip to main content


Title: Deep Reinforcement Learning for Sequence-to-Sequence Models
In recent times, sequence-to-sequence (seq2seq) models have gained a lot of popularity and provide stateof-the-art performance in a wide variety of tasks, such as machine translation, headline generation, text summarization, speech-to-text conversion, and image caption generation. The underlying framework for all these models is usually a deep neural network comprising an encoder and a decoder. Although simple encoder–decoder models produce competitive results, many researchers have proposed additional improvements over these seq2seq models, e.g., using an attention-based model over the input, pointer-generation models, and self-attention models. However, such seq2seq models suffer from two common problems: 1) exposure bias and 2) inconsistency between train/test measurement. Recently, a completely novel point of view has emerged in addressing these two problems in seq2seq models, leveraging methods from reinforcement learning (RL). In this survey, we consider seq2seq problems from the RL point of view and provide a formulation combining the power of RL methods in decision-making with seq2seq models that enable remembering long-term memories. We present some of the most recent frameworks that combine the concepts from RL and deep neural networks. Our work aims to provide insights into some of the problems that inherently arise with current approaches and how we can address them with better RL models. We also provide the source code for implementing most of the RL models discussed in this paper to support the complex task of abstractive text summarization and provide some targeted experiments for these RL models, both in terms of performance and training time.  more » « less
Award ID(s):
1838730 1707498
NSF-PAR ID:
10142308
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
IEEE Transactions on Neural Networks and Learning Systems
ISSN:
2162-237X
Page Range / eLocation ID:
1 to 21
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. For decades, research in natural language processing (NLP) has focused on summarization. Sequence-to-sequence models for abstractive summarization have been studied extensively, yet generated summaries commonly suffer from fabricated content, and are often found to be near-extractive. We argue that, to address these issues, summarizers need to acquire the co-references that form multiple types of relations over input sentences, e.g., 1-to-N, N-to-1, and N-to-N relations, since the structured knowledge for text usually appears on these relations. By allowing the decoder to pay different attention to the input sentences for the same entity at different generation states, the structured graph representations generate more informative summaries. In this paper, we propose a hierarchical graph attention networks (HGATs) for abstractive summarization with a topicsensitive PageRank augmented graph. Specifically, we utilize dual decoders, a sequential sentence decoder, and a graph-structured decoder (which are built hierarchically) to maintain the global context and local characteristics of entities, complementing each other. We further design a greedy heuristic to extract salient users’ comments while avoiding redundancy to drive a model to better capture entity interactions. Our experimental results show that our models produce significantly higher ROUGE scores than variants without graph-based attention on both SSECIF and CNN/Daily Mail (CNN/DM) datasets. 
    more » « less
  2. In a world of proliferating data, the abil- ity to rapidly summarize text is grow- ing in importance. Automatic summariza- tion of text can be thought of as a se- quence to sequence problem. Another area of natural language processing that solves a sequence to sequence problem is ma- chine translation, which is rapidly evolv- ing due to the development of attention- based encoder-decoder networks. This work applies these modern techniques to abstractive summarization. We perform analysis on various attention mechanisms for summarization with the goal of devel- oping an approach and architecture aimed at improving the state of the art. In par- ticular, we modify and optimize a trans- lation model with self-attention for gener- ating abstractive sentence summaries. The effectiveness of this base model along with attention variants is compared and ana- lyzed in the context of standardized eval- uation sets and test metrics. However, we show that these metrics are limited in their ability to effectively score abstractive summaries, and propose a new approach based on the intuition that an abstractive model requires an abstractive evaluation. 
    more » « less
  3. The goal of text-to-text generation is to make machines express like a human in many applications such as conversation, summarization, and translation. It is one of the most important yet challenging tasks in natural language processing (NLP). Various neural encoder-decoder models have been proposed to achieve the goal by learning to map input text to output text. However, the input text alone often provides limited knowledge to generate the desired output, so the performance of text generation is still far from satisfaction in many real-world scenarios. To address this issue, researchers have considered incorporating (i) internal knowledge embedded in the input text and (ii) external knowledge from outside sources such as knowledge base and knowledge graph into the text generation system. This research topic is known as knowledge-enhanced text generation. In this survey, we present a comprehensive review of the research on this topic over the past five years. The main content includes two parts: (i) general methods and architectures for integrating knowledge into text generation; (ii) specific techniques and applications according to different forms of knowledge data. This survey can have broad audiences, researchers and practitioners, in academia and industry. 
    more » « less
  4. Relations between words are governed by hierarchical structure rather than linear ordering. Sequence-to-sequence (seq2seq) models, despite their success in downstream NLP applications, often fail to generalize in a hierarchy sensitive manner when performing syntactic transformations—for example, transforming declarative sentences into questions. However, syntactic evaluations of seq2seq models have only observed models that were not pre-trained on natural language data before being trained to perform syntactic transformations, in spite of the fact that pre-training has been found to induce hierarchical linguistic generalizations in language models; in other words, the syntactic capabilities of seq2seq models may have been greatly understated. We address this gap using the pre-trained seq2seq models T5 and BART, as well as their multilingual variants mT5 and mBART. We evaluate whether they generalize hierarchically on two transformations in two languages: question formation and passivization in English and German. We find that pre-trained seq2seq models generalize hierarchically when performing syntactic transformations, whereas models trained from scratch on syntactic transformations do not. This result presents evidence for the learnability of hierarchical syntactic information from non-annotated natural language text while also demonstrating that seq2seq models are capable of syntactic generalization, though only after exposure to much more language data than human learners receive. 
    more » « less
  5. High quality arguments are essential elements for human reasoning and decision-making processes. However, effective argument construction is a challenging task for both human and machines. In this work, we study a novel task on automatically generating arguments of a different stance for a given statement. We propose an encoder-decoder style neural network-based argument generation model enriched with externally retrieved evidence from Wikipedia. Our model first generates a set of talking point phrases as intermediate representation, followed by a separate decoder producing the final argument based on both input and the keyphrases. Experiments on a large-scale dataset collected from Reddit show that our model constructs arguments with more topic-relevant content than popular sequence-to-sequence generation models according to automatic evaluation and human assessments. 
    more » « less