skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Deep Transfer Reinforcement Learning for Text Summarization
Deep neural networks are data hungry models and thus face difficulties when attempting to train on small text datasets. Transfer learning is a potential solution but their effectiveness in the text domain is not as explored as in areas such as image analysis. In this paper, we study the problem of transfer learning for text summarization and discuss why existing state-of-the-art models fail to generalize well on other (unseen) datasets. We propose a reinforcement learning framework based on a self-critic policy gradient approach which achieves good generalization and state-ofthe-art results on a variety of datasets. Through an extensive set of experiments, we also show the ability of our proposed framework to fine-tune the text summarization model using only a few training samples. To the best of our knowledge, this is the first work that studies transfer learning in text summarization and provides a generic solution that works well on unseen data  more » « less
Award ID(s):
1838730 1707498 1619028
PAR ID:
10143406
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of SIAM International Conference on Data Mining (SDM)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. null; null; null; null (Ed.)
    Zero-shot learning (ZSL) addresses the unseen class recognition problem by leveraging semantic information to transfer knowledge from seen classes to unseen classes. Generative models synthesize the unseen visual features and convert ZSL into a classical supervised learning problem. These generative models are trained using the seen classes and are expected to implicitly transfer the knowledge from seen to unseen classes. However, their performance is stymied by overfitting, which leads to substandard performance on Generalized Zero-Shot learning (GZSL). To address this concern, we propose the novel LsrGAN, a generative model that Leverages the Semantic Relationship between seen and unseen categories and explicitly performs knowledge transfer by incorporating a novel Semantic Regularized Loss (SR-Loss). The SR-loss guides the LsrGAN to generate visual features that mirror the semantic relationships between seen and unseen classes. Experiments on seven benchmark datasets, including the challenging Wikipedia text-based CUB and NABirds splits, and Attribute-based AWA, CUB, and SUN, demonstrates the superiority of the LsrGAN compared to previous state-of-the-art approaches under both ZSL and GZSL. Code is available at https://github.com/Maunil/LsrGAN. 
    more » « less
  2. User representation learning is vital to capture diverse user preferences, while it is also challenging as user intents are latent and scattered among complex and different modalities of user-generated data, thus, not directly measurable. Inspired by the concept of user schema in social psychology, we take a new perspective to perform user representation learning by constructing a shared latent space to capture the dependency among different modalities of user-generated data. Both users and topics are embedded to the same space to encode users' social connections and text content, to facilitate joint modeling of different modalities, via a probabilistic generative framework. We evaluated the proposed solution on large collections of Yelp reviews and StackOverflow discussion posts, with their associated network structures. The proposed model outperformed several state-of-the-art topic modeling based user models with better predictive power in unseen documents, and state-of-the-art network embedding based user models with improved link prediction quality in unseen nodes. The learnt user representations are also proved to be useful in content recommendation, e.g., expert finding in StackOverflow. 
    more » « less
  3. Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely high complexity of pre-trained models, aggressive fine-tuning of- ten causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data. To address such an issue in a principled manner, we propose a new learning framework for robust and efficient fine-tuning for pre-trained models to attain better generalization performance. The pro- posed framework contains two important in- gredients: 1. Smoothness-inducing regulariza- tion, which effectively manages the complex- ity of the model; 2. Bregman proximal point optimization, which is an instance of trust- region methods and can prevent aggressive up- dating. Our experiments show that the pro- posed framework achieves new state-of-the-art performance on a number of NLP tasks includ- ing GLUE, SNLI, SciTail and ANLI. More- over, it also outperforms the state-of-the-art T5 model, which is the largest pre-trained model containing 11 billion parameters, on GLUE. 
    more » « less
  4. Activity Recognition (AR) models perform well with a large number of available training instances. However, in the presence of sensor heterogeneity, sensing biasness and variability of human behaviors and activities and unseen activity classes pose key challenges to adopting and scaling these pre-trained activity recognition models in the new environment. These challenging unseen activities recognition problems are addressed by applying transfer learning techniques that leverage a limited number of annotated samples and utilize the inherent structural patterns among activities within and across the source and target domains. This work proposes a novel AR framework that uses the pre-trained deep autoencoder model and generates features from source and target activity samples. Furthermore, this AR frame-work establishes correlations among activities between the source and target domain by exploiting intra- and inter-class knowledge transfer to mitigate the number of labeled samples and recognize unseen activities in the target domain. We validated the efficacy and effectiveness of our AR framework with three real-world data traces (Daily and Sports, Opportunistic, and Wisdm) that contain 41 users and 26 activities in total. Our AR framework achieves performance gains ≈ 5-6% with 111, 18, and 70 activity samples (20 % annotated samples) for Das, Opp, and Wisdm datasets. In addition, our proposed AR framework requires 56, 8, and 35 fewer activity samples (10% fewer annotated examples) for Das, Opp, and Wisdm, respectively, compared to the state-of-the-art Untran model. 
    more » « less
  5. Abstract Text augmentation is an effective technique in alleviating overfitting in NLP tasks. In existing methods, text augmentation and downstream tasks are mostly performed separately. As a result, the augmented texts may not be optimal to train the downstream model. To address this problem, we propose a three-level optimization framework to perform text augmentation and the downstream task end-to- end. The augmentation model is trained in a way tailored to the downstream task. Our framework consists of three learning stages. A text summarization model is trained to perform data augmentation at the first stage. Each summarization example is associated with a weight to account for its domain difference with the text classification data. At the second stage, we use the model trained at the first stage to perform text augmentation and train a text classification model on the augmented texts. At the third stage, we evaluate the text classification model trained at the second stage and update weights of summarization examples by minimizing the validation loss. These three stages are performed end-to-end. We evaluate our method on several text classification datasets where the results demonstrate the effectiveness of our method. Code is available at https://github.com/Sai-Ashish/End-to-End-Text-Augmentation. 
    more » « less