SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Jiang, Haoming; He, Pengcheng; Chen, Weizhu; Liu, Xiaodong; Gao, Jianfeng; Zhao, Tuo.

doi:10.18653/v1/2020.acl-main.197

Citation Details

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely high complexity of pre-trained models, aggressive fine-tuning of- ten causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data. To address such an issue in a principled manner, we propose a new learning framework for robust and efficient fine-tuning for pre-trained models to attain better generalization performance. The pro- posed framework contains two important in- gredients: 1. Smoothness-inducing regulariza- tion, which effectively manages the complex- ity of the model; 2. Bregman proximal point optimization, which is an instance of trust- region methods and can prevent aggressive up- dating. Our experiments show that the pro- posed framework achieves new state-of-the-art performance on a number of NLP tasks includ- ing GLUE, SNLI, SciTail and ANLI. More- over, it also outperforms the state-of-the-art T5 model, which is the largest pre-trained model containing 11 billion parameters, on GLUE. more »

Award ID(s):: 1717916

PAR ID:: 10162625

Author(s) / Creator(s):: Jiang, Haoming; He, Pengcheng; Chen, Weizhu; Liu, Xiaodong; Gao, Jianfeng; Zhao, Tuo.

Date Published:: 2020-07-01

Journal Name:: Annual Meeting of the Association for Computational Linguistics

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.18653/v1/2020.acl-main.197

More Like this