Self-supervised Regularization for Text Classification

Zhou, Meng; Li, Zechen; Xie, Pengtao

doi:10.1162/tacl_a_00389

Abstract: Text classification is a widely studied problem with broad applications. In many real-world problems, the number of texts available for training classification models is limited, which renders these models prone to overfitting. To address this problem, we propose SSL-Reg, a data-dependent regularization approach based on self-supervised learning (SSL). SSL (Devlin et al., 2019a) is an unsupervised learning approach that defines auxiliary tasks on input data without using any human-provided labels and learns data representations by solving these auxiliary tasks. In SSL-Reg, a supervised classification task and an unsupervised SSL task are performed simultaneously. The SSL task is defined purely on the input texts, so it requires no human-provided labels. Training the model on the SSL task as well can prevent it from overfitting to the limited number of class labels in the classification task. Experiments on 17 text classification datasets demonstrate the effectiveness of our proposed method. Code is available at https://github.com/UCSD-AI4H/SSReg.
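The joint training described in the abstract can be illustrated with a small sketch of a combined objective. This is not the authors' implementation (see the repository linked above for that); it assumes, for illustration only, that the SSL task is masked-token prediction and that the two losses are combined as a weighted sum with a hypothetical weight `lam`. The function names `cross_entropy` and `ssl_reg_loss` are likewise made up for this example.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def ssl_reg_loss(
    cls_logits: Sequence[float],
    label: int,
    ssl_logits: Sequence[Sequence[float]],
    ssl_targets: Sequence[int],
    lam: float = 0.1,  # hypothetical regularization weight (assumption)
) -> float:
    """Supervised classification loss plus a weighted self-supervised loss.

    ssl_logits holds one vocabulary-sized logit vector per masked position;
    ssl_targets holds the original token id at each of those positions.
    The SSL term uses only the input text, never the class label.
    """
    cls_loss = cross_entropy(cls_logits, label)
    if ssl_targets:
        ssl_loss = sum(
            cross_entropy(logits, tok)
            for logits, tok in zip(ssl_logits, ssl_targets)
        ) / len(ssl_targets)
    else:
        ssl_loss = 0.0
    return cls_loss + lam * ssl_loss


# Example: a 2-class classifier and one masked position over a 4-token vocabulary.
total = ssl_reg_loss([0.0, 0.0], 0, [[0.0, 0.0, 0.0, 0.0]], [2], lam=0.5)
```

Setting `lam` to zero recovers plain supervised training, while larger values give the label-free term more influence over the shared representation, which is the sense in which the SSL task acts as a regularizer.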

Award ID(s): 2120019

PAR ID: 10345458

Author(s) / Creator(s): Zhou, Meng; Li, Zechen; Xie, Pengtao

Date Published: 2021-01-01

Journal Name: Transactions of the Association for Computational Linguistics

Volume: 9

ISSN: 2307-387X

Page Range / eLocation ID: 641 to 656

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript
Journal Article:
https://doi.org/10.1162/tacl_a_00389
