Text Classification Using Label Names Only: A Language Model Self-Training Approach

Meng, Yu; Zhang, Yunyi; Huang, Jiaxin; Xiong, Chenyan; Ji, Heng; Zhang, Chao; Han, Jiawei

doi:10.18653/v1/2020.emnlp-main.724

Citation Details

Current text classification methods typically require a good number of human-labeled documents as training data, which can be costly and difficult to obtain in real applications. Humans can perform classification without seeing any labeled examples but only based on a small set of words describing the categories to be classified. In this paper, we explore the potential of only using the label name of each class to train classification models on unlabeled data, without using any labeled documents. We use pre-trained neural language models both as general linguistic knowledge sources for category understanding and as representation learning models for document classification. Our method (1) associates semantically related words with the label names, (2) finds category-indicative words and trains the model to predict their implied categories, and (3) generalizes the model via self-training. We show that our model achieves around 90% accuracy on four benchmark datasets including topic and sentiment classification without using any labeled documents but learning from unlabeled data supervised by at most 3 words (1 in most cases) per class as the label name.
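The abstract names self-training as step (3) but does not say how the training targets are built. The following is a minimal sketch, assuming one sharpening rule that is common in self-training: square each predicted probability, divide by the soft class frequency, and renormalize. The function name `sharpen_targets` and the example predictions are hypothetical, and the rule itself is an assumption rather than something stated in this record.

```python
from typing import List


def sharpen_targets(probs: List[List[float]]) -> List[List[float]]:
    """Turn current predictions into sharper soft targets for self-training.

    probs[i][j] is the model's current probability that unlabeled document i
    belongs to class j. Squaring favors confident predictions; dividing by
    the soft class frequency keeps frequent classes from dominating.
    NOTE: this rule is an assumption, not taken from the abstract.
    """
    num_classes = len(probs[0])
    # Soft frequency of each class, summed over all unlabeled documents.
    freq = [sum(row[j] for row in probs) for j in range(num_classes)]
    targets = []
    for row in probs:
        weights = [row[j] ** 2 / freq[j] for j in range(num_classes)]
        total = sum(weights)
        targets.append([w / total for w in weights])
    return targets


# Hypothetical predictions for three unlabeled documents over two classes.
predictions = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8]]
targets = sharpen_targets(predictions)
```

In a self-training loop of this kind, the model would then be trained to match `targets` on the unlabeled documents, and the targets would be recomputed periodically from the updated predictions.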

Award ID(s): 1956151, 1741317, 1704532

PAR ID: 10279818

Author(s) / Creator(s): Meng, Yu; Zhang, Yunyi; Huang, Jiaxin; Xiong, Chenyan; Ji, Heng; Zhang, Chao; Han, Jiawei

Date Published: 2020-01-01

Journal Name: EMNLP'20: 2020 Conf. on Empirical Methods in Natural Language Processing, Nov. 2020

Volume: 2020

Issue: 1

Page Range / eLocation ID: 9006 to 9017

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.18653/v1/2020.emnlp-main.724
