Hierarchical multi-label text classification (HMTC) aims to tag each document with a set of classes from a class hierarchy. Most existing HMTC methods train classifiers on massive numbers of human-labeled documents, which are often too costly to obtain in real-world applications. In this paper, we explore conducting HMTC using only class surface names as supervision signals. We observe that, to perform HMTC, human experts typically first pinpoint a few of the most essential classes for a document as its “core classes”, and then check the core classes’ ancestor classes to ensure coverage.
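The expert procedure described above (pick a few core classes, then add their ancestors) can be illustrated with a minimal sketch. It assumes the document-class similarity scores are already given (the numbers below are hypothetical), represents the taxonomy as a child-to-parents mapping so a class may have several parents, and simply takes the top-k scoring classes as core classes. The top-k rule and the names `ancestors` and `core_then_ancestors` are illustrative simplifications, not TaxoClass's actual core-class identification.

```python
from collections import deque


def ancestors(label: str, parents: dict[str, list[str]]) -> set[str]:
    """Return every ancestor of `label` in a taxonomy given as child -> parents (works for DAGs)."""
    seen: set[str] = set()
    queue = deque(parents.get(label, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(parents.get(node, []))
    return seen


def core_then_ancestors(scores: dict[str, float],
                        parents: dict[str, list[str]],
                        k: int = 1) -> tuple[list[str], set[str]]:
    # Simplified core-class step (assumption): keep the k highest-scoring classes.
    core = sorted(scores, key=scores.get, reverse=True)[:k]
    # Coverage step: add every ancestor of each core class to the final label set.
    labels = set(core)
    for c in core:
        labels |= ancestors(c, parents)
    return core, labels


# Hypothetical taxonomy and similarity scores for one document.
parents = {"laptop": ["computers"], "computers": ["electronics"],
           "headphones": ["audio"], "audio": ["electronics"], "electronics": []}
scores = {"laptop": 0.91, "computers": 0.75, "electronics": 0.55,
          "headphones": 0.40, "audio": 0.30}
core, labels = core_then_ancestors(scores, parents, k=1)
print(core, sorted(labels))  # ['laptop'] ['computers', 'electronics', 'laptop']
```

Separating "which classes are essential" from "which classes the hierarchy implies" keeps the prediction consistent with the taxonomy: a document tagged with a fine-grained class is always tagged with its ancestors too.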
To mimic human experts, we propose a novel HMTC framework named TaxoClass. Specifically, TaxoClass (1) calculates document-class similarities using a textual entailment model, (2) identifies a document’s core classes and uses confident core classes to train a taxonomy-enhanced classifier, and (3) generalizes the classifier via multi-label self-training. Our experiments on two challenging datasets show that TaxoClass can achieve around 0.71 Example-F1 using only class names, outperforming the best previous method by 25%.
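The abstract reports results in Example-F1 without defining it. A common example-based definition computes, for each document, the F1 between its gold label set Y and predicted set Ŷ, namely 2|Y ∩ Ŷ| / (|Y| + |Ŷ|), and averages over documents. The sketch below assumes that definition; the function name `example_f1`, the toy label sets, and the convention for two empty sets are illustrative assumptions.

```python
def example_f1(gold: list[set[str]], pred: list[set[str]]) -> float:
    """Example-based F1: per-document F1 between gold and predicted label sets, averaged."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("need equally sized, non-empty lists of label sets")
    total = 0.0
    for y, y_hat in zip(gold, pred):
        denom = len(y) + len(y_hat)
        # Convention (assumption): two empty label sets count as a perfect match.
        total += 1.0 if denom == 0 else 2 * len(y & y_hat) / denom
    return total / len(gold)


# Hypothetical gold and predicted label sets for two documents.
gold = [{"electronics", "computers", "laptop"}, {"electronics", "audio"}]
pred = [{"electronics", "computers"}, {"electronics", "audio", "headphones"}]
print(round(example_f1(gold, pred), 3))  # 0.8
```

Because the score is averaged per document, a method is rewarded for getting each document's whole label set right, including the ancestor classes, rather than for doing well only on frequent classes.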