NetTaxo: Automated Topic Taxonomy Construction from Text-Rich Network

Shang, Jingbo; Zhang, Xinyang; Liu, Liyuan; Li, Sha; Han, Jiawei

doi:10.1145/3366423.3380259

Citation Details

The automated construction of topic taxonomies can benefit numerous applications, including web search, recommendation, and knowledge discovery. One of the major advantages of automatic taxonomy construction is the ability to capture corpus-specific information and adapt to different scenarios. To better reflect the characteristics of a corpus, we take the meta-data of documents into consideration and view the corpus as a text-rich network. In this paper, we propose NetTaxo, a novel automatic topic taxonomy construction framework, which goes beyond the existing paradigm and allows text data to collaborate with network structure. Specifically, we learn term embeddings from both text and network as contexts. Network motifs are adopted to capture appropriate network contexts. We conduct an instance-level selection for motifs, which further refines term embedding according to the granularity and semantics of each taxonomy node. Clustering is then applied to obtain sub-topics under a taxonomy node. Extensive experiments on two real-world datasets demonstrate the superiority of our method over the state-of-the-art, and further verify the effectiveness and importance of instance-level motif selection.
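The final step in the abstract, clustering term embeddings to obtain sub-topics under a taxonomy node, can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used, so a basic spherical k-means stands in here as an assumption. The function names (`normalize`, `cosine`, `cluster_terms`) and the two-dimensional toy vectors are hypothetical and invented for illustration; the sketch does not cover motif selection or learning embeddings from network contexts.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        return list(v)  # leave a zero vector unchanged
    return [x / norm for x in v]


def cosine(u, v):
    """Cosine similarity of two unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def cluster_terms(embeddings, k, iterations=10):
    """Group terms into k sub-topics with a basic spherical k-means.

    embeddings: dict mapping term -> vector (toy stand-in for learned
    term embeddings). Returns a list of k lists of terms.
    """
    terms = sorted(embeddings)
    vectors = {t: normalize(embeddings[t]) for t in terms}

    # Deterministic seeding: start from the first term, then repeatedly
    # add the term least similar to the centers chosen so far.
    centers = [vectors[terms[0]]]
    while len(centers) < k:
        farthest = min(
            terms,
            key=lambda t: max(cosine(vectors[t], c) for c in centers),
        )
        centers.append(vectors[farthest])

    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each term joins its most similar center.
        clusters = [[] for _ in range(k)]
        for t in terms:
            best = max(range(k), key=lambda i: cosine(vectors[t], centers[i]))
            clusters[best].append(t)
        # Update step: each center becomes the normalized sum of its members.
        for i, members in enumerate(clusters):
            if members:
                dim = len(centers[i])
                total = [sum(vectors[t][d] for t in members) for d in range(dim)]
                centers[i] = normalize(total)
    return clusters


# Made-up embeddings for four terms under one hypothetical taxonomy node.
toy_embeddings = {
    "neural network": [0.9, 0.1],
    "deep learning": [0.8, 0.2],
    "query processing": [0.1, 0.9],
    "index structure": [0.2, 0.8],
}
sub_topics = cluster_terms(toy_embeddings, k=2)
for members in sub_topics:
    print(members)
```

In this toy setting the two learning-related terms land in one cluster and the two database-related terms in the other. In the framework described by the abstract, the embeddings would first be refined per taxonomy node before any such clustering is applied.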

Award ID(s): 1704532 1741317 1618481

PAR ID: 10160122

Author(s) / Creator(s): Shang, Jingbo; Zhang, Xinyang; Liu, Liyuan; Li, Sha; Han, Jiawei

Date Published: 2020-04-19

Journal Name: WWW '20: The Web Conference 2020

Volume: 1

Issue: 1

Page Range / eLocation ID: 1908 to 1919

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3366423.3380259
