Existing topic modeling and text segmentation methodologies generally require large datasets for training, limiting their capabilities when only small collections of text are available. In this work, we reexamine the inter-related problems of “topic identification” and “text segmentation” for sparse document learning, when there is a single new text of interest. In developing a methodology to handle single documents, we face two major challenges. First is sparse information: with access to only one document, we cannot train traditional topic models or deep learning algorithms. Second is significant noise: a considerable portion of the words in any single document produce only noise and do not help discern topics or segments. To tackle these issues, we design an unsupervised, computationally efficient methodology called Biclustering Approach to Topic modeling and Segmentation (BATS). BATS leverages three key ideas to simultaneously identify topics and segment text: (i) a new mechanism that uses word order information to reduce sample complexity, (ii) a statistically sound graph-based biclustering technique that identifies latent structures of words and sentences, and (iii) a collection of effective heuristics that remove noise words and reward important words to further improve performance. Experiments on six datasets show that our approach outperforms several state-of-the-art baselines on topic coherence, topic diversity, segmentation, and runtime metrics.
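As a rough illustration of idea (ii), biclustering a sentence-word count matrix pairs groups of sentences (candidate segments) with groups of words (candidate topics). Below is a minimal sketch using scikit-learn's SpectralCoclustering; the toy document, stop-word filtering, and cluster count are illustrative assumptions, and this is not BATS's own statistically grounded biclustering algorithm.

```python
# Sketch: co-cluster the sentences and words of a single document so that
# each bicluster couples a sentence group (segment) with a word group (topic).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import SpectralCoclustering

document = [  # one sentence per entry; toy stand-in for a real single document
    "Stocks rallied as markets digested the earnings reports.",
    "Investors rotated into bonds after the rate announcement.",
    "The quarterback threw three touchdowns in the final quarter.",
    "Fans stormed the field after the overtime victory.",
]

vectorizer = CountVectorizer(stop_words="english")  # crude noise-word removal
X = vectorizer.fit_transform(document)              # sentences x words matrix

model = SpectralCoclustering(n_clusters=2, random_state=0).fit(X)
words = vectorizer.get_feature_names_out()
for k in range(2):
    sent_ids = [i for i in range(X.shape[0]) if model.row_labels_[i] == k]
    topic_words = [w for w, lbl in zip(words, model.column_labels_) if lbl == k]
    print(f"bicluster {k}: sentences {sent_ids} | words {topic_words}")
```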
Percolation-based topic modeling for tweets
This paper investigates topic modeling within a noisy domain. The goal is to generate topics that maximize topic coherence while introducing only a small amount of noise. The problem is motivated by the practical setting of short, noisy tweets, where it is important to generate topics containing a larger number of content words than noise words. For the most general version of this problem, we propose a new method, λ-CLIQ, a simple variant of the k-clique percolation algorithm that employs quasi-cliques during graph decomposition and percolation based on λ, a graph property variant. While the topics generated using our base algorithm are highly coherent, they often contain too few words. To increase topic size, we add a post-processing step (yielding λ-CLIQ+) that augments identified topic words using locally trained embeddings. We show that both λ-CLIQ and λ-CLIQ+ outperform the state of the art in terms of topic coherence on three distinct Twitter data sets.
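For background, standard k-clique percolation (the base algorithm λ-CLIQ modifies) builds communities by chaining k-cliques that overlap in k-1 nodes. A minimal sketch on a toy word co-occurrence graph, using networkx's implementation; the graph, edge construction, and choice of k = 3 are illustrative assumptions, and the quasi-clique/λ relaxations of λ-CLIQ are not shown.

```python
# Sketch: vanilla k-clique percolation on a toy word co-occurrence graph.
# lambda-CLIQ relaxes this to quasi-cliques via a lambda-based criterion,
# which networkx does not implement; this shows only the base algorithm.
import networkx as nx
from networkx.algorithms.community import k_clique_communities

G = nx.Graph()
G.add_edges_from([
    ("market", "stock"), ("stock", "earnings"), ("market", "earnings"),
    ("goal", "match"), ("match", "striker"), ("goal", "striker"),
    ("stock", "noiseword"),  # low-degree noise word: in no 3-clique, so excluded
])

# Each community is a union of adjacent 3-cliques (sharing k-1 = 2 nodes).
for topic in k_clique_communities(G, 3):
    print(sorted(topic))
```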
- Award ID(s):
- 1934494
- NSF-PAR ID:
- 10188398
- Date Published:
- Journal Name:
- WISDOM 2020: The 9th KDD Workshop on Issues of Sentiment Discovery and Opinion Mining
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Researchers using social media data want to understand the discussions occurring in and about their respective fields. These domain experts often turn to topic models to help them see the entire landscape of the conversation, but unsupervised topic models often produce topic sets that miss topics experts expect or want to see. To solve this problem, we propose Guided Topic-Noise Model (GTM), a semi-supervised topic model designed with large domain-specific social media data sets in mind. The input to GTM is a set of topics that are of interest to the user and a small number of words or phrases that belong to those topics. These seed topics are used to guide the topic generation process, and can be augmented interactively, expanding the seed word list as the model provides new relevant words for different topics. GTM uses a novel initialization and a new sampling algorithm called Generalized Polya Urn (GPU) seed word sampling to produce a topic set that includes expanded seed topics, as well as new unsupervised topics. We demonstrate the robustness of GTM on open-ended responses from a public opinion survey and four domain-specific Twitter data sets.
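For intuition, a Generalized Pólya urn differs from a simple urn in that drawing one word also adds pseudo-counts for related words, which is how seed words can pull their topic-mates along. A minimal sketch of that update; the seed lists, boost weight, and sampling loop are illustrative assumptions, not GTM's actual sampler.

```python
# Sketch: Generalized Polya urn (GPU) update for seed words.
# Simple urn: drawing word w returns w plus one extra copy of w.
# Generalized urn: drawing w also adds fractional counts for words related
# to w -- here, the other seed words of the same topic.
import random
from collections import defaultdict

seed_topics = {"economy": ["market", "stock"], "sports": ["goal", "match"]}
related = {w: [v for v in ws if v != w]
           for ws in seed_topics.values() for w in ws}

counts = defaultdict(lambda: 1.0)  # symmetric prior pseudo-count per word
vocab = [w for ws in seed_topics.values() for w in ws] + ["noise"]

random.seed(0)
for _ in range(100):
    total = sum(counts[w] for w in vocab)
    r, drawn = random.uniform(0, total), vocab[-1]
    for w in vocab:                          # draw a word proportional to counts
        r -= counts[w]
        if r <= 0:
            drawn = w
            break
    counts[drawn] += 1.0                     # standard Polya reinforcement
    for v in related.get(drawn, []):
        counts[v] += 0.5                     # GPU: promote same-topic seeds too

print({w: round(counts[w], 1) for w in vocab})
```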
-
We propose a stochastic variational inference algorithm for training large-scale Bayesian networks, where noisy-OR conditional distributions are used to capture higher-order relationships. One application is to the learning of hierarchical topic models for text data. While previous work has focused on two-layer networks popular in applications like medical diagnosis, we develop scalable algorithms for deep networks that capture a multi-level hierarchy of interactions. Our key innovation is a family of constrained variational bounds that only explicitly optimize posterior probabilities for the sub-graph of topics most related to the sparse observations in a given document. These constrained bounds have comparable accuracy but dramatically reduced computational cost. Using stochastic gradient updates based on our variational bounds, we learn noisy-OR Bayesian networks orders of magnitude faster than was possible with prior Monte Carlo learning algorithms, and provide a new tool for understanding large-scale binary data.
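As a reference point, the noisy-OR conditional distribution mentioned above has a closed form: a child is "off" only if every active parent independently fails to turn it on. A minimal sketch; the parameter names and leak term are the conventional formulation, not code from this paper.

```python
# Sketch: noisy-OR conditional probability.
# P(child=1 | parents) = 1 - (1 - leak) * prod over active parents of (1 - p_i),
# where p_i is the probability that parent i alone activates the child.

def noisy_or(parent_states, activation_probs, leak=0.01):
    """parent_states: 0/1 list; activation_probs: per-parent p_i."""
    fail = 1.0 - leak
    for s, p in zip(parent_states, activation_probs):
        if s:
            fail *= 1.0 - p  # each active parent independently fails w.p. 1 - p_i
    return 1.0 - fail

# Two active parents with strong links make the child very likely:
print(noisy_or([1, 1, 0], [0.8, 0.6, 0.9]))  # 1 - 0.99*0.2*0.4 = 0.9208
```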
-
Ruis, Andrew; Lee, Seung B. (Eds.) When text datasets are very large, manually coding line by line becomes impractical. As a result, researchers sometimes try to use machine learning algorithms to automatically code text data. One of the most popular algorithms is topic modeling. For a given text dataset, a topic model provides probability distributions of words for a set of “topics” in the data, which researchers then use to interpret the meaning of the topics. A topic model also gives each document in the dataset a score for each topic, which can be used as a non-binary coding for what proportion of a topic is in the document. Unfortunately, it is often difficult to interpret what the topics mean in a defensible way, or to validate document topic proportion scores as meaningful codes. In this study, we examine how keywords from codes developed by human experts were distributed in topics generated from topic modeling. The results show that (1) top keywords of a single topic often contain words from multiple human-generated codes; and conversely, (2) words from human-generated codes appear as high-probability keywords in multiple topics. These results explain why directly using topics from topic models as codes is problematic. However, they also imply that topic modeling makes it possible for researchers to discover codes from short word lists.
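A minimal sketch of the kind of comparison described above: fit a topic model, then check how each human-defined code's keywords spread across the learned topics. The toy corpus, code keyword lists, and use of scikit-learn's LatentDirichletAllocation are illustrative assumptions, not this study's data or pipeline.

```python
# Sketch: where do human-coded keywords land among topic-model keywords?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["students collaborate to debug the robot code",
        "the team argued about the sensor design",
        "she explained her reasoning for the gear ratio",
        "they tested and revised the robot program"]
codes = {"collaboration": {"collaborate", "team", "argued"},
         "technical": {"debug", "code", "sensor", "gear", "program"}}

vec = CountVectorizer()
X = vec.fit_transform(docs)
vocab = vec.get_feature_names_out()

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
for t, weights in enumerate(lda.components_):
    top = {vocab[i] for i in weights.argsort()[-6:]}  # top-6 keywords of topic t
    for code, kws in codes.items():
        print(f"topic {t}, code '{code}': shared keywords {sorted(top & kws)}")
```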
-
We present Variational Aspect-based Latent Topic Allocation (VALTA), a family of autoencoding topic models that learn aspect-based representations of reviews. VALTA defines a user-item encoder that maps bag-of-words vectors for the combined reviews associated with each paired user and item onto structured embeddings, which in turn define per-aspect topic weights. We model individual reviews in a structured manner by inferring an aspect assignment for each sentence in a given review, where the per-aspect topic weights obtained by the user-item encoder serve to define a mixture over topics, conditioned on the aspect. The result is an autoencoding neural topic model for reviews, which can be trained in a fully unsupervised manner to learn topics that are structured into aspects. Experimental evaluation on a large number of datasets demonstrates that aspects are interpretable, yield higher coherence scores than non-structured autoencoding topic model variants, and can be utilized to perform aspect-based comparison and genre discovery.
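To make the aspect-conditioned mixture concrete, here is a minimal numpy sketch of the decoding step: each sentence gets an aspect, the aspect selects its topic weights, and those weights mix shared topic-word distributions into a word distribution. The dimensions and softmax parameterization are illustrative assumptions; VALTA's encoder, inference network, and training loss are omitted.

```python
# Sketch: aspect-conditioned topic mixture for one review's sentences.
import numpy as np

rng = np.random.default_rng(0)
A, K, V = 3, 4, 50          # aspects, topics, vocabulary size

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

theta = softmax(rng.normal(size=(A, K)))   # per-aspect topic weights (encoder output)
beta = softmax(rng.normal(size=(K, V)))    # shared topic-word distributions

sentence_aspects = [0, 0, 2, 1]            # inferred aspect per sentence
for s, a in enumerate(sentence_aspects):
    word_dist = theta[a] @ beta            # mixture over topics, given the aspect
    print(f"sentence {s}: aspect {a}, top word ids {np.argsort(word_dist)[-3:]}")
```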