A Guided Topic-Noise Model for Short Texts

Churchill, Robert; Singh, Lisa; Ryan, Rebecca; Davis-Kean, Pamela

doi:10.1145/3485447.3512007

Citation Details


Researchers using social media data want to understand the discussions occurring in and about their respective fields. These domain experts often turn to topic models to help them see the entire landscape of the conversation, but unsupervised topic models often produce topic sets that miss topics experts expect or want to see. To solve this problem, we propose the Guided Topic-Noise Model (GTM), a semi-supervised topic model designed with large domain-specific social media data sets in mind. The input to GTM is a set of topics that are of interest to the user and a small number of words or phrases that belong to those topics. These seed topics are used to guide the topic generation process, and can be augmented interactively, expanding the seed word list as the model provides new relevant words for different topics. GTM uses a novel initialization and a new sampling algorithm called Generalized Polya Urn (GPU) seed word sampling to produce a topic set that includes expanded seed topics, as well as new unsupervised topics. We demonstrate the robustness of GTM on open-ended responses from a public opinion survey and four domain-specific Twitter data sets.

Award ID(s): 1934925; 1934494

PAR ID: 10351551

Author(s) / Creator(s): Churchill, Robert; Singh, Lisa; Ryan, Rebecca; Davis-Kean, Pamela

Date Published: 2022-04-25

Journal Name: WWW '22: Proceedings of the ACM Web Conference 2022

Page Range / eLocation ID: 2870 to 2878

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3485447.3512007
