


Award ID contains: 1934925


  1. Free, publicly-accessible full text available May 1, 2024
     Amini, MR.; Canu, S.; Fischer, A.; Guns, T.; Kralj Novak, P.; Tsoumakas, G. (Ed.)
     Researchers using social media data want to understand the discussions occurring in and about their respective fields. These domain experts often turn to topic models to help them see the entire landscape of the conversation, but unsupervised topic models often produce topic sets that miss topics experts expect or want to see. To solve this problem, we propose Guided Topic-Noise Model (GTM), a semi-supervised topic model designed with large domain-specific social media data sets in mind. The input to GTM is a set of topics that are of interest to the user and a small number of words or phrases that belong to those topics. These seed topics are used to guide the topic generation process, and can be augmented interactively, expanding the seed word list as the model provides new relevant words for different topics. GTM uses a novel initialization and a new sampling algorithm called Generalized Polya Urn (GPU) seed word sampling to produce a topic set that includes expanded seed topics, as well as new unsupervised topics. We demonstrate the robustness of GTM on open-ended responses from a public opinion survey and four domain-specific Twitter data sets.
  2. Topic models have been applied to everything from books to newspapers to social media posts in an effort to identify the most prevalent themes of a text corpus. We provide an in-depth analysis of unsupervised topic models from their inception to today. We trace the origins of different types of contemporary topic models, beginning in the 1990s, and we compare their proposed algorithms, as well as their different evaluation approaches. Throughout, we also describe settings in which topic models have worked well and areas where new research is needed, setting the stage for the next generation of topic models.
  3. This study investigated the content of parenting information shared on social media by identifying the range and frequency of topics shared by parenting-focused accounts on Twitter. Using the Twitter API, a universe of 675,069 tweets was gathered from 74 of the most-followed parenting-focused accounts, or “hubs,” from January 2016 to June 2018. Using a custom, semi-automated topic modeling approach, we identified the topics (and subtopics within topics) that parenting hubs shared with their followers and investigated whether any meaningful differences in topical focus existed between accounts targeting mothers versus fathers. Results indicate that over one third of tweets were about Parenting Behavior and nearly one quarter about Health, with Entertainment, School, and Motherhood and Fatherhood generally among the less frequently tweeted topics. Mother-focused accounts tweeted more about Health than father-focused accounts did, while father-focused accounts tweeted more about Entertainment than other accounts. Implications for future parenting and social media research are discussed.
  4. The #MeToo movement is one of several calls for social change to gain traction on Twitter in the past decade. The movement went viral after prominent individuals shared their experiences, and much of its power continues to be derived from experience sharing. Because millions of #MeToo tweets are published every year, it is important to accurately identify experience-related tweets. Therefore, we propose a new learning task, identifying experience-related tweets, and compare the effectiveness of classic machine learning models, ensemble models, and a neural network model that incorporates a pre-trained language model to reduce the impact of feature sparsity. We find that even with limited training data, the neural network model outperforms the classic and ensemble classifiers. Finally, we analyze the experience-related conversation in English during the first year of the #MeToo movement and determine that experience tweets represent a sizable minority of the conversation and are moderately correlated with major events.
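The Guided Topic-Noise Model (GTM) abstract above names Generalized Polya Urn (GPU) seed word sampling but does not describe it. The sketch below shows only the generic GPU idea as it is commonly used in topic models: when a token is assigned to a topic, that topic's urn receives the drawn word plus fractional "balls" for related words, here assumed to be the other seed words of the same seed topic. The promotion weight, the names `build_promotion` and `GPUCounts`, and the rule that seed words promote only their own seed-topic neighbours are hypothetical choices for illustration; this is not GTM's actual initialization or sampler.

```python
from collections import defaultdict


def build_promotion(seed_topics, weight=0.3):
    # Hypothetical promotion table: promo[w][v] is the extra pseudo-count that
    # word v receives whenever seed word w is assigned to a topic. Assumption:
    # only seed words that share a seed topic promote each other.
    promo = defaultdict(dict)
    for words in seed_topics.values():
        for w in words:
            for v in words:
                if v != w:
                    promo[w][v] = weight
    return promo


class GPUCounts:
    """Topic-word counts updated with a generalized Polya urn (GPU) scheme."""

    def __init__(self, promo):
        self.promo = promo
        self.word_counts = defaultdict(lambda: defaultdict(float))  # n[k][v]
        self.topic_totals = defaultdict(float)                      # sum over v of n[k][v]

    def _apply(self, topic, word, sign):
        # The drawn word always contributes one full ball...
        deltas = {word: 1.0}
        # ...and each promoted neighbour contributes a fractional ball.
        deltas.update(self.promo.get(word, {}))
        for v, amount in deltas.items():
            self.word_counts[topic][v] += sign * amount
            self.topic_totals[topic] += sign * amount

    def add(self, topic, word):
        self._apply(topic, word, +1)

    def remove(self, topic, word):
        # A Gibbs sampler removes a token's old assignment before resampling;
        # under GPU the promoted pseudo-counts must be removed too.
        self._apply(topic, word, -1)

    def word_prob(self, topic, word, beta, vocab_size):
        # Smoothed estimate of p(word | topic) from the urn counts.
        return (self.word_counts[topic][word] + beta) / (
            self.topic_totals[topic] + vocab_size * beta
        )


if __name__ == "__main__":
    # Hypothetical seed topic, for illustration only.
    seeds = {"vaccine": ["vaccine", "booster", "shot"]}
    urn = GPUCounts(build_promotion(seeds, weight=0.3))
    urn.add(0, "vaccine")
    print(dict(urn.word_counts[0]))
    print(urn.word_prob(0, "booster", beta=0.01, vocab_size=1000))
```

In a collapsed Gibbs sampler, `remove` would be called on a token's current topic before resampling and `add` on the newly sampled topic, so that assigning one seed word to a topic also raises the probability of its seed-topic neighbours in that topic.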
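The #MeToo abstract above compares classic machine learning models with ensemble models and a neural network for identifying experience-related tweets, but does not name the classic models. As an illustration of what a classic bag-of-words baseline can look like, the sketch below implements multinomial naive Bayes with add-one smoothing. It is an assumption for illustration, not one of the paper's reported models; the tokenizer, the `MultinomialNB` class, the `experience`/`other` labels, and the four toy tweets are all hypothetical. Because it scores only exact word matches and ignores unseen words, it also hints at the feature sparsity that the abstract's pre-trained language model is meant to reduce.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase words, keeping a leading '#' on hashtags.
    return re.findall(r"#?\w+", text.lower())


class MultinomialNB:
    """Bag-of-words multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def _log_score(self, tokens, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training carry no signal
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda c: self._log_score(tokens, c))


# Hypothetical toy training data (not from the paper).
TRAIN = [
    ("I was harassed at work and stayed silent for years #MeToo", "experience"),
    ("It happened to me too when I was in college #MeToo", "experience"),
    ("Great article on the #MeToo movement and policy change", "other"),
    ("Watching the news coverage of #MeToo tonight", "other"),
]

if __name__ == "__main__":
    texts, labels = zip(*TRAIN)
    model = MultinomialNB().fit(texts, labels)
    print(model.predict("It happened to me at work #MeToo"))
```

Working in log space avoids underflow when multiplying many small per-word probabilities; a real baseline would also need held-out evaluation, which this sketch omits.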