Sentiment analysis on large-scale social media data is important for bridging the gap between social media content and real-world activities, including political election prediction and the monitoring and analysis of individual and public emotional status. Although textual sentiment analysis has been well studied on platforms such as Twitter and Instagram, the role of extensive emoji use in sentiment analysis remains lightly studied. In this paper, we propose a novel scheme for Twitter sentiment analysis with extra attention on emojis. We first learn bi-sense emoji embeddings under positive and negative sentimental tweets individually, and then train a sentiment classifier by attending on these bi-sense emoji embeddings with an attention-based long short-term memory network (LSTM). Our experiments show that the bi-sense embedding is effective for extracting sentiment-aware embeddings of emojis and outperforms the state-of-the-art models. We also visualize the attentions to show that the bi-sense emoji embedding provides better guidance on the attention mechanism to obtain a more robust understanding of the semantics and sentiments.
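The idea of attending over two sense embeddings per emoji can be illustrated with a minimal sketch. It is not the architecture from the abstract above: the LSTM is left out, the context vector stands in for whatever hidden state a model would produce, and every name and number here (`attend_bi_sense`, the toy 2-dimensional vectors) is a hypothetical chosen for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1 (shifted for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def attend_bi_sense(context, positive_sense, negative_sense):
    """Weight the two sense embeddings of one emoji by similarity to the context.

    Assumption: dot-product scoring; a trained model may score differently.
    Returns (weights, mixed) where mixed is the weighted sum of both senses.
    """
    senses = [positive_sense, negative_sense]
    weights = softmax([dot(context, s) for s in senses])
    mixed = [
        sum(w * s[i] for w, s in zip(weights, senses))
        for i in range(len(context))
    ]
    return weights, mixed

# Toy vectors (hypothetical): the context points toward the positive sense.
weights, mixed = attend_bi_sense([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0])
```

In this toy case the positive sense receives the larger weight because it aligns with the context vector, so the mixed embedding leans toward the positive sense.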
Incorporating Sentiment Analysis with Epistemic Network Analysis to Enhance Discourse Analysis of Twitter Data
While there has been much growth in the use of microblogging platforms (e.g., Twitter) to share information on a range of topics, researchers struggle to analyze the large volumes of data produced on such platforms. Established methods such as Sentiment Analysis (SA) have been criticized for their inaccuracy and limited analytical depth. In this exploratory methodological paper, we propose a combination of SA with Epistemic Network Analysis (ENA) as an alternative approach for providing richer qualitative and quantitative insights into Twitter discourse. We illustrate the application and potential use of these approaches by visualizing the differences between tweets directed at or discussing Democrats and Republicans after the COVID-19 Stimulus Package announcement in the US. SA was integrated into ENA models in two ways: as a part of the blocking variable and as a set of codes. Our results suggest that incorporating SA into ENA allowed for a better understanding of how groups viewed the components of the stimulus issue by splitting them by sentiment and enabled a meaningful inclusion of data with singular subject focus into the ENA models.
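The two integration routes mentioned above can be illustrated with a minimal sketch. It assumes a toy set of hand-coded tweets (the data, code names, and sentiment labels are hypothetical, not from the study) and reduces ENA to its first step, counting code co-occurrences per unit. Actual ENA also normalizes these counts and projects them into a low-dimensional space, which is omitted here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: (party addressed, sentiment label, codes present).
TWEETS = [
    ("Democrats", "negative", {"checks", "delay"}),
    ("Democrats", "positive", {"checks", "relief"}),
    ("Republicans", "negative", {"checks", "delay"}),
    ("Republicans", "negative", {"delay"}),  # tweet with a single code
]

def cooccurrences(codes):
    """Return every unordered pair of codes found together in one tweet."""
    return list(combinations(sorted(codes), 2))

def sentiment_as_block(tweets):
    """Route 1: sentiment is part of the grouping unit (party, sentiment)."""
    networks = {}
    for party, sentiment, codes in tweets:
        unit = (party, sentiment)
        networks.setdefault(unit, Counter()).update(cooccurrences(codes))
    return networks

def sentiment_as_code(tweets):
    """Route 2: sentiment becomes one more code; units are parties only."""
    networks = {}
    for party, sentiment, codes in tweets:
        extended = codes | {"SENT_" + sentiment}
        networks.setdefault(party, Counter()).update(cooccurrences(extended))
    return networks

by_block = sentiment_as_block(TWEETS)
by_code = sentiment_as_code(TWEETS)
```

Under route 1 the single-code tweet adds no connection, since one code cannot co-occur with anything. Under route 2 it links its code to the sentiment code, which is one way to read the abstract's remark about including data with a singular subject focus.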
- Journal Name: Advances in Quantitative Ethnography: Second International Conference, ICQE 2020, Malibu, CA, USA, February 1-3, 2021, Proceedings
- Sponsoring Org: National Science Foundation
More Like this
In October 2017, numerous women accused producer Harvey Weinstein of sexual harassment. Their stories encouraged other women to voice allegations of sexual harassment against many high profile men, including politicians, actors, and producers. These events are broadly referred to as the #MeToo movement, named for the use of the hashtag “#metoo” on social media platforms like Twitter and Facebook. The movement has widely been referred to as “empowering” because it has amplified the voices of previously unheard women over those of traditionally powerful men. In this work, we investigate dynamics of sentiment, power and agency in online media coverage of these events. Using a corpus of online media articles about the #MeToo movement, we present a contextual affective analysis—an entity-centric approach that uses contextualized lexicons to examine how people are portrayed in media articles. We show that while these articles are sympathetic towards women who have experienced sexual harassment, they consistently present men as most powerful, even after sexual assault allegations. While we focus on media coverage of the #MeToo movement, our method for contextual affective analysis readily generalizes to other domains.
Social media platforms are accused repeatedly of creating environments in which women are bullied and harassed. We argue that online aggression toward women aims to reinforce traditional feminine norms and stereotypes. In a mixed methods study, we find that this type of aggression on Twitter is common and extensive and that it can spread far beyond the original target. We locate over 2.9 million tweets in one week that contain instances of gendered insults (e.g., “bitch,” “cunt,” “slut,” or “whore”), averaging 419,000 sexist slurs per day. The vast majority of these tweets are negative in sentiment. We analyze the social networks of the conversations that ensue in several cases and demonstrate how the use of “replies,” “retweets,” and “likes” can further victimize a target. Additionally, we develop a sentiment classifier that we use in a regression analysis to compare the negativity of sexist messages. We find that words in a message that reinforce feminine stereotypes inflate the negative sentiment of tweets to a significant and sizeable degree. These terms include those insulting someone’s appearance (e.g., “ugly”), intellect (e.g., “stupid”), sexual experience (e.g., “promiscuous”), mental stability (e.g., “crazy”), and age (“old”). Messages enforcing beauty norms tend to be particularly negative. In sum, …
An important means of disseminating information on social media platforms is including URLs that point to external sources in user posts. On Twitter, we estimate that about 21% of the daily stream of English-language tweets contain URLs. We notice that NLP tools make little attempt at understanding the relationship between the content of the URL and the text surrounding it in a tweet. In this work, we study the structure of tweets with URLs relative to the content of the Web documents pointed to by the URLs. We identify several segment classes that may appear in a tweet with URLs, such as the title of a Web page and the user's original content. Our goals in this paper are to introduce, define, and analyze the segmentation problem of tweets with URLs, to develop an effective algorithm to solve it, and to show that our solution can benefit sentiment analysis on Twitter. We also show that the problem is an instance of the block edit distance problem, and thus an NP-hard problem.
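To make the segmentation task concrete, here is a minimal sketch that labels the parts of a tweet as page title, URL, or user content. It is a greedy single-block heuristic built on Python's `difflib`, not the algorithm from the abstract above and not a block edit distance solver. The function name `segment_tweet`, the `min_block` threshold, and the example tweet and title are all hypothetical.

```python
from difflib import SequenceMatcher
from itertools import groupby

def segment_tweet(tweet, page_title, min_block=2):
    """Split a tweet into labeled segments: 'title', 'url', or 'user'.

    Assumption: the title segment is the longest run of tokens shared
    (case-insensitively) with the page title, and runs shorter than
    min_block tokens are treated as coincidental.
    """
    tokens = tweet.split()
    low_tweet = [t.lower() for t in tokens]
    low_title = [t.lower() for t in page_title.split()]
    matcher = SequenceMatcher(None, low_tweet, low_title, autojunk=False)
    match = matcher.find_longest_match(0, len(low_tweet), 0, len(low_title))
    has_title = match.size >= min_block

    labels = []
    for i, tok in enumerate(tokens):
        if tok.startswith(("http://", "https://")):
            labels.append("url")
        elif has_title and match.a <= i < match.a + match.size:
            labels.append("title")
        else:
            labels.append("user")

    # Merge consecutive tokens that share a label into one segment.
    segments = []
    for label, group in groupby(zip(labels, tokens), key=lambda p: p[0]):
        segments.append((label, " ".join(tok for _, tok in group)))
    return segments

segments = segment_tweet(
    "Worth a read: New Study Links Sleep and Memory https://example.com/a",
    "New Study Links Sleep and Memory | Example News",
)
```

Once segments are labeled, a downstream sentiment model could, for example, score only the `user` segment so that a headline's wording is not attributed to the person sharing it. That use is one plausible reading of how segmentation might help sentiment analysis, not a description of the paper's pipeline.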
Twitter is a frequent target for machine learning research and applications. Many problems, such as sentiment analysis, image tagging, and location prediction, have been studied on Twitter data. Much of the prior work that addresses these problems within the context of Twitter focuses on a subset of the types of data available, e.g., only text, or text and image. However, a tweet can have several additional components, such as the location and the author, that can also provide useful information for machine learning tasks. In this work, we explore the problem of jointly modeling several tweet components in a common embedding space via task-agnostic representation learning, which can then be used to tackle various machine learning applications. To address this problem, we propose a deep neural network framework that combines text, image, and graph representations to learn joint embeddings for 5 tweet components: body, hashtags, images, user, and location. In our experiments, we use a large dataset of tweets to learn a joint embedding model and use it in multiple tasks to evaluate its performance vs. state-of-the-art baselines specific to each task. Our results show that our proposed generic method has similar or superior performance to specialized application-specific approaches, …