skip to main content


Title: Twitter Sentiment Analysis via Bi-sense Emoji Embedding and Attention-based LSTM
Sentiment analysis on large-scale social media data is important to bridge the gaps between social media contents and real world activities including political election prediction, individual and public emotional status monitoring and analysis, and so on. Although textual sentiment analysis has been well studied based on platforms such as Twitter and Instagram, analysis of the role of extensive emoji uses in sentiment analysis remains light. In this paper, we propose a novel scheme for Twitter sentiment analysis with extra attention on emojis.We first learn bi-sense emoji embeddings under positive and negative sentimental tweets individually, and then train a sentiment classifier by attending on these bi-sense emoji embeddings with an attention-based long short-term memory network (LSTM). Our experiments show that the bi-sense embedding is effective for extracting sentiment-aware embeddings of emojis and outperforms the state-of-the-art models. We also visualize the attentions to show that the bi-sense emoji embedding provides better guidance on the attention mechanism to obtain a more robust understanding of the semantics and sentiments.  more » « less
Award ID(s):
1704337
PAR ID:
10103948
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of ACM Multimedia Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Background As a number of vaccines for COVID-19 are given emergency use authorization by local health agencies and are being administered in multiple countries, it is crucial to gain public trust in these vaccines to ensure herd immunity through vaccination. One way to gauge public sentiment regarding vaccines for the goal of increasing vaccination rates is by analyzing social media such as Twitter. Objective The goal of this research was to understand public sentiment toward COVID-19 vaccines by analyzing discussions about the vaccines on social media for a period of 60 days when the vaccines were started in the United States. Using the combination of topic detection and sentiment analysis, we identified different types of concerns regarding vaccines that were expressed by different groups of the public on social media. Methods To better understand public sentiment, we collected tweets for exactly 60 days starting from December 16, 2020 that contained hashtags or keywords related to COVID-19 vaccines. We detected and analyzed different topics of discussion of these tweets as well as their emotional content. Vaccine topics were identified by nonnegative matrix factorization, and emotional content was identified using the Valence Aware Dictionary and sEntiment Reasoner sentiment analysis library as well as by using sentence bidirectional encoder representations from transformer embeddings and comparing the embedding to different emotions using cosine similarity. Results After removing all duplicates and retweets, 7,948,886 tweets were collected during the 60-day time period. Topic modeling resulted in 50 topics; of those, we selected 12 topics with the highest volume of tweets for analysis. Administration and access to vaccines were some of the major concerns of the public. Additionally, we classified the tweets in each topic into 1 of the 5 emotions and found fear to be the leading emotion in the tweets, followed by joy. Conclusions This research focused not only on negative emotions that may have led to vaccine hesitancy but also on positive emotions toward the vaccine. By identifying both positive and negative emotions, we were able to identify the public's response to the vaccines overall and to news events related to the vaccines. These results are useful for developing plans for disseminating authoritative health information and for better communication to build understanding and trust. 
    more » « less
  2. Abstract

    Public sentiment towards the police is a matter of great interest in the United States, as reports on police misconduct are increasingly being published in mass and social media. Here, we test how the public’s perception of the police can be majorly shaped by media reports of police brutality and local crime. We collect data on media coverage of police brutality and local crime, together with Twitter posts from 2010-2020 about the police in 18 metropolitan areas in the country. Using a range of model-free approaches building on transfer entropy analysis, we discover an association between public sentiment towards the police and media coverage of police brutality. We cautiously interpret this relationship as causal. Through this lens, the public’s sentiment towards the police appears to be driven by media-projected images of police misconduct, with no statistically significant evidence for a comparable effect driven by media reports on crimes.

     
    more » « less
  3. null (Ed.)
    Aspect-based sentiment analysis of review texts is of great value for understanding user feedback in a fine-grained manner. It has in general two sub-tasks: (i) extracting aspects from each review, and (ii) classifying aspect-based reviews by sentiment polarity. In this pa-per, we propose a weakly-supervised approach for aspect-based sentiment analysis, which uses only a few keywords describing each aspect/sentiment without using any labeled examples. Existing methods are either designed only for one of the sub-tasks, neglecting the benefit of coupling both, or are based on topic models that may contain overlapping concepts. We propose to first learn sentiment, aspectjoint topic embeddings in the word embedding space by imposing regularizations to encourage topic distinctiveness, and then use neural models to generalize the word-level discriminative information by pre-training the classifiers with embedding-based predictions and self-training them on unlabeled data. Our comprehensive performance analysis shows that our method generates quality joint topics and outperforms the baselines significantly (7.4%and 5.1% F1-score gain on average for aspect and sentiment classification respectively) on benchmark datasets. 
    more » « less
  4. Social media data have been increasingly used to study biomedical and health-related phenomena. From cohort-level discussions of a condition to population-level analyses of sentiment, social media have provided scientists with unprecedented amounts of data to study human behavior associated with a variety of health conditions and medical treatments. Here we review recent work in mining social media for biomedical, epidemiological, and social phenomena information relevant to the multilevel complexity of human health. We pay particular attention to topics where social media data analysis has shown the most progress, including pharmacovigilance and sentiment analysis, especially for mental health. We also discuss a variety of innovative uses of social media data for health-related applications as well as important limitations of social media data access and use. 
    more » « less
  5. Abstract Twitter is a frequent target for machine learning research and applications. Many problems, such as sentiment analysis, image tagging, and location prediction have been studied on Twitter data. Much of the prior work that addresses these problems within the context of Twitter focuses on a subset of the types of data available, e.g. only text, or text and image. However, a tweet can have several additional components, such as the location and the author, that can also provide useful information for machine learning tasks. In this work, we explore the problem of jointly modeling several tweet components in a common embedding space via task-agnostic representation learning, which can then be used to tackle various machine learning applications. To address this problem, we propose a deep neural network framework that combines text, image, and graph representations to learn joint embeddings for 5 tweet components: body, hashtags, images, user, and location. In our experiments, we use a large dataset of tweets to learn a joint embedding model and use it in multiple tasks to evaluate its performance vs. state-of-the-art baselines specific to each task. Our results show that our proposed generic method has similar or superior performance to specialized application-specific approaches, including accuracy of 52.43% vs. 48.88% for location prediction and recall of up to 15.93% vs. 12.12% for hashtag recommendation. 
    more » « less