Title: Public Reaction to Scientific Research via Twitter Sentiment Prediction
Abstract
Purpose: Social media users share their ideas, thoughts, and emotions with other users. However, it is not clear how online users respond to new research outcomes. This study aims to predict the nature of the emotions that Twitter users express toward scientific publications. Additionally, we investigate which features of the research articles help in such prediction. Identifying the sentiment that research articles attract on social media will help scientists gauge a new dimension of the societal impact of their work.
Design/methodology/approach: Several tools are available for sentiment analysis, so we applied five of them to check which are suitable for capturing a tweet's sentiment value, and decided to use NLTK VADER and TextBlob. We categorized the sentiment values as negative, positive, or neutral. For research articles with more than one tweet, we measured the mean and median of the tweets' sentiment values. We then built machine learning models to predict the sentiment of tweets related to scientific publications and investigated which features most influenced the prediction models.
Findings: The most important feature in all the models was the sentiment of the research article's title, followed by the author count. Tree-based models performed better than the other classification models, with Random Forest achieving 89% accuracy for binary classification and 73% accuracy for three-label classification.
Research limitations: In this research, we used state-of-the-art sentiment analysis libraries. However, these libraries can vary in their sentiment prediction behavior. Tweet sentiment may be influenced by a multitude of circumstances and is not always directly tied to the paper's details. In the future, we intend to broaden the scope of our research by employing word2vec models.
Practical implications: Many studies have focused on understanding the impact of science on scientists or how science communicators can improve their outcomes. Research in this area has relied on relatively few, limited measures, such as citations and user studies with small datasets. There is currently a critical need for novel methods to quantify and evaluate the broader impact of research. This study will help scientists better comprehend the emotional impact of their work. Additionally, understanding the public's interest and reactions helps science communicators identify effective ways to engage with the public and build positive connections between scientific communities and the public.
Originality/value: This study will extend work on public engagement with science, sociology of science, and computational social science. It will enable researchers to identify areas in which there is a gap between public and expert understanding and provide strategies by which this gap can be bridged.
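The labeling and aggregation steps described under Design/methodology/approach can be sketched as follows. This is a minimal sketch, not the authors' code. It assumes each tweet already has a polarity score in [-1, 1] (for example, the `compound` value from NLTK VADER's `SentimentIntensityAnalyzer().polarity_scores(text)`, or TextBlob's `TextBlob(text).sentiment.polarity`). It also assumes VADER's commonly used ±0.05 cut-offs, because the study's exact thresholds are not stated here. The function names and sample scores are hypothetical.

```python
from statistics import mean, median

# Assumed cut-offs (VADER's conventional values); the study's thresholds are not given.
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_sentiment(score: float) -> str:
    """Map a polarity score in [-1, 1] to a three-way label."""
    if score >= POS_THRESHOLD:
        return "positive"
    if score <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def aggregate_by_article(scores_by_article: dict[str, list[float]]) -> dict[str, dict]:
    """Summarize tweet scores for each article that has more than one tweet."""
    summary = {}
    for article_id, scores in scores_by_article.items():
        if len(scores) < 2:
            continue  # single-tweet articles are skipped
        summary[article_id] = {
            "mean": mean(scores),
            "median": median(scores),
            "mean_label": label_sentiment(mean(scores)),
        }
    return summary


# Hypothetical per-tweet polarity scores grouped by article.
tweets = {
    "article-A": [0.6, 0.2, -0.1],
    "article-B": [-0.4, -0.3],
    "article-C": [0.9],
}
for article, stats in aggregate_by_article(tweets).items():
    print(article, stats)
```

Reporting the median alongside the mean guards against a single strongly worded tweet dominating an article's aggregate sentiment.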
Award ID(s): 2022443
NSF-PAR ID: 10351821
Journal Name: Journal of Data and Information Science
Volume: 7
Issue: 1
ISSN: 2543-683X
Page Range / eLocation ID: 97 to 124
Sponsoring Org: National Science Foundation
More Like This
  1. Social media platforms provide users with various ways of interacting with each other, such as commenting, reacting to posts, sharing content, and uploading pictures. Facebook is one of the most popular platforms, and its users frequently share and reshare posts, including research articles. Moreover, Facebook's reactions feature allows users to express their feelings toward the content they view, providing valuable data for analysis. This study aims to predict the emotional impact of Facebook posts relating to research articles. We collected data on Facebook posts related to various scientific research domains, including Health Sciences, Social Sciences, Dentistry, Arts, and Humanities. We observed Facebook users’ reactions toward research articles and posts and found that ‘Like’ reactions were the most common. We also noticed that research articles from the Dentistry domain received many ‘Haha’ reactions. To predict the sentiment of Facebook posts related to research articles, we built machine learning models using features such as the sentiment of the research article’s title, the sentiment of its abstract, abstract length, author count, and research domain. We used five classifiers (Random Forest, Decision Tree, K-Nearest Neighbors, Logistic Regression, and Naïve Bayes) and evaluated them using accuracy, precision, recall, and F1 score. The Random Forest classifier was the best model for both two- and three-class labels, achieving accuracies of 86% and 66%, respectively. An analysis of the Random Forest model's feature importance showed that the sentiment of the research article’s title is crucial in predicting the sentiment of the Facebook post. This study has substantial implications for public engagement with science-related messages.
    The emotional reactions of Facebook users toward research articles and posts can provide valuable insights into public engagement with science, and predicting the emotional impact of Facebook posts related to research articles can help researchers understand how the public perceives scientific research. The findings can help researchers communicate their research effectively and engage the public in scientific discourse.
  2. This study evaluates the level of service of shared transportation facilities by mining geotagged social media data and analyzing road users' perceptions. An algorithm is developed that adopts a text classification approach with contextual understanding to extract information relevant to users' perceptions of active mobility. A heuristic keyword-matching approach returns about 75% out-of-context tweets, so that approach is deemed unsuitable for extracting information from Twitter. This study implements six text classification models and compares their performance on tweet classification using precision, recall, F1 score (the harmonic mean of precision and recall), and accuracy. A logistic regression model using a term frequency-inverse document frequency (TF-IDF) vectorizer performed best at classifying the tweets. The model is applied to real-world data to extract relevant information, and content analysis is performed to check the distribution of keywords within the filtered data. The findings show that the proposed method can help produce more relevant information on walking and biking facilities as well as safety concerns. By analyzing the sentiment of the filtered data, the existing condition of biking and walking facilities in the DC area can be inferred. This method can be a critical part of a decision support system for understanding the qualitative level of service of existing transportation facilities.
  3. An alarming proportion of the US population is overweight: 2/3 of US adults are overweight, and 1/3 of those overweight are obese. Obesity increases the risk of illnesses such as diabetes and cardiovascular disease. This epidemic can be attributed to the combination of cheap, high-calorie food and a lack of physical activity. In this paper, we propose a Big Data Analytics framework, called BiDAF, that aims to explore social contextual influences on healthy eating. For this purpose, we classified food tweets and social media images as either healthy or unhealthy and food sentiments as either positive or negative, and then mapped them to an obesity prevalence map. The classification outcomes could help reveal social food trends and sentiments across the Centers for Disease Control and Prevention (CDC) US obesity regions. The BiDAF framework has been implemented on the Apache Spark and TensorFlow platforms. We evaluated the BiDAF framework in terms of accuracy on food tweet classification and sentiment analysis. The experimental results indicate that the BiDAF framework is effective in classifying food tweets and analyzing their sentiment, and show its potential for exploring social contextual influences that may contribute to healthy eating.
  4. Sentiment analysis on large-scale social media data is important for bridging the gap between social media content and real-world activities, including political election prediction and the monitoring and analysis of individual and public emotional states. Although textual sentiment analysis has been well studied on platforms such as Twitter and Instagram, the role of emojis in sentiment analysis, despite their extensive use, has received little attention. In this paper, we propose a novel scheme for Twitter sentiment analysis that pays extra attention to emojis. We first learn bi-sense emoji embeddings separately from positive and negative sentiment tweets, and then train a sentiment classifier by attending to these bi-sense emoji embeddings with an attention-based long short-term memory (LSTM) network. Our experiments show that the bi-sense embedding is effective for extracting sentiment-aware embeddings of emojis and outperforms state-of-the-art models. We also visualize the attention weights to show that the bi-sense emoji embedding provides better guidance to the attention mechanism, yielding a more robust understanding of semantics and sentiment.
  5. Background: As a number of COVID-19 vaccines are given emergency use authorization by local health agencies and administered in multiple countries, it is crucial to gain public trust in these vaccines to achieve herd immunity through vaccination. One way to gauge public sentiment regarding vaccines, with the goal of increasing vaccination rates, is to analyze social media such as Twitter. Objective: The goal of this research was to understand public sentiment toward COVID-19 vaccines by analyzing social media discussions about the vaccines over a 60-day period beginning when vaccinations started in the United States. Using a combination of topic detection and sentiment analysis, we identified different types of concerns about the vaccines expressed by different groups of the public on social media. Methods: To better understand public sentiment, we collected tweets containing hashtags or keywords related to COVID-19 vaccines for exactly 60 days starting December 16, 2020. We detected and analyzed the topics of discussion in these tweets as well as their emotional content. Vaccine topics were identified by nonnegative matrix factorization, and emotional content was identified using the Valence Aware Dictionary and sEntiment Reasoner (VADER) sentiment analysis library as well as by computing Sentence-BERT (sentence bidirectional encoder representations from transformers) embeddings and comparing each embedding to different emotions using cosine similarity. Results: After all duplicates and retweets were removed, 7,948,886 tweets from the 60-day period remained. Topic modeling produced 50 topics, of which we selected the 12 with the highest tweet volume for analysis. Administration of and access to vaccines were among the public's major concerns. Additionally, we classified the tweets in each topic into one of five emotions and found fear to be the leading emotion, followed by joy.
    Conclusions: This research focused not only on negative emotions that may have led to vaccine hesitancy but also on positive emotions toward the vaccines. By identifying both positive and negative emotions, we were able to characterize the public's response to the vaccines overall and to news events related to the vaccines. These results are useful for developing plans to disseminate authoritative health information and for communicating more effectively to build understanding and trust.
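Items 1 and 2 above evaluate their classifiers with accuracy, precision, recall, and F1 score. A minimal, dependency-free sketch of these metrics follows. It assumes macro averaging (per-class scores averaged with equal weight), which the abstracts do not specify. The function name and the sample labels are hypothetical. F1 is computed as the harmonic mean of precision and recall.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic (not geometric) mean of precision and recall.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical three-class predictions.
y_true = ["pos", "pos", "neg", "neu", "neg", "pos"]
y_pred = ["pos", "neg", "neg", "neu", "pos", "pos"]
print(classification_metrics(y_true, y_pred))
```

With precision 1.0 and recall 0.5, for example, F1 is about 0.667 rather than the geometric mean of about 0.707. The harmonic mean penalizes an imbalance between the two more strongly.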
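The TF-IDF representation used by the best-performing model in item 2 can be sketched as below. This illustration is not the study's pipeline and rests on several assumptions. Tokenization is plain lowercase whitespace splitting. The IDF uses the smoothed form log((1 + n) / (1 + df)) + 1 with L2 normalization, which matches scikit-learn's `TfidfVectorizer` defaults. The resulting vectors would then be passed to a classifier such as logistic regression, which is not shown. The sample tweets are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one sparse {term: weight} dict per document."""
    # Simplification: lowercase whitespace tokenization.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF: rarer terms receive larger weights.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # raw term frequency
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so that document length does not dominate.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Hypothetical tweets about active-mobility facilities.
tweets = [
    "bike lane blocked again",
    "new bike lane opened downtown",
    "sidewalk closed near the station",
]
for vec in tfidf_vectors(tweets):
    print({t: round(w, 3) for t, w in sorted(vec.items())})
```

Terms shared across many tweets (here "bike" and "lane") are down-weighted relative to distinctive terms such as "blocked". This helps a linear classifier separate relevant tweets from off-topic ones.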
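Item 5 identifies vaccine topics with nonnegative matrix factorization (NMF). The sketch below applies the classic Lee-Seung multiplicative update rules for the Frobenius-norm objective to a tiny, hypothetical document-term matrix. It is meant only to show how NMF factors V ≈ WH into nonnegative document-topic weights W and topic-term weights H. It does not reproduce the study's setup, since 50 topics over millions of tweets would call for an optimized library. The helper names, seed, and data are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative V (m x n) as W (m x k) times H (k x n) via multiplicative updates."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() for _ in range(k)] for _ in range(m)]
    h = [[rng.random() for _ in range(n)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), elementwise; eps avoids division by zero.
        wt = transpose(w)
        num_h = matmul(wt, v)
        den_h = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num_h[i][j] / (den_h[i][j] + eps) for j in range(n)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T), elementwise.
        ht = transpose(h)
        num_w = matmul(v, ht)
        den_w = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num_w[i][j] / (den_w[i][j] + eps) for j in range(k)] for i in range(m)]
    return w, h


def reconstruction_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0])))


# Hypothetical term counts: rows are tweets, columns are the terms
# ["appointment", "site", "fever", "arm"].
v = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 2, 3], [0, 0, 3, 2]]
w, h = nmf(v, k=2)
print(round(reconstruction_error(v, w, h), 3))
```

Because the updates only multiply by nonnegative ratios, W and H stay nonnegative. That nonnegativity is what makes each row of H readable as a topic's term weights.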
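Item 5 also labels emotions by comparing sentence embeddings to different emotions with cosine similarity. Below is a minimal sketch of that comparison step. It assumes each emotion is represented by a reference embedding vector. The three-dimensional vectors are hypothetical toy values standing in for real Sentence-BERT outputs, and the emotion set is illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def closest_emotion(tweet_vec, emotion_vecs):
    """Return the emotion whose reference vector is most similar to the tweet vector."""
    return max(emotion_vecs, key=lambda e: cosine_similarity(tweet_vec, emotion_vecs[e]))


# Hypothetical toy embeddings; real sentence embeddings have hundreds of dimensions.
emotions = {
    "fear": [0.9, 0.1, 0.0],
    "joy": [0.0, 0.2, 0.9],
    "anger": [0.7, 0.7, 0.0],
}
tweet = [0.8, 0.2, 0.1]
print(closest_emotion(tweet, emotions))  # prints "fear"
```

Cosine similarity compares direction rather than magnitude, so tweets of different lengths are not penalized for producing embeddings with different norms.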