Title: Predicting Facebook sentiments towards research
Social media platforms provide users with various ways of interacting with each other, such as commenting, reacting to posts, sharing content, and uploading pictures. Facebook is one of the most popular platforms, and its users frequently share and reshare posts, including research articles. Moreover, Facebook’s reactions feature allows users to express their feelings towards the content they view, providing valuable data for analysis. This study aims to predict the emotional impact of Facebook posts relating to research articles. We collected data on Facebook posts related to several scientific research domains, including Health Sciences, Social Sciences, Dentistry, Arts, and Humanities. Observing Facebook users’ reactions towards research articles and posts, we found that ‘Like’ reactions were the most common and that research articles from the Dentistry domain received many ‘Haha’ reactions. We then built machine learning models to predict the sentiment of Facebook posts related to research articles, using features such as the research article’s title sentiment, abstract sentiment, abstract length, author count, and research domain. We trained five classifiers: Random Forest, Decision Tree, K-Nearest Neighbors, Logistic Regression, and Naïve Bayes, and evaluated them using accuracy, precision, recall, and F1 score. The Random Forest classifier was the best model for both two- and three-class labels, achieving accuracies of 86% and 66%, respectively. We also evaluated feature importance for the Random Forest model and found that the sentiment of the research article’s title is crucial in predicting the sentiment of the Facebook post. This study has substantial implications for public engagement with science-related messages. The emotional reactions of Facebook users towards research articles and posts can provide valuable insights into public engagement with science, and predicting the emotional impact of Facebook posts related to research articles can help researchers understand how the public perceives scientific research. These findings can aid researchers in effectively communicating their research and engaging the public in scientific discourse.
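The modeling workflow summarized above can be sketched in Python with scikit-learn. This is a minimal illustrative sketch rather than the authors' code: the file name facebook_posts.csv, the column names, and the train/test split are assumptions standing in for the features and labels described in the abstract.

```python
# Minimal sketch of the described pipeline (not the authors' implementation).
# Assumes a CSV with hypothetical columns: title_sentiment, abstract_sentiment,
# abstract_length, author_count, research_domain, and a post_sentiment label.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("facebook_posts.csv")  # hypothetical dataset
numeric = ["title_sentiment", "abstract_sentiment", "abstract_length", "author_count"]
categorical = ["research_domain"]
X, y = df[numeric + categorical], df["post_sentiment"]  # two- or three-class label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One-hot encode the research domain; numeric features pass through unchanged.
preprocess = ColumnTransformer(
    [("domain", OneHotEncoder(handle_unknown="ignore", sparse_output=False), categorical)],
    remainder="passthrough",
)

classifiers = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": GaussianNB(),
}

for name, clf in classifiers.items():
    model = Pipeline([("prep", preprocess), ("clf", clf)])
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))  # precision, recall, F1, accuracy

# Feature importances from the fitted Random Forest, mapped to transformed column names.
rf = Pipeline([("prep", preprocess), ("clf", RandomForestClassifier(random_state=42))])
rf.fit(X_train, y_train)
names = rf.named_steps["prep"].get_feature_names_out()
importances = rf.named_steps["clf"].feature_importances_
for feat, imp in sorted(zip(names, importances), key=lambda t: t[1], reverse=True):
    print(f"{feat}: {imp:.3f}")
```

Stratifying the split keeps the class balance of the post labels comparable across training and test sets, which matters when one reaction class (e.g. ‘Like’) dominates.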
Award ID(s):
2022443
NSF-PAR ID:
10482181
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Elsevier https://www.sciencedirect.com/science/article/pii/S2949719123000079
Date Published:
Journal Name:
Natural Language Processing Journal
Volume:
3
ISSN:
2949-7191
Subject(s) / Keyword(s):
Research sentiments, Applied machine learning, Sentiment analysis, Facebook reactions
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract
    Purpose: Social media users share their ideas, thoughts, and emotions with other users, but it is not clear how online users respond to new research outcomes. This study aims to predict the nature of the emotions expressed by Twitter users toward scientific publications and to investigate which features of the research articles help in such prediction. Identifying the sentiments expressed toward research articles on social media will help scientists gauge a new form of societal impact of their work.
    Design/methodology/approach: Several tools are available for sentiment analysis, so we applied five of them to determine which are suitable for capturing a tweet's sentiment value and decided to use NLTK VADER and TextBlob. We segregated the sentiment values into negative, positive, and neutral, and measured the mean and median of the tweets’ sentiment values for research articles with more than one tweet. We then built machine learning models to predict the sentiments of tweets related to scientific publications and investigated the essential features that drove the predictions.
    Findings: We found that the most important feature in all the models was the sentiment of the research article title, followed by the author count. The tree-based models performed better than the other classifiers, with Random Forest achieving 89% accuracy for binary classification and 73% accuracy for three-label classification.
    Research limitations: We used state-of-the-art sentiment analysis libraries, but these libraries can vary in their sentiment predictions. Tweet sentiment may be influenced by a multitude of circumstances and is not always directly tied to the paper's details. In the future, we intend to broaden the scope of this research by employing word2vec models.
    Practical implications: Many studies have focused on understanding the impact of science on scientists or on how science communicators can improve their outcomes, relying on fewer and more limited measures such as citations and user studies with small datasets. There is a critical need for novel methods to quantify and evaluate the broader impact of research. This study will help scientists better comprehend the emotional impact of their work, and understanding the public's interest and reactions helps science communicators identify effective ways to engage with the public and build positive connections between scientific communities and the public.
    Originality/value: This study extends work on public engagement with science, sociology of science, and computational social science. It will enable researchers to identify areas in which there is a gap between public and expert understanding and to provide strategies by which this gap can be bridged.
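    As a rough illustration of the tweet-scoring step described in this abstract, the snippet below thresholds NLTK VADER's compound score and TextBlob's polarity into positive, negative, and neutral labels and aggregates them per article. It is a sketch under assumptions: the ±0.05 cut-offs follow the common VADER convention and may not match the study's, and the example tweets are invented.

```python
# Sketch of sentiment scoring with NLTK VADER and TextBlob (not the study's code).
# Requires: pip install nltk textblob, plus nltk.download("vader_lexicon") once.
from statistics import mean, median

from nltk.sentiment.vader import SentimentIntensityAnalyzer
from textblob import TextBlob

vader = SentimentIntensityAnalyzer()

def label(score, pos=0.05, neg=-0.05):
    """Map a continuous sentiment value to positive / negative / neutral."""
    if score >= pos:
        return "positive"
    if score <= neg:
        return "negative"
    return "neutral"

# Invented tweets referring to one research article.
tweets = ["Fascinating new results on sleep and memory!", "This paper seems misleading."]

vader_scores = [vader.polarity_scores(t)["compound"] for t in tweets]
textblob_scores = [TextBlob(t).sentiment.polarity for t in tweets]

# For articles with more than one tweet, aggregate the per-tweet values.
print("VADER    mean/median:", mean(vader_scores), median(vader_scores))
print("TextBlob mean/median:", mean(textblob_scores), median(textblob_scores))
print("labels:", [label(s) for s in vader_scores])
```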
  2. The public interest in accurate scientific communication, underscored by recent public health crises, highlights how content often loses critical pieces of information as it spreads online. However, multi-platform analyses of this phenomenon remain limited due to challenges in data collection. Collecting mentions of research tracked by Altmetric LLC, we examine information retention in over 4 million online posts referencing 9,765 of the most-mentioned scientific articles across blog sites, Facebook, news sites, Twitter, and Wikipedia. To do so, we present a burst-based framework for examining online discussions about science over time and across different platforms. To measure information retention, we develop a keyword-based computational measure comparing an online post to the scientific article’s abstract. We evaluate our measure using ground truth data labeled by within-field experts. We highlight three main findings: first, we find a strong tendency towards low levels of information retention, following a distinct trajectory of loss except when bursts of attention begin on social media. Second, platforms show significant differences in information retention. Third, sequences involving more platforms tend to be associated with higher information retention. These findings highlight a strong tendency towards information loss over time—posing a critical concern for researchers, policymakers, and citizens alike—but suggest that multi-platform discussions may improve information retention overall. 
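    The keyword-based retention measure described here can be illustrated with a short sketch. The definition below, scoring retention as the fraction of an abstract's content words that reappear in a post, is only an assumed stand-in for the paper's actual measure, and the stopword list and example texts are invented.

```python
# Illustrative keyword-overlap score for information retention
# (assumed definition, not the paper's exact measure).
import re

STOPWORDS = {"the", "a", "an", "and", "or", "of", "in", "on", "to", "for", "with",
             "is", "are", "was", "were", "that", "this", "we", "our", "by", "as",
             "it", "from", "at", "how"}

def keywords(text):
    """Lowercase alphabetic tokens, minus stopwords and very short words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t not in STOPWORDS and len(t) > 2}

def retention(post, abstract):
    """Share of the abstract's keywords that the post retains (0 = none, 1 = all)."""
    abstract_kw = keywords(abstract)
    if not abstract_kw:
        return 0.0
    return len(abstract_kw & keywords(post)) / len(abstract_kw)

abstract = "We study information retention in online discussions of scientific articles."
post = "New study looks at information retention in online discussion of science articles."
print(round(retention(post, abstract), 2))
```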
  3. Lin, Yu-Ru ; Cha, Meeyoung ; Quercia, Daniele (Ed.)
    The public interest in accurate scientific communication, underscored by recent public health crises, highlights how content often loses critical pieces of information as it spreads online. However, multi-platform analyses of this phenomenon remain limited due to challenges in data collection. Collecting mentions of research tracked by Altmetric LLC, we examine information retention in the over 4 million online posts referencing 9,765 of the most-mentioned scientific articles across blog sites, Facebook, news sites, Twitter, and Wikipedia. To do so, we present a burst-based framework for examining online discussions about science over time and across different platforms. To measure information retention, we develop a keyword-based computational measure comparing an online post to the scientific article’s abstract. We evaluate our measure using ground truth data labeled by within-field experts. We highlight three main findings: first, we find a strong tendency towards low levels of information retention, following a distinct trajectory of loss except when bursts of attention begin in social media. Second, platforms show significant differences in information retention. Third, sequences involving more platforms tend to be associated with higher information retention. These findings highlight a strong tendency towards information loss over time—posing a critical concern for researchers, policymakers, and citizens alike—but suggest that multi-platform discussions may improve information retention overall.
  4. The public interest in accurate scientific communication, underscored by recent public health crises, highlights how content often loses critical pieces of information as it spreads online. However, multi-platform analyses of this phenomenon remain limited due to challenges in data collection. Collecting mentions of research tracked by Altmetric LLC, we examine information retention in the over 4 million online posts referencing 9,765 of the most-mentioned scientific articles across blog sites, Facebook, news sites, Twitter, and Wikipedia. To do so, we present a burst-based framework for examining online discussions about science over time and across different platforms. To measure information retention, we develop a keyword-based computational measure comparing an online post to the scientific article's abstract. We evaluate our measure using ground truth data labeled by within-field experts. We highlight three main findings: first, we find a strong tendency towards low levels of information retention, following a distinct trajectory of loss except when bursts of attention begin in social media. Second, platforms show significant differences in information retention. Third, sequences involving more platforms tend to be associated with higher information retention. These findings highlight a strong tendency towards information loss over time—posing a critical concern for researchers, policymakers, and citizens alike—but suggest that multi-platform discussions may improve information retention overall.
  5. Social media companies wield power over their users through design, through policy, and through their participation in public discourse. We set out to understand how companies leverage public relations to influence expectations of privacy and privacy-related norms. To interrogate the discourse productions of companies in relation to privacy, we examine the blogs associated with three major social media platforms: Facebook, Instagram (both owned by Facebook Inc.), and Snapchat. We analyze privacy-related posts using critical discourse analysis to demonstrate how these powerful entities construct narratives about users and their privacy expectations. We find that each of these platforms often makes use of discourse about "vulnerable" identities to invoke relations of power while, at the same time, advancing interpretations and values that favor data capitalism. Finally, we discuss how these public narratives might influence the construction of users' own interpretations of appropriate privacy norms and conceptions of self. We contend that expectations of privacy and social norms are not simply artifacts of users' own needs and desires, but co-constructions that reflect the influence of social media companies themselves.