Title: Classification of Health-Related Social Media Posts: Evaluation of Post Content–Classifier Models and Analysis of User Demographics
Background: The increasing volume of health-related social media activity, where users connect, collaborate, and engage, has increased the significance of analyzing how people use health-related social media. Objective: The aim of this study was to classify the content (eg, posts that share experiences and seek support) of users who write health-related social media posts and study the effect of user demographics on post content. Methods: We analyzed two different types of health-related social media: (1) health-related online forums, namely WebMD and DailyStrength, and (2) general online social networks, namely Twitter and Google+. We identified several categories of post content and built classifiers to automatically detect these categories. These classifiers were used to study the distribution of categories for various demographic groups. Results: We achieved an accuracy of at least 84% and a balanced accuracy of at least 0.81 for half of the post content categories in our experiments. In addition, 70.04% (4741/6769) of posts by male WebMD users asked for advice, and male users’ WebMD posts were more likely to ask for medical advice than female users’ posts. The majority of posts on DailyStrength shared experiences, regardless of the gender, age group, or location of their authors. Furthermore, health-related posts on Twitter and Google+ were used to share experiences less frequently than posts on WebMD and DailyStrength. Conclusions: We studied and analyzed the content of health-related social media posts. Our results can guide health advocates and researchers to better target patient populations based on the application type. Given a research question or an outreach goal, our results can be used to choose the best online forums to answer the question or disseminate a message.
Journal Name:
JMIR Public Health and Surveillance
Sponsoring Org:
National Science Foundation
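The Results in the abstract above report both accuracy and balanced accuracy for the post content classifiers. The two can diverge when a category is rare, which the following minimal sketch illustrates. It assumes the standard binary definition of balanced accuracy (the mean of sensitivity and specificity); the record does not state which definition the authors used, and the confusion-matrix counts are hypothetical and not taken from the study. Only the 4741/6769 figure comes from the abstract.

```python
def accuracy(tp, fn, tn, fp):
    """Fraction of all posts that were labeled correctly."""
    return (tp + tn) / (tp + fn + tn + fp)


def balanced_accuracy(tp, fn, tn, fp):
    """Mean of sensitivity and specificity (assumed definition)."""
    sensitivity = tp / (tp + fn)  # recall on posts in the category
    specificity = tn / (tn + fp)  # recall on posts outside the category
    return (sensitivity + specificity) / 2


# Hypothetical counts for one rare category: 100 posts in it, 900 outside it.
tp, fn, tn, fp = 60, 40, 880, 20
acc = accuracy(tp, fn, tn, fp)
bal = balanced_accuracy(tp, fn, tn, fp)

# Share of male WebMD users' posts that asked for advice, as reported above.
share = 4741 / 6769

print(f"accuracy={acc:.2f}, balanced accuracy={bal:.2f}")
print(f"advice-seeking share={share:.2%}")
```

With these illustrative counts, plain accuracy looks high because most posts fall outside the category, while balanced accuracy is lower because it weights the rare category equally. That is one reason to report both metrics.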
More Like this
  1. Introduction: Social media has created opportunities for children to gather social support online (Blackwell et al., 2016; Gonzales, 2017; Jackson, Bailey, & Foucault Welles, 2018; Khasawneh, Rogers, Bertrand, Madathil, & Gramopadhye, 2019; Ponathil, Agnisarman, Khasawneh, Narasimha, & Madathil, 2017). However, social media also has the potential to expose children and adolescents to undesirable behaviors. Research has shown that social media can be used to harass, discriminate (Fritz & Gonzales, 2018), dox (Wood, Rose, & Thompson, 2018), and socially disenfranchise children (Page, Wisniewski, Knijnenburg, & Namara, 2018). Other research proposes that social media use might be correlated with the significant increase in suicide rates and depressive symptoms among children and adolescents in the past ten years (Mitchell, Wells, Priebe, & Ybarra, 2014). Evidence-based research suggests that suicidal and unwanted behaviors can be promulgated through social contagion effects, which model, normalize, and reinforce self-harming behavior (Hilton, 2017). These harmful behaviors and social contagion effects may occur more frequently through repetitive exposure and modelling via social media, especially when such content goes “viral” (Hilton, 2017). One example of viral self-harming behavior that has generated significant media attention is the Blue Whale Challenge (BWC). The hearsay about this challenge is that individuals of all ages are persuaded to participate in self-harm and eventually kill themselves (Mukhra, Baryah, Krishan, & Kanchan, 2017). Research is needed specifically concerning BWC ethical concerns, the effects the game may have on teenagers, and potential governmental interventions. To address this gap in the literature, the current study uses qualitative and content analysis research techniques to illustrate the risk of self-harm and suicide contagion through the portrayal of BWC in YouTube and Twitter posts.
The purpose of this study is to analyze the portrayal of BWC on YouTube and Twitter in order to identify the themes presented in YouTube and Twitter posts that share and discuss BWC. In addition, we want to explore the extent to which YouTube videos comply with the safe and effective suicide messaging guidelines proposed by the Suicide Prevention Resource Center (SPRC). Method: Two social media websites were used to gather the data: 60 videos and 1,112 comments from YouTube and 150 posts from Twitter. The common themes of the YouTube videos, comments on those videos, and the Twitter posts were identified using grounded, thematic content analysis on the collected data (Padgett, 2001). Three codebooks were built, one for each type of data. The data for each site were analyzed, and the common themes were identified. A deductive coding analysis was conducted on the YouTube videos based on the nine SPRC safe and effective messaging guidelines (Suicide Prevention Resource Center, 2006). The analysis explored the number of videos that violated these guidelines and which guidelines were violated the most. The inter-rater reliabilities between the coders ranged from 0.61 to 0.81 based on Cohen’s kappa. Then the coders conducted consensus coding. Results & Findings: Three common themes were identified among all the posts in the three social media platforms included in this study. The first theme included posts where social media users were trying to raise awareness and warn parents about this dangerous phenomenon in order to reduce the risk of any potential participation in BWC. This was the most common theme in the videos and posts. Additionally, the posts claimed that there are more than 100 people who have played BWC worldwide and provided detailed descriptions of what each individual did while playing the game. These videos also described the tasks and different names of the game.
Only a few videos provided recommendations to teenagers who might be playing or thinking of playing the game, and fewer videos mentioned that the provided statistics were not confirmed by reliable sources. The second theme included posts by people who either criticized the teenagers who participated in BWC or made fun of them for a couple of reasons: they agreed with the purpose of BWC of “cleaning the society of people with mental issues,” or they misunderstood why teenagers participate in these kinds of challenges, such as thinking they mainly participate due to peer pressure or to “show off”. The last theme we identified was that most of these users tend to speak in detail about someone who already participated in BWC. These videos and posts provided information about their demographics and interviews with their parents or acquaintances, who also provide more details about the participant’s personal life. The evaluation of the videos based on the SPRC safe messaging guidelines showed that 37% of the YouTube videos met fewer than 3 of the 9 safe messaging guidelines. Around 50% of them met only 4 to 6 of the guidelines, while the remaining 13% met 7 or more of the guidelines. Discussion: This study is the first to systematically investigate the quality, portrayal, and reach of BWC on social media. Based on our findings from the emerging themes and the evaluation of the SPRC safe messaging guidelines, we suggest that these videos could contribute to the spread of these deadly challenges (or suicide in general, since the game might be a hoax) instead of raising awareness. Our suggestion is consistent with similar studies conducted on the portrayal of suicide in traditional media (Fekete & Macsai, 1990; Fekete & Schmidtke, 1995). Most posts on social media romanticized people who have died by following this challenge, and younger vulnerable teens may see the victims as role models, leading them to end their lives in the same way (Fekete & Schmidtke, 1995).
The videos presented statistics about the number of suicides believed to be related to this challenge in a way that made suicide seem common (Cialdini, 2003). In addition, the videos presented extensive personal information about the people who have died by suicide while playing the BWC. These videos also provided detailed descriptions of the final task, including pictures of self-harm, material that may encourage vulnerable teens to consider ending their lives and provide them with methods on how to do so (Fekete & Macsai, 1990). On the other hand, these videos failed both to emphasize prevention by highlighting effective treatments for mental health problems and to encourage teenagers with mental health problems to seek help by providing information on where to find it. YouTube and Twitter are capable of influencing a large number of teenagers (Khasawneh, Ponathil, Firat Ozkan, & Chalil Madathil, 2018; Pater & Mynatt, 2017). We suggest that it is urgent to monitor social media posts related to BWC and similar self-harm challenges (e.g., the Momo Challenge). Additionally, the SPRC should properly educate social media users, particularly those with more influence (e.g., celebrities), on elements that boost negative contagion effects. While the veracity of these challenges is doubted by some, posting about the challenges in unsafe ways can contribute to contagion regardless of the challenges’ true nature.
  2. Wren, Jonathan (Ed.)
    Abstract. Motivation: Substance abuse constitutes one of the major contemporary health epidemics. Recently, the use of social media platforms has garnered interest as a novel source of data for drug addiction epidemiology. Often, however, the language used in such forums comprises slang and jargon. Currently, there are no publicly available resources to automatically analyse the esoteric language use in the social media drug-use sub-culture. This lacuna introduces critical challenges for interpreting, sensemaking, and modeling of addiction epidemiology using social media. Results: Drug-Use Insights (DUI) is a public and open-source web application to address the aforementioned deficiency. DUI is underpinned by a hierarchical taxonomy encompassing 108 different addiction-related categories consisting of over 9,000 terms, where each category encompasses a set of semantically related terms. These categories and terms were established by utilizing thematic analysis in conjunction with term embeddings generated from 7,472,545 Reddit posts made by 1,402,017 redditors. Given post(s) from social media forums such as Reddit and Twitter, DUI can be used foremost to identify constituent terms related to drug use. Furthermore, the DUI categories and integrated visualization tools can be leveraged for semantic and exploratory analysis. To the best of our knowledge, DUI utilizes the largest number of substance use and recovery social media posts used in a study and represents the first significant online taxonomy of drug abuse terminology. Availability: The DUI web server and source code are available at: Supplementary information: Supplementary data are available at Bioinformatics online.
  3. Background: Internet data can be used to improve infectious disease models. However, the representativeness and individual-level validity of internet-derived measures are largely unexplored as this requires ground truth data for study. Objective: This study sought to identify relationships between Web-based behaviors and/or conversation topics and health status using a ground truth, survey-based dataset. Methods: This study leveraged a unique dataset of self-reported surveys, microbiological laboratory tests, and social media data from the same individuals toward understanding the validity of individual-level constructs pertaining to influenza-like illness in social media data. Logistic regression models were used to identify illness in Twitter posts using user posting behaviors and topic model features extracted from users’ tweets. Results: Of 396 original study participants, only 81 met the inclusion criteria for this study. Of these participants’ tweets, we identified only two instances that were related to health and occurred within 2 weeks (before or after) of a survey indicating symptoms. It was not possible to predict when participants reported symptoms using features derived from topic models (area under the curve [AUC]=0.51; P=.38), though it was possible using behavior features, albeit with a very small effect size (AUC=0.53; P≤.001). Individual symptoms were also generally not predictable. The study sample and a random sample from Twitter are predictably different on held-out data (AUC=0.67; P≤.001), meaning that the content posted by people who participated in this study was predictably different from that posted by random Twitter users. Individuals in the random sample and the GoViral sample used Twitter with similar frequencies (similar @ mentions, number of tweets, and number of retweets; AUC=0.50; P=.19).
Conclusions: To our knowledge, this is the first instance of an attempt to use a ground truth dataset to validate infectious disease observations in social media data. The lack of signal, the lack of predictability among behaviors or topics, and the demonstrated volunteer bias in the study population are important findings for the large and growing body of disease surveillance using internet-sourced data.
  4. Social media platforms are accused repeatedly of creating environments in which women are bullied and harassed. We argue that online aggression toward women aims to reinforce traditional feminine norms and stereotypes. In a mixed methods study, we find that this type of aggression on Twitter is common and extensive and that it can spread far beyond the original target. We locate over 2.9 million tweets in one week that contain instances of gendered insults (e.g., “bitch,” “cunt,” “slut,” or “whore”), averaging 419,000 sexist slurs per day. The vast majority of these tweets are negative in sentiment. We analyze the social networks of the conversations that ensue in several cases and demonstrate how the use of “replies,” “retweets,” and “likes” can further victimize a target. Additionally, we develop a sentiment classifier that we use in a regression analysis to compare the negativity of sexist messages. We find that words in a message that reinforce feminine stereotypes inflate the negative sentiment of tweets to a significant and sizeable degree. These terms include those insulting someone’s appearance (e.g., “ugly”), intellect (e.g., “stupid”), sexual experience (e.g., “promiscuous”), mental stability (e.g., “crazy”), and age (“old”). Messages enforcing beauty norms tend to be particularly negative. In sum, hostile, sexist tweets are strategic in nature. They aim to promote traditional, cultural beliefs about femininity, such as beauty ideals, and they shame victims by accusing them of falling short of these standards. Harassment on social media constitutes an everyday, routine occurrence, with researchers finding 9,764,583 messages referencing bullying on Twitter over the span of two years (Bellmore et al. 2015). In other words, Twitter users post over 13,000 bullying-related messages on a daily basis. Forms of online aggression also carry with them serious, negative consequences.
Research repeatedly documents that bullying victims suffer from a host of deleterious outcomes, such as low self-esteem (Hinduja and Patchin 2010), emotional and psychological distress (Ybarra et al. 2006), and negative emotions (Faris and Felmlee 2014; Juvonen and Gross 2008). Compared to those who have not been attacked, victims also tend to report more incidents of suicide ideation and attempted suicide (Hinduja and Patchin 2010). Several studies document that the targets of cyberbullying are disproportionately women (Backe et al. 2018; Felmlee and Faris 2016; Hinduja and Patchin 2010; Pew Research Center 2017), although there are exceptions depending on definitions and venues. Yet, we know little about the content or pattern of cyber aggression directed toward women in online forums. The purpose of the present research, therefore, is to examine in detail the practice of aggressive messaging that targets women and femininity within the social media venue of Twitter. Using both qualitative and quantitative analyses, we investigate the role of gender norm regulation in these patterns of cyber aggression.
  5. COVID-19 resulted in health and logistical challenges for many sectors of the American economy, including the trucking industry. This study examined how the pandemic impacted the trucking industry, focusing on the pandemic’s impacts on company operations and on the health and stress of trucking industry employees. Data were collected from three sources: surveys, focus groups, and social media posts. Individuals at multiple organizational levels of trucking companies (i.e., supervisors, upper-level management, and drivers) completed an online survey and participated in online focus groups. Data from focus groups were coded using a thematic analysis approach. Publicly available social media posts from Twitter were analyzed using a sentiment analysis framework to assess changes in public sentiment about the trucking industry before and during COVID-19. Two themes emerged from the focus groups: (1) trucking company business strategies and adaptations and (2) truck driver experiences and workplace safety. Participants reported supply chain disruptions and new consumer buying trends as having larger industry-wide impacts. Company adaptability emerged due to freight variability, leading organizations to pivot business models and create solutions to reduce operational costs. Companies responded to COVID-19 by accommodating employees’ concerns and implementing safety measures. Truck drivers noted an increase in positive public perception of truck drivers, but job quality factors worsened due to closed amenities and decreased social interaction. Social media sentiment analysis also illustrated an increase in positive public sentiment towards the trucking industry during COVID-19. The pandemic resulted in multi-level economic, health, and social impacts on the trucking industry, which included economic impacts on companies and economic, social, and health impacts on employees at different levels of the industry.
Further research can expand on this study to provide an understanding of the long-term impacts of the pandemic on the trucking industry, companies within the industry, and segments of the trucking industry workforce.