Title: Ginger Cannot Cure Cancer: Battling FakeHealth News with a Comprehensive Data Repository
Nowadays, the Internet is a primary source for obtaining health information. The massive volume of fake health news spreading over the Internet has become a severe threat to public health. Numerous studies have addressed fake news detection; however, few of them are designed to cope with the challenges specific to health news. For instance, the development of explainable detection methods is required for fake health news detection. To mitigate these problems, we construct a comprehensive repository, FakeHealth, which includes news contents with rich features, news reviews with detailed explanations, social engagements, and a user-user social network. Moreover, exploratory analyses are conducted to understand the characteristics of the datasets, analyze useful patterns, and validate the quality of the datasets for health fake news detection. We also discuss novel and potential future research directions for health fake news detection.
Award ID(s):
1909702
PAR ID:
10167754
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the International AAAI Conference on Weblogs and Social Media
Volume:
14
ISSN:
2334-0770
Page Range / eLocation ID:
853-862
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Today, social media has become the primary source for news. Via social media platforms, fake news travels at unprecedented speeds, reaches global audiences, and puts users and communities at great risk. Therefore, it is extremely important to detect fake news as early as possible. Recently, deep learning based approaches have shown improved performance in fake news detection. However, the training of such models requires a large amount of labeled data, and manual annotation is time-consuming and expensive. Moreover, due to the dynamic nature of news, annotated samples may become outdated quickly and cannot represent the news articles on newly emerged events. Therefore, how to obtain fresh and high-quality labeled samples is the major challenge in employing deep learning models for fake news detection. In order to tackle this challenge, we propose a reinforced weakly-supervised fake news detection framework, i.e., WeFEND, which can leverage users' reports as weak supervision to enlarge the amount of training data for fake news detection. The proposed framework consists of three main components: the annotator, the reinforced selector, and the fake news detector. The annotator can automatically assign weak labels for unlabeled news based on users' reports. The reinforced selector, using reinforcement learning techniques, chooses high-quality samples from the weakly labeled data and filters out those low-quality ones that may degrade the detector's prediction performance. The fake news detector aims to identify fake news based on the news content. We tested the proposed framework on a large collection of news articles published via WeChat official accounts and associated user reports. Extensive experiments on this dataset show that the proposed WeFEND model achieves the best performance compared with the state-of-the-art methods.
  2. Agents with aberrant behavior are commonplace in today’s networks. There are fake profiles in social media, malicious websites on the internet, and fake news sources that are prolific in spreading misinformation. The distinguishing characteristic of networks with aberrant agents is that normal agents rarely link to aberrant ones. Based on this manifested behavior, we propose a directed Markov Random Field (MRF) formulation for detecting aberrant agents. The formulation balances two objectives: to have as few links as possible from normal to aberrant agents, as well as to deviate minimally from prior information (if given). The MRF formulation is solved optimally and efficiently. We compare the optimal solution for the MRF formulation to existing algorithms, including PageRank, TrustRank, and AntiTrustRank. To assess the performance of these algorithms, we present a variant of the modularity clustering metric that overcomes the known shortcomings of modularity in directed graphs. We show that this new metric has desirable properties and prove that optimizing it is NP-hard. In an empirical experiment with twenty-three different datasets, we demonstrate that the MRF method outperforms the other detection algorithms. 
  3. The spread of fake news related to COVID-19 is an infodemic that leads to a public health crisis. Therefore, detecting fake news is crucial for an effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 fake news based on the content of news articles. However, the use of biomedical information, which is often featured in COVID-19 news, has not been explored in the development of these models. We present a novel approach for predicting COVID-19 fake news by leveraging biomedical information extraction (BioIE) in combination with machine learning models. We analyzed 1164 COVID-19 news articles and used advanced BioIE algorithms to extract 158 novel features. These features were then used to train 15 machine learning classifiers to predict COVID-19 fake news. Among the 15 classifiers, the random forest model achieved the best performance with an area under the ROC curve (AUC) of 0.882, which is 12.36% to 31.05% higher compared to models trained on traditional features. Furthermore, incorporating BioIE-based features improved the performance of a state-of-the-art multi-modality model (AUC 0.914 vs. 0.887). Our study suggests that incorporating biomedical information into fake news detection models improves their performance, and thus could be a valuable tool in the fight against the COVID-19 infodemic. 
  4. The spread of unwanted or malicious content through social media has become a major challenge. Traditional examples of this include social network spam, but an important new concern is the propagation of fake news through social media. A common approach for mitigating this problem is by using standard statistical classification to distinguish malicious (e.g., fake news) instances from benign (e.g., actual news stories). However, such an approach ignores the fact that malicious instances propagate through the network, which is consequential both in quantifying consequences (e.g., fake news diffusing through the network), and capturing detection redundancy (bad content can be detected at different nodes). An additional concern is evasion attacks, whereby the generators of malicious instances modify the nature of these to escape detection. We model this problem as a Stackelberg game between the defender who is choosing parameters of the detection model, and an attacker, who is choosing both the node at which to initiate malicious spread, and the nature of malicious entities. We develop a novel bi-level programming approach for this problem, as well as a novel solution approach based on implicit function gradients, and experimentally demonstrate the advantage of our approach over alternatives which ignore network structure.
  5. Researchers across many disciplines seek to understand how misinformation spreads with a view toward limiting its impact. One important question in this research is how people determine whether a given piece of news is real or fake. In the current article, we discuss the value of signal detection theory (SDT) in disentangling two distinct aspects in the identification of fake news: (a) ability to accurately distinguish between real news and fake news and (b) response biases to judge news as real or fake regardless of news veracity. The value of SDT for understanding the determinants of fake-news beliefs is illustrated with reanalyses of existing data sets, providing more nuanced insights into how partisan bias, cognitive reflection, and prior exposure influence the identification of fake news. Implications of SDT for the use of source-related information in the identification of fake news, interventions to improve people’s skills in detecting fake news, and the debunking of misinformation are discussed. 
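
The two quantities that item 5 separates, discrimination ability and response bias, can be illustrated with the textbook equal-variance signal detection measures d' and c. The sketch below is not taken from that article: it assumes that "fake" is treated as the signal class, that judgments are summarized as four counts, and that a log-linear correction (adding 0.5 to each cell) is used to avoid rates of exactly 0 or 1. The function name `sdt_measures` and its parameters are hypothetical.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) under the equal-variance Gaussian SDT model.

    Assumed coding (not from the source): "fake" is the signal class.
      hits               = fake items judged fake
      misses             = fake items judged real
      false_alarms       = real items judged fake
      correct_rejections = real items judged real
    """
    # Log-linear correction keeps both rates strictly between 0 and 1,
    # so the inverse normal CDF below is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)

    z = NormalDist().inv_cdf  # standard normal quantile function

    # Sensitivity: how well fake and real news are told apart.
    d_prime = z(hit_rate) - z(fa_rate)
    # Criterion: below 0 means a tendency to answer "fake",
    # above 0 a tendency to answer "real", regardless of veracity.
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion


if __name__ == "__main__":
    d, c = sdt_measures(hits=40, misses=10, false_alarms=10, correct_rejections=40)
    print(f"d' = {d:.3f}, c = {c:.3f}")
```

With these definitions, two readers can have the same d' while differing in c, which is the distinction between accuracy and bias that the abstract draws.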