Raw datasets collected for fake news detection typically contain noise such as missing values. To improve the performance of machine learning-based fake news detection, this paper proposes a novel data preprocessing method for handling missing values through imputation of both categorical and numerical features: categorical features are imputed with the most frequent value in each column, and numerical features with the column mean. In addition, TF-IDF vectorization is applied during feature extraction to filter out irrelevant features. Experimental results show that a Multi-Layer Perceptron (MLP) classifier combined with the proposed preprocessing method outperforms the baselines and improves prediction accuracy by more than 15%.
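A minimal sketch of the preprocessing pipeline this abstract describes, written with scikit-learn; the dataset file, column names, and hyperparameters below are hypothetical, since the paper's code is not included here.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical columns: "text" (article body), "source" (categorical),
# "share_count" (numerical), "label" (0 = real, 1 = fake).
df = pd.read_csv("news.csv")

preprocess = ColumnTransformer([
    # Categorical: impute missing values with the most frequent value in the column.
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ]), ["source"]),
    # Numerical: impute missing values with the column mean.
    ("num", SimpleImputer(strategy="mean"), ["share_count"]),
    # Text: TF-IDF vectorization to down-weight uninformative terms.
    ("text", TfidfVectorizer(stop_words="english", max_features=5000), "text"),
])

# MLP classifier trained on the preprocessed features.
model = Pipeline([
    ("preprocess", preprocess),
    ("mlp", MLPClassifier(hidden_layer_sizes=(100,), max_iter=300)),
])

model.fit(df.drop(columns=["label"]), df["label"])
```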
Machine Learning-Based Identifications of COVID-19 Fake News Using Biomedical Information Extraction
The spread of fake news related to COVID-19 is an infodemic that leads to a public health crisis. Therefore, detecting fake news is crucial for effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 fake news based on the content of news articles. However, the use of biomedical information, which is often featured in COVID-19 news, has not been explored in the development of these models. We present a novel approach for predicting COVID-19 fake news by leveraging biomedical information extraction (BioIE) in combination with machine learning models. We analyzed 1164 COVID-19 news articles and used advanced BioIE algorithms to extract 158 novel features. These features were then used to train 15 machine learning classifiers to predict COVID-19 fake news. Among the 15 classifiers, the random forest model achieved the best performance with an area under the ROC curve (AUC) of 0.882, which is 12.36% to 31.05% higher than that of models trained on traditional features. Furthermore, incorporating BioIE-based features improved the performance of a state-of-the-art multi-modality model (AUC 0.914 vs. 0.887). Our study suggests that incorporating biomedical information into fake news detection models improves their performance and could thus provide a valuable tool in the fight against the COVID-19 infodemic.
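As an illustration of how the extracted features feed the classifier, the sketch below trains a random forest on a hypothetical table of the 158 BioIE features and scores it with ROC AUC; the BioIE extraction itself is the paper's contribution and is not reproduced here, and the file and column names are assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical table: 1164 rows, 158 BioIE-derived feature columns, binary "fake" label.
features = pd.read_csv("bioie_features.csv")
X = features.drop(columns=["fake"])
y = features["fake"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Random forest was the best of the 15 classifiers reported in the paper.
clf = RandomForestClassifier(n_estimators=500, random_state=0)
clf.fit(X_train, y_train)

# Evaluate with the area under the ROC curve, the paper's headline metric.
auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(f"ROC AUC: {auc:.3f}")
```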
- Award ID(s):
- 1742517
- PAR ID:
- 10484386
- Publisher / Repository:
- MDPI
- Date Published:
- Journal Name:
- Big Data and Cognitive Computing
- Volume:
- 7
- Issue:
- 1
- ISSN:
- 2504-2289
- Page Range / eLocation ID:
- 46
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Today, social media has become a primary source of news. Via social media platforms, fake news travels at unprecedented speed, reaches global audiences, and puts users and communities at great risk. It is therefore extremely important to detect fake news as early as possible. Recently, deep learning-based approaches have shown improved performance in fake news detection. However, training such models requires a large amount of labeled data, and manual annotation is time-consuming and expensive. Moreover, due to the dynamic nature of news, annotated samples may become outdated quickly and fail to represent news articles on newly emerged events. How to obtain fresh, high-quality labeled samples is therefore the major challenge in employing deep learning models for fake news detection. To tackle this challenge, we propose a reinforced weakly-supervised fake news detection framework, WeFEND, which leverages users' reports as weak supervision to enlarge the amount of training data for fake news detection. The proposed framework consists of three main components: the annotator, the reinforced selector, and the fake news detector. The annotator automatically assigns weak labels to unlabeled news based on users' reports. The reinforced selector uses reinforcement learning to choose high-quality samples from the weakly labeled data and to filter out low-quality ones that may degrade the detector's prediction performance. The fake news detector identifies fake news based on the news content. We tested the proposed framework on a large collection of news articles published via WeChat official accounts and their associated user reports. Extensive experiments on this dataset show that the proposed WeFEND model achieves the best performance compared with state-of-the-art methods.
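The following is a heavily simplified sketch of the weak-annotation idea only: it assigns a weak label to an unlabeled article based on the user reports filed against it. The cue list, threshold, and scoring rule are assumptions for illustration and are not the actual WeFEND annotator or its reinforced selector.

```python
from dataclasses import dataclass

@dataclass
class Article:
    text: str
    reports: list[str]  # free-text user reports filed against this article

# Hypothetical cue words that make a report look like a fake-news complaint.
FAKE_CUES = ("fabricated", "rumor", "hoax", "misleading", "false")

def weak_label(article: Article, min_reports: int = 3) -> "int | None":
    """Return 1 (weakly fake), 0 (weakly real), or None (leave unlabeled)."""
    if len(article.reports) < min_reports:
        return None  # too little evidence; leave for the selector/detector to handle
    cue_hits = sum(any(cue in r.lower() for cue in FAKE_CUES) for r in article.reports)
    # If most reports contain an explicit fake-news cue, assign a weak "fake" label.
    return 1 if cue_hits / len(article.reports) >= 0.5 else 0

sample = Article(
    text="Miracle cure confirmed by anonymous experts...",
    reports=["this is a hoax", "fabricated claim", "misleading headline"],
)
print(weak_label(sample))  # -> 1 under these illustrative rules
```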
-
In times of uncertainty, people often seek out information to help alleviate fear, possibly leaving them vulnerable to false information. During the COVID-19 pandemic, we witnessed a viral spread of incorrect and misleading information that compromised collective actions and public health measures to contain the spread of the disease. We investigated the influence of fear of COVID-19 on social and cognitive factors, including believing in fake news, bullshit receptivity, overclaiming, and problem-solving, within two of the populations that were severely hit by COVID-19: Italy and the United States of America. To gain a better understanding of the role of misinformation during the early height of the COVID-19 pandemic, we also investigated whether problem-solving ability and socio-cognitive polarization were associated with believing in fake news. Results showed that fear of COVID-19 was related to seeking out information about the virus and avoiding infection in both the Italian and American samples, as well as to a willingness to share real news (COVID- and non-COVID-related) headlines in the American sample. However, fear positively correlated with bullshit receptivity, suggesting that the pandemic might have contributed to creating a situation in which people were pushed toward pseudo-profound existential beliefs. Furthermore, problem-solving ability was associated with correctly discerning real from fake news, whereas socio-cognitive polarization was the strongest predictor of believing in fake news in both samples. From these results, we concluded that a construct reflecting cognitive rigidity, neglect of alternative information, and black-and-white thinking negatively predicts the ability to discern fake from real news. This construct also extends to reasoning processes based on thinking outside the box and considering alternative information, such as problem-solving.
-
As of March 30, 2021, over 5,193 COVID-19 clinical trials had been registered through ClinicalTrials.gov. Among them, 191 trials were terminated, suspended, or withdrawn (indicating cessation of the study), while 909 trials had been completed (indicating completion of the study). In this study, we investigate the underlying factors of COVID-19 trial completion vs. cessation and design predictive models to accurately predict whether a COVID-19 trial will complete or cease in the future. We collect 4,441 COVID-19 trials from ClinicalTrials.gov to build a testbed and design four types of features characterizing clinical trial administration, eligibility, study information, criteria, drug types, and study keywords, as well as embedding features commonly used in state-of-the-art machine learning. Our study shows that drug features and study keywords are the most informative features, but all four types of features are essential for accurate trial prediction. Using these predictive models, our approach achieves more than a 0.87 AUC (area under the curve) score and 0.81 balanced accuracy in correctly predicting COVID-19 clinical trial completion vs. cessation. Our research shows that computational methods can deliver effective features for understanding the difference between completed and ceased COVID-19 trials. In addition, such models can predict COVID-19 trial status with satisfactory accuracy and help stakeholders better plan trials and minimize costs.
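To illustrate the kind of pipeline such a prediction task implies, the sketch below combines categorical administrative fields with free-text keyword and criteria fields, trains a classifier, and reports AUC and balanced accuracy; the file, column names, and model choice are assumptions, not the paper's exact setup.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

trials = pd.read_csv("covid19_trials.csv")  # hypothetical export of registered trials
y = trials["ceased"]                        # 1 = terminated/suspended/withdrawn, 0 = completed
X = trials.drop(columns=["ceased"])

# Mix categorical administration fields with TF-IDF over keywords and criteria text.
features = ColumnTransformer([
    ("admin", OneHotEncoder(handle_unknown="ignore"), ["sponsor_type", "phase"]),
    ("keywords", TfidfVectorizer(max_features=2000), "study_keywords"),
    ("criteria", TfidfVectorizer(max_features=2000), "eligibility_criteria"),
])

model = Pipeline([("features", features),
                  ("clf", GradientBoostingClassifier())])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
model.fit(X_tr, y_tr)

# Report the two metrics highlighted in the abstract.
proba = model.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))
print("Balanced accuracy:", balanced_accuracy_score(y_te, model.predict(X_te)))
```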
-
As the scourge of “fake news” continues to plague our information environment, attention has turned toward devising automated solutions for detecting problematic online content. But in order to build reliable algorithms for flagging “fake news,” we will need to go beyond broad definitions of the concept and identify distinguishing features that are specific enough for machine learning. With this objective in mind, we conducted an explication of “fake news,” a concept that has ballooned to include more than simply false information, with partisans weaponizing it to cast aspersions on the veracity of claims made by those who are politically opposed to them. We identify seven different types of online content under the label of “fake news” (false news, polarized content, satire, misreporting, commentary, persuasive information, and citizen journalism) and contrast them with “real news” by introducing a taxonomy of operational indicators in four domains (message, source, structure, and network) that together can help disambiguate the nature of online news content.
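One way to make the proposed taxonomy machine-readable is to encode the seven content types and four indicator domains as plain constants, as sketched below; the groupings come directly from the abstract, while any use in an annotation pipeline would be an assumption beyond what it describes.

```python
# The seven content types identified under the "fake news" label.
FAKE_NEWS_TYPES = (
    "false news",
    "polarized content",
    "satire",
    "misreporting",
    "commentary",
    "persuasive information",
    "citizen journalism",
)

# The four domains of operational indicators proposed in the taxonomy.
INDICATOR_DOMAINS = ("message", "source", "structure", "network")

def is_known_type(label: str) -> bool:
    """Check whether an annotator's label matches one of the seven content types."""
    return label.strip().lower() in FAKE_NEWS_TYPES

print(is_known_type("Satire"))  # -> True
```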