When one searches for political candidates on Google, a panel of recent news stories, known as Top stories, is commonly shown at the top of the search results page. These stories are selected by an algorithm that chooses from hundreds of thousands of articles published by thousands of news publishers. In our previous work, we identified 56 news sources that contributed two-thirds of all Top stories for 30 political candidates running in the primaries of the 2020 US Presidential Election. In this paper, we survey US voters to elicit their familiarity with and trust in these 56 news outlets. We find that some of the most frequent outlets are not familiar to all voters (e.g., The Hill or Politico), or are not particularly trusted by voters of any political stripe (e.g., Washington Examiner or The Daily Beast). Why, then, are such sources shown so frequently in Top stories? We theorize that Google samples news articles from sources with different political leanings to offer balanced coverage. This is reminiscent of the so-called "fairness doctrine" (1949-1987), a US policy that required broadcasters (radio and TV stations) to air contrasting views on controversial matters. Because there are fewer right-leaning publications than center- or left-leaning ones, maintaining this "fair" balance means that hyper-partisan, far-right news sources of low trust receive more visibility than some news sources that are more familiar to and trusted by the public.
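To make the theorized mechanism concrete, the following is a minimal, hypothetical sketch of leaning-balanced story selection in Python. It is not Google's algorithm: the sources, leaning labels, and round-robin rule are all illustrative assumptions.

```python
import random
from collections import defaultdict

def balanced_top_stories(articles, panel_size=3):
    """Fill the panel round-robin across political leanings.

    `articles` is a list of dicts with 'source' and 'leaning' keys;
    this schema is an illustrative assumption, not Google's.
    """
    by_leaning = defaultdict(list)
    for article in articles:
        by_leaning[article["leaning"]].append(article)
    for pool in by_leaning.values():
        random.shuffle(pool)  # pick arbitrarily within each leaning

    panel = []
    while len(panel) < panel_size and any(by_leaning.values()):
        for pool in by_leaning.values():
            if pool and len(panel) < panel_size:
                panel.append(pool.pop())
    return panel

candidate_articles = [
    {"source": "The Hill", "leaning": "center"},
    {"source": "Politico", "leaning": "center"},
    {"source": "The Daily Beast", "leaning": "left"},
    {"source": "Washington Examiner", "leaning": "right"},  # only right-leaning source
]
print(balanced_top_stories(candidate_articles))
```

Under such a scheme, a right-leaning source is drawn disproportionately often simply because fewer sources compete for the right-leaning slots, which would explain the outsized visibility of less familiar, less trusted outlets.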
Manually Classified Real and Fake News Articles
News articles that are written with an intent to deliberately deceive or manipulate readers are inherently problematic. These so-called 'fake news' articles are believed to have contributed to election manipulation and even to have resulted in severe injury and death through the actions they have triggered. Identifying intentionally deceptive and manipulative news articles and alerting human readers is key to mitigating the damage they can produce. The dataset presented in this paper includes manually identified and classified news stories that can be used for the training and testing of classification systems that distinguish legitimate from fake and manipulative news stories.
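As an illustration of how such a dataset might be used, here is a minimal sketch of a train/test loop for a real-vs-fake text classifier with scikit-learn. The placeholder rows and the 0/1 label convention are assumptions; substitute the dataset's actual articles and labels.

```python
# Minimal sketch: train and test a real-vs-fake classifier on a
# manually labeled corpus. Placeholder rows stand in for the dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

texts = [f"placeholder article text {i}" for i in range(100)]   # article bodies
labels = [i % 2 for i in range(100)]                            # 1 = fake, 0 = legitimate

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=0)

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)
print(classification_report(y_test, clf.predict(vectorizer.transform(X_test))))
```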
- Award ID(s):
- 1757659
- PAR ID:
- 10156513
- Date Published:
- 2019
- Journal Name:
- Proceedings of the 2019 International Conference on Computational Science and Computational Intelligence (CSCI)
- Page Range / eLocation ID:
- 1405 to 1407
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
Machine Learning-Based Identifications of COVID-19 Fake News Using Biomedical Information Extraction
The spread of fake news related to COVID-19 is an infodemic that leads to a public health crisis. Therefore, detecting fake news is crucial for effective management of the COVID-19 pandemic response. Studies have shown that machine learning models can detect COVID-19 fake news based on the content of news articles. However, the use of biomedical information, which is often featured in COVID-19 news, has not been explored in the development of these models. We present a novel approach for predicting COVID-19 fake news by leveraging biomedical information extraction (BioIE) in combination with machine learning models. We analyzed 1164 COVID-19 news articles and used advanced BioIE algorithms to extract 158 novel features. These features were then used to train 15 machine learning classifiers to predict COVID-19 fake news. Among the 15 classifiers, the random forest model achieved the best performance, with an area under the ROC curve (AUC) of 0.882, which is 12.36% to 31.05% higher than models trained on traditional features. Furthermore, incorporating BioIE-based features improved the performance of a state-of-the-art multi-modality model (AUC 0.914 vs. 0.887). Our study suggests that incorporating biomedical information into fake news detection models improves their performance and could thus be a valuable tool in the fight against the COVID-19 infodemic. (A hedged sketch of this recipe appears after this list.)
-
Today social media has become the primary source of news. Via social media platforms, fake news travels at unprecedented speed, reaches global audiences, and puts users and communities at great risk. It is therefore extremely important to detect fake news as early as possible. Recently, deep learning based approaches have shown improved performance in fake news detection. However, training such models requires a large amount of labeled data, and manual annotation is time-consuming and expensive. Moreover, due to the dynamic nature of news, annotated samples may become outdated quickly and may not represent news articles on newly emerged events. Obtaining fresh, high-quality labeled samples is therefore the major challenge in employing deep learning models for fake news detection. To tackle this challenge, we propose a reinforced weakly-supervised fake news detection framework, WeFEND, which can leverage users' reports as weak supervision to enlarge the amount of training data for fake news detection. The proposed framework consists of three main components: the annotator, the reinforced selector, and the fake news detector. The annotator automatically assigns weak labels to unlabeled news based on users' reports. The reinforced selector uses reinforcement learning techniques to choose high-quality samples from the weakly labeled data and to filter out low-quality ones that may degrade the detector's prediction performance. The fake news detector identifies fake news based on the news content. We tested the proposed framework on a large collection of news articles published via WeChat official accounts and the associated user reports. Extensive experiments on this dataset show that the proposed WeFEND model achieves the best performance compared with state-of-the-art methods. (A simplified sketch of the three components appears after this list.)
-
The news arguably serves to inform the quantitative reasoning (QR) of news audiences. Before one can contemplate how well the news serves this function, we first need to determine how much QR typical news stories require from readers. This paper assesses the amount of quantitative content present in a wide array of media sources, and the types of QR required for audiences to make sense of the information presented. We build a corpus of 230 US news reports across four topic areas (health, science, economy, and politics) in February 2020. After classifying reports for the QR required at both the conceptual and phrase levels, we find that the news stories in our sample can largely be classified along a single dimension: the amount of quantitative information they contain. There were two main types of quantitative clauses: those reporting magnitudes and those reporting comparisons. While economy and health reporting required significantly more QR than science or politics reporting, we could not reliably differentiate topic areas based on story-level requirements for quantitative knowledge and clause-level quantitative content. Instead, we find three reliable clusters of stories based on the amounts and types of quantitative information in the news stories. (An illustrative clustering sketch appears after this list.)
-
Fuzzing is the art of generating data and feeding that generated data as input to a target program. The goal is to crash the program in a manner that can be analyzed and exploited. Software developers benefit from fuzzers because they can patch the discovered vulnerabilities before an attacker exploits them. Programs are becoming larger and require improved fuzzers to keep up with the increased attack surface. Most innovations in fuzzer development are software related and provide better path coverage or data generation. This paper proposes creating a fuzzer that is designed to utilize a dedicated graphics card's graphics processing unit (GPU) instead of the standard processor. Much of the code within a fuzzer is parallelizable, meaning the graphics card could potentially process it much more efficiently. The effectiveness of GPU fuzzing is assessed herein. (A hedged sketch of the core idea appears after this list.)
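For the BioIE paper above, here is a minimal sketch of the general recipe it describes: concatenate biomedical features with conventional text features, train a random forest, and score it by AUC. The feature matrices and labels are random placeholders, not the paper's data, and the 158 feature definitions are not reproduced here.

```python
# Hedged sketch: combine BioIE-style features with text features,
# then train a random forest and evaluate by AUC. All values are
# placeholders standing in for real extracted features and labels.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_articles = 1164                            # corpus size, as in the study
text_feats = rng.random((n_articles, 20))    # stand-in for traditional features
bioie_feats = rng.random((n_articles, 158))  # stand-in for the 158 BioIE features
X = np.hstack([text_feats, bioie_feats])
y = rng.integers(0, 2, n_articles)           # 1 = fake, 0 = real (placeholder)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = RandomForestClassifier(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))
```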
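For the WeFEND paper, this is a deliberately simplified sketch of its three components. The paper's reinforced selector is a reinforcement learning agent; here it is approximated by a plain confidence-threshold filter, and the sample articles and report counts are fabricated for illustration.

```python
# Simplified WeFEND-style pipeline: annotator -> selector -> detector.
# The RL-based selector is replaced by a confidence threshold here.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

articles = ["miracle cure goes viral", "city council passes budget"] * 20
reports = [3, 0] * 20  # user reports per article (the weak signal)

# 1. Annotator: turn user reports into weak labels (reported => fake).
weak_labels = np.array([1 if r > 0 else 0 for r in reports])

# 2. Detector: train an initial model on all weakly labeled data.
vec = TfidfVectorizer()
X = vec.fit_transform(articles)
detector = LogisticRegression().fit(X, weak_labels)

# 3. Selector (simplified): keep samples the detector scores confidently,
#    then retrain the detector on that cleaner subset.
confidence = detector.predict_proba(X).max(axis=1)
keep = confidence >= 0.6
if keep.any():
    detector = LogisticRegression().fit(X[keep], weak_labels[keep])
```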
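For the quantitative reasoning study, a small sketch of the clustering step it reports: represent each story by counts of magnitude and comparison clauses, then look for three clusters. The counts below are randomly generated placeholders, not the study's coded data.

```python
# Illustrative clustering of stories by quantitative-clause counts.
# Columns: [magnitude clauses, comparison clauses]; values are fake.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
story_features = rng.poisson(lam=(4, 2), size=(230, 2))  # 230 stories, as in the corpus

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(story_features)
print(np.bincount(clusters))  # stories per cluster
```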
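For the GPU fuzzing paper, a hedged sketch of the core idea: mutate many copies of a seed input in parallel on the GPU, then hand the batch back to the CPU for execution against the target. CuPy is used here as an illustrative GPU array library; it is an assumption, not the paper's implementation, and running the sketch requires a CUDA-capable GPU.

```python
# Hedged sketch: GPU-parallel byte-flip mutation of a seed input.
# CuPy is a stand-in GPU library; the paper's code is not shown here.
import numpy as np
import cupy as cp

def gpu_mutate(seed: bytes, n_variants: int, flip_rate: float = 0.01):
    """Return n_variants mutated copies of seed; flips happen on the GPU."""
    base = cp.asarray(np.frombuffer(seed, dtype=np.uint8))
    batch = cp.tile(base, (n_variants, 1))               # one row per variant
    mask = cp.random.random(batch.shape) < flip_rate     # bytes to overwrite
    noise = cp.random.randint(0, 256, size=batch.shape, dtype=cp.uint8)
    batch = cp.where(mask, noise, batch)
    return [bytes(row) for row in cp.asnumpy(batch)]     # back to host for execution

# Each mutated input would then be fed to the target program on the CPU.
inputs = gpu_mutate(b"GET /index.html HTTP/1.1\r\n", n_variants=1024)
```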