
Title: Toward Fairness in Misinformation Detection Algorithms
Misinformation in online spaces can stoke mistrust of established media, misinform the public, and lead to radicalization. Hence, multiple automated algorithms for misinformation detection have been proposed in the recent past. However, the fairness (e.g., performance across left- and right-leaning news articles) of these algorithms has been repeatedly questioned, leading to decreased trust in such systems. This work motivates and grounds the need for an audit of machine learning based misinformation detection algorithms and possible ways to mitigate bias (if found). Using a large (N > 100K) corpus of news articles, we report that multiple standard machine learning based misinformation detection approaches are susceptible to bias. Further, we find that an intuitive post-processing approach (Reject Option Classifier) can reduce bias while maintaining high accuracy in the above setting. The results pave the way for accurate yet fair misinformation detection algorithms.
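The Reject Option Classifier mentioned above can be sketched in a few lines (a minimal illustration, assuming binary labels where 0 is the favorable "credible" outcome, a critical region of width theta around the 0.5 decision boundary, and hypothetical group names; this is not the paper's exact setup):

```python
def reject_option_classify(probs, groups, unprivileged, theta=0.1):
    """Post-process misinformation scores: inside the critical region
    [0.5 - theta, 0.5 + theta], assign the favorable label (0 = credible)
    to the unprivileged group and the unfavorable label (1 = misinformation)
    to the privileged group; outside it, keep the plain 0.5-threshold call."""
    labels = []
    for p, g in zip(probs, groups):
        if abs(p - 0.5) <= theta:
            labels.append(0 if g == unprivileged else 1)
        else:
            labels.append(1 if p > 0.5 else 0)
    return labels
```

Only uncertain predictions near the boundary are changed, which is why accuracy can stay high while the group-level disparity shrinks.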
Journal Name: Workshop Proceedings of the 16th International AAAI Conference on Web and Social Media
Sponsoring Org: National Science Foundation
More Like this
  1. Multiple recent efforts have used large-scale data and computational models to automatically detect misinformation in online news articles. Given the potential impact of misinformation on democracy, many of these efforts have also used the political ideology of these articles to better model misinformation and to study political bias in such algorithms. However, almost all such efforts have used source-level labels for credibility and political alignment, thereby assigning the same credibility and political alignment label to all articles from the same source (e.g., the New York Times or Breitbart). Here, we report on the impact of applying journalistic best practices to label individual news articles for their credibility and political alignment. We found that while source-level labels are decent proxies for political alignment labeling, they are very poor proxies (almost the same as flipping a coin) for credibility ratings. Next, we study the implications of such source-level labeling on downstream processes such as the development of automated misinformation detection algorithms and political fairness audits therein. We find that the automated misinformation detection and fairness algorithms can be suitably revised to support their intended goals, but they might require different assumptions and methods than those which are appropriate under source-level labeling. The results suggest caution in generalizing recent results on misinformation detection and political bias therein. On a positive note, this work shares a new dataset of articles individually labeled following journalistic best practices, along with an approach for misinformation detection and fairness audits.
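The coin-flip comparison above amounts to measuring how often a source's label agrees with the article-level label. A minimal sketch (the data format here is hypothetical):

```python
def proxy_agreement(article_label, source_of, source_label):
    """Fraction of articles whose individually assigned label matches the
    label inherited from their source; a value near 0.5 (for binary labels)
    means the source-level proxy is about as good as flipping a coin."""
    hits = sum(source_label[source_of[a]] == lab
               for a, lab in article_label.items())
    return hits / len(article_label)
```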
  2.
    Sentiment detection is an important building block for multiple information retrieval tasks such as product recommendation, cyberbullying detection, and fake news and misinformation detection. Unsurprisingly, multiple commercial APIs, each with different levels of accuracy and fairness, are now publicly available for sentiment detection, and users can easily incorporate these APIs in their applications. While combining inputs from multiple modalities or black-box models to increase accuracy is commonly studied in the multimedia computing literature, there has been little work on combining different modalities to increase the fairness of the resulting decision. In this work, we audit multiple commercial sentiment detection APIs for gender bias in a two-actor news headline setting and report on the level of bias observed. Next, we propose a "Flexible Fair Regression" approach, which ensures satisfactory accuracy and fairness by jointly learning from multiple black-box models. The results pave the way for fair yet accurate sentiment detectors for multiple applications.
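Combining black-box scores under a joint accuracy-fairness objective can be sketched as a tiny grid search (an illustrative stand-in, not the paper's Flexible Fair Regression formulation; the objective weight lam, the convex-weight grid, and the mean-gap fairness measure are all assumptions):

```python
import itertools

def combine_fair(scores, y, groups, lam=1.0, step=0.25):
    """Grid-search convex weights over K black-box sentiment scores,
    minimizing mean squared error plus lam times the absolute gap between
    the two groups' mean combined scores. scores[i] is the K-tuple of API
    outputs for sample i; groups[i] is 0 or 1."""
    K = len(scores[0])
    vals = [i * step for i in range(int(1 / step) + 1)]
    best_obj, best_w = None, None
    for w in itertools.product(vals, repeat=K):
        if abs(sum(w) - 1.0) > 1e-9:
            continue  # keep only convex combinations
        pred = [sum(wi * si for wi, si in zip(w, s)) for s in scores]
        err = sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
        g0 = [p for p, g in zip(pred, groups) if g == 0]
        g1 = [p for p, g in zip(pred, groups) if g == 1]
        gap = abs(sum(g0) / len(g0) - sum(g1) / len(g1))
        obj = err + lam * gap
        if best_obj is None or obj < best_obj:
            best_obj, best_w = obj, w
    return best_w
```

The actual approach learns jointly rather than grid-searching, but the trade-off structure (accuracy term plus weighted fairness term) is the same idea.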
  3. Many online news outlets, forums, and blogs provide a rich stream of publications and user comments. This rich body of data is a valuable source of information for researchers, journalists, and policymakers. However, the ever-increasing production and user engagement rate make it difficult to analyze this data without automated tools. This work presents MultiLayerET, a method to unify the representation of entities and topics in articles and comments. In MultiLayerET, articles' content and associated comments are parsed into a multilayer graph consisting of heterogeneous nodes representing named entities and news topics. The nodes within this graph have attributed edges denoting weight (the strength of the connection between two nodes), time (the co-occurrence contemporaneity of two nodes), and sentiment (the aggregate opinion of an entity toward a topic). Such information helps in analyzing articles and their comments. We infer the edges connecting two nodes using information mined from the textual data. The multilayer representation gives an advantage over a single-layer representation since it integrates articles and comments via shared topics and entities, providing richer signal points about emerging events. MultiLayerET can be applied to different downstream tasks, such as detecting media bias and misinformation. To explore the efficacy of the proposed method, we apply MultiLayerET to a body of data gathered from six representative online news outlets. We show that with MultiLayerET, the classification F1 score of a media bias prediction model improves by 36%, and that of a state-of-the-art fake news detection model improves by 4%.
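The attributed edges described above (weight, time, sentiment) can be illustrated with a toy structure (a hypothetical sketch, not the MultiLayerET implementation; the running-mean aggregation of sentiment is an assumption):

```python
class MultiLayerGraph:
    """Toy entity-topic graph: each edge carries a co-occurrence count
    (weight), the most recent co-occurrence time, and a running-mean
    sentiment score as the aggregate opinion of the entity toward the topic."""

    def __init__(self):
        self.edges = {}  # (entity, topic) -> attribute dict

    def observe(self, entity, topic, time, sentiment):
        """Record one entity-topic co-occurrence mined from the text."""
        e = self.edges.setdefault((entity, topic),
                                  {"weight": 0, "time": None, "sentiment": 0.0})
        e["weight"] += 1
        e["time"] = time
        # incremental running mean of sentiment
        e["sentiment"] += (sentiment - e["sentiment"]) / e["weight"]
```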
  4.
    Social media has been increasingly utilized to spread breaking news and risk communications during disasters of all magnitudes. Unfortunately, due to the unmoderated nature of social media platforms such as Twitter, rumors and misinformation are able to propagate widely. Given this, a surfeit of research has studied false rumor diffusion on Twitter, especially during natural disasters. Within this domain, studies have also focused on the misinformation control efforts of government organizations and other major agencies. A prodigious gap in research exists, however, in studying the monitoring of misinformation on social media platforms during disasters and other crisis events. Such studies would offer organizations and agencies new tools and methodologies to monitor misinformation on platforms such as Twitter and to make informed decisions on whether or not to use their resources to debunk it. In this work, we fill this research gap by developing a machine learning framework to predict the veracity of tweets spread during crisis events. The tweets are tracked based on the veracity of their content as either true, false, or neutral. We conduct four separate studies, and the results suggest that our framework is capable of tracking multiple cases of misinformation simultaneously, with scores exceeding 87%. In the case of tracking a single case of misinformation, our framework reaches a score of 83%. We collect 15,952 misinformation-related tweets from the Boston Marathon bombing (2013), the Manchester Arena bombing (2017), Hurricane Harvey (2017), Hurricane Irma (2017), and the Hawaii ballistic missile false alert (2018) to drive the algorithms. This article provides novel insights on how to efficiently monitor misinformation spread during disasters.
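Tracking multiple misinformation cases simultaneously can be illustrated as a per-case tally over a labeled stream (the stream format and majority-vote call are illustrative assumptions, not the authors' framework):

```python
from collections import defaultdict, Counter

def track_misinformation(stream):
    """Tally veracity labels per rumor case from a stream of
    (case_id, label) pairs, where label is 'true', 'false', or 'neutral',
    and report the current majority call for each case."""
    tallies = defaultdict(Counter)
    for case, label in stream:
        tallies[case][label] += 1
    return {case: c.most_common(1)[0][0] for case, c in tallies.items()}
```

In the actual framework the label for each tweet would come from the learned veracity classifier rather than being given.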

  5. Kidney trade has been on the rise despite the domestic and international law enforcement aiming to protect the vulnerable population from potential exploitation. Regional hubs are emerging in several parts of the world, including South Asia, Central America, the Middle East, and East Asia. Kidney trade networks reported in these hot spots are often complex systems involving several players such as buyers, sellers, and surgery countries operating across international borders so that they can bypass domestic laws in sellers' and buyers' countries. The exact patterns of the country networks are, however, largely unknown due to the lack of a systematic approach to collect the data. Most of the kidney trade information is currently available in the form of case studies, court materials, and news articles or reports, and no comprehensive database exists at this time. The present study thus explored online newspaper scraping to systematically collect 10,419 news articles from 24 major English newspapers in South Asia (January 2016 to May 2019) and build transnational kidney trade networks at the country level. Additionally, this study applied text mining techniques to extract words from each news article and developed machine learning algorithms to distinguish kidney trade from non-kidney trade news articles. Our findings suggest that online newspaper scraping coupled with machine learning is a promising approach to compile such data, especially given the dire shortage of empirical data.
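The text-mining plus machine-learning step can be sketched with a tiny bag-of-words classifier (a multinomial Naive Bayes stand-in; the study's actual features and models are not specified here, so the tokenization, smoothing, and example data below are all illustrative):

```python
from collections import Counter
import math

def train_nb(docs, labels):
    """Train a minimal multinomial Naive Bayes over whitespace tokens,
    with add-one smoothing, and return a predict(doc) function."""
    classes = set(labels)
    word_counts = {c: Counter() for c in classes}
    class_counts = Counter(labels)
    vocab = set()
    for doc, lab in zip(docs, labels):
        toks = doc.lower().split()
        word_counts[lab].update(toks)
        vocab.update(toks)

    def predict(doc):
        best, best_lp = None, -math.inf
        for c in classes:
            total = sum(word_counts[c].values())
            # log prior plus smoothed log likelihoods
            lp = math.log(class_counts[c] / len(labels))
            for t in doc.lower().split():
                lp += math.log((word_counts[c][t] + 1) / (total + len(vocab)))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

    return predict
```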