Title: A Hybrid Recognition System for Check-worthy Claims Using Heuristics and Supervised Learning
In recent years, the speed at which information disseminates has received an alarming boost from the pervasive use of social media. To the detriment of political and social stability, this has also made it easier to spread false claims quickly. Given the sheer volume of information, manual fact-checking is infeasible, and computational approaches to automated fact-checking have recently been explored. Despite recent advances in this direction, the critical step of recognizing and prioritizing statements worth fact-checking has received little attention. In this paper, we propose a hybrid approach that combines simple heuristics with supervised machine learning to identify claims made in political debates and speeches, and we provide a mechanism to rank them by their "check-worthiness". The viability of our method is demonstrated by evaluations on the English-language dataset of the Check-worthiness task of the CLEF-2018 Fact Checking Lab.
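As a minimal sketch of the hybrid idea (not the authors' implementation; the heuristic cues, the tiny training set, and the mixing weight alpha are all illustrative assumptions), one could blend a heuristic score with a supervised classifier's probability like this:

```python
# Hypothetical sketch of a hybrid check-worthiness ranker: a heuristic score
# is blended with a supervised classifier's probability. Feature choices,
# the toy training set, and the weight alpha are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def heuristic_score(sentence: str) -> float:
    """Toy heuristics: numbers and comparatives often signal factual claims."""
    s, score = sentence.lower(), 0.0
    if any(ch.isdigit() for ch in s):
        score += 1.0
    if any(cue in s for cue in ("more than", "less than", "percent")):
        score += 1.0
    return score / 2.0  # normalize to [0, 1]

# Supervised component trained on labeled sentences (1 = check-worthy).
train = ["Unemployment fell by 5 percent last year.",
         "Thank you all for coming tonight.",
         "Our opponent voted against the bill three times.",
         "What a wonderful crowd this is."]
labels = [1, 0, 1, 0]
vec = TfidfVectorizer().fit(train)
clf = LogisticRegression().fit(vec.transform(train), labels)

def check_worthiness(sentence: str, alpha: float = 0.5) -> float:
    """Blend heuristic and supervised scores into one ranking score."""
    p = clf.predict_proba(vec.transform([sentence]))[0, 1]
    return alpha * heuristic_score(sentence) + (1 - alpha) * p

debate = ["Crime is up more than 20 percent.", "God bless you all."]
for sent in sorted(debate, key=check_worthiness, reverse=True):
    print(f"{check_worthiness(sent):.3f}  {sent}")
```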
Award ID(s): 1834597
NSF-PAR ID: 10162052
Author(s) / Creator(s):
Date Published:
Journal Name: CEUR Workshop Proceedings
Volume: 2125
ISSN: 1613-0073
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. As the spread of information has received a compelling boost from the pervasive use of social media, so has the spread of misinformation. The sheer volume of data has rendered traditional methods of expert-driven manual fact-checking largely infeasible. As a result, computational linguistics and data-driven algorithms have been explored in recent years. Despite this progress, identifying and prioritizing what needs to be checked has received little attention. Given that expert-driven manual intervention is likely to remain an important component of fact-checking, especially in specific domains (e.g., politics, environmental science), this identification and prioritization is critical. A successful algorithmic ranking of “check-worthy” claims can help an expert-in-the-loop fact-checking system, thereby reducing the expert’s workload while still tackling the most salient bits of misinformation. In this work, we explore how linguistic syntax, semantics, and the contextual meaning of words play a role in determining the check-worthiness of claims. Our preliminary experiments used explicit stylometric features and simple word embeddings on the English-language dataset of the Check-worthiness task of the CLEF-2018 Fact-Checking Lab, where our primary solution outperformed the other systems in terms of mean average precision, R-precision, reciprocal rank, and precision at k for multiple values of k. Here, we present an extension of this approach with more sophisticated word embeddings and report further improvements in this task.
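The ranking metrics named above are standard; as a self-contained reference sketch, the following computes average precision, precision at k, and reciprocal rank over a hypothetical relevance list (the example ranking is made up, not the task's data):

```python
# Sketch of the ranking metrics reported above, computed over a hypothetical
# relevance list (1 = check-worthy at that rank, 0 = not).
def average_precision(ranked_labels):
    hits, precisions = 0, []
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / hits if hits else 0.0

def precision_at_k(ranked_labels, k):
    return sum(ranked_labels[:k]) / k

def reciprocal_rank(ranked_labels):
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

ranking = [1, 0, 1, 1, 0]           # hypothetical system output
print(average_precision(ranking))    # ~0.806
print(precision_at_k(ranking, 3))    # ~0.667
print(reciprocal_rank(ranking))      # 1.0
```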
  2. Verifying political claims is a challenging task, as politicians can use various tactics to subtly misrepresent the facts for their agenda. Existing automatic fact-checking systems fall short here, and coarse predictions such as "half-true" are not very useful in isolation, since it is unclear which parts of a claim are true or false. In this work, we focus on decomposing a complex claim into a comprehensive set of yes-no subquestions whose answers influence the veracity of the claim. We present CLAIMDECOMP, a dataset of decompositions for over 1000 claims. Given a claim and its verification paragraph written by fact-checkers, our trained annotators write subquestions covering both explicit propositions of the original claim and its implicit facets, such as additional political context that changes our view of the claim's veracity. We study whether state-of-the-art pre-trained models can learn to generate such subquestions. Our experiments show that these models generate reasonable questions, but predicting implied subquestions based only on the claim (without consulting other evidence) remains challenging. Nevertheless, we show that predicted subquestions can help identify relevant evidence to fact-check the full claim and derive its veracity through their answers, suggesting that claim decomposition can be a useful piece of a fact-checking pipeline.
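A rough sketch of how such subquestion generation might be set up with a generic pretrained seq2seq model follows; the model name, prompt prefix, and decoding settings are placeholders, and in practice a model fine-tuned on CLAIMDECOMP-style claim/subquestion pairs would be needed:

```python
# Illustrative only: generating candidate yes-no subquestions from a claim
# with a generic pretrained seq2seq model. "t5-base" and the prompt prefix
# are placeholders; the paper's actual models and training may differ.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-base"  # assumption: would be fine-tuned on decomposition pairs
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

claim = "The new policy cut emissions in half while creating thousands of jobs."
inputs = tokenizer("decompose claim: " + claim, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=48,
                         do_sample=True, num_return_sequences=3)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```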
  3. With the spread of SARS-CoV-2, enormous amounts of information about the pandemic are disseminated through social media platforms such as Twitter. Social media posts often leverage the trust readers have in prestigious news agencies and cite news articles as a way of gaining credibility. Nevertheless, it is not always the case that the cited article supports the claim made in the social media post. We present a cross-genre ad hoc pipeline to identify whether the information in a Twitter post (i.e., a “Tweet”) is indeed supported by the cited news article. Our approach is empirically based on a corpus of over 46.86 million Tweets and is divided into two tasks: (i) developing models to detect Tweets that contain a claim worth fact-checking, and (ii) verifying whether the claims made in a Tweet are supported by the newswire article it cites. Unlike previous studies that detect unsubstantiated information by post hoc analysis of the patterns of propagation, we seek to identify reliable support (or the lack of it) before the misinformation begins to spread. We discover that nearly half of the Tweets (43.4%) are not factual and hence not worth checking – a significant filter, given the sheer volume of social media posts on a platform such as Twitter. Moreover, we find that among the Tweets that contain a seemingly factual claim while citing a news article as supporting evidence, at least 1% are not actually supported by the cited news and are hence misleading.
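Conceptually, the two-stage pipeline could be sketched as below; both stages here are toy stand-ins (a keyword heuristic and TF-IDF cosine similarity) rather than the paper's actual models:

```python
# Conceptual two-stage sketch: (i) filter tweets to those containing a
# check-worthy factual claim, then (ii) test whether the cited article
# supports the claim. Both stages are toy stand-ins for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def contains_factual_claim(tweet: str) -> bool:
    # Toy stage 1: digits or reporting phrases as a weak signal of a claim.
    return any(ch.isdigit() for ch in tweet) or "according to" in tweet.lower()

def article_supports(tweet: str, article_text: str, threshold: float = 0.2) -> bool:
    # Toy stage 2: lexical similarity between tweet and article as a proxy
    # for support; a real system would use an entailment/verification model.
    vec = TfidfVectorizer().fit([tweet, article_text])
    sim = cosine_similarity(vec.transform([tweet]), vec.transform([article_text]))
    return float(sim[0, 0]) >= threshold

tweet = "According to the cited report, cases dropped 30% last month."
article = "The report documents a 30 percent decline in cases over the last month."
if contains_factual_claim(tweet):
    print("supported" if article_supports(tweet, article) else "unsupported")
```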
  4. Most existing methods for automatic fact-checking start with a precompiled list of claims to verify. We investigate the understudied problem of determining which statements in news articles are worth fact-checking. We annotate the argument structure of 95 news articles in the climate change domain that are fact-checked by climate scientists at climatefeedback.org. We release the first multi-layer annotated corpus for both argumentative discourse structure (argument types and relations) and fact-checked statements in news articles. We discuss the connection between argument structure and check-worthy statements and develop several baseline models for detecting check-worthy statements in the climate change domain. Our preliminary results show that using information about argumentative discourse structure yields a slight but statistically significant improvement over a baseline using only local discourse structure.
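One plausible shape for such a baseline (the feature choices and label encoding are illustrative assumptions, not the paper's setup) is a linear classifier over lexical features augmented with the annotated argument type:

```python
# Hedged sketch of a baseline: TF-IDF lexical features plus a one-column
# argument-type feature from the discourse annotation, fed to a linear
# classifier. Encodings and examples are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from scipy.sparse import hstack, csr_matrix

sentences = ["Sea levels rose 8 inches since 1900.", "We must act together."]
arg_types = [0, 1]   # hypothetical encoding: 0 = evidence, 1 = appeal
labels = [1, 0]      # 1 = check-worthy statement

vec = TfidfVectorizer()
X_text = vec.fit_transform(sentences)
X = hstack([X_text, csr_matrix([[t] for t in arg_types])])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))  # sanity check on the training examples
```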
  5. De Cristofaro, Emiliano; Nakov, Preslav (Eds.)
    Google’s reviewed claims feature was an early attempt to incorporate additional credibility signals from fact-checking onto the search results page. The feature, which appeared when users searched for the names of a subset of news publishers, was criticized by dozens of publishers for its errors and alleged anti-conservative bias. By conducting an audit of news publisher search results and focusing on the critiques of publishers, we find a lack of consensus between fact-checking ecosystem stakeholders that may be important to address in future iterations of public-facing fact-checking tools. In particular, we find that a lack of transparency, coupled with a lack of consensus on what makes a fact-check relevant to a news article, led to the breakdown of reviewed claims.