skip to main content


Search for: All records

Award ID contains: 2027750

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. With the spread of the SARS-CoV-2, enormous amounts of information about the pandemic are disseminated through social media platforms such as Twitter. Social media posts often leverage the trust readers have in prestigious news agencies and cite news articles as a way of gaining credibility. Nevertheless, it is not always the case that the cited article supports the claim made in the social media post. We present a cross-genre ad hoc pipeline to identify whether the information in a Twitter post (i.e., a “Tweet”) is indeed supported by the cited news article. Our approach is empirically based on a corpus of over 46.86 million Tweets and is divided into two tasks: (i) development of models to detect Tweets containing claim and worth to be fact-checked and (ii) verifying whether the claims made in a Tweet are supported by the newswire article it cites. Unlike previous studies that detect unsubstantiated information by post hoc analysis of the patterns of propagation, we seek to identify reliable support (or the lack of it) before the misinformation begins to spread. We discover that nearly half of the Tweets (43.4%) are not factual and hence not worth checking – a significant filter, given the sheer volume of social media posts on a platform such as Twitter. Moreover, we find that among the Tweets that contain a seemingly factual claim while citing a news article as supporting evidence, at least 1% are not actually supported by the cited news, and are hence misleading. 
    more » « less
  2. Pirk, Holger ; Heinis, Thomas (Ed.)
    Organizations collect data from various sources, and these datasets may have characteristics that are unknown. Selecting the appropriate statistical and machine learning algorithm for data analytical purposes benefits from understanding these characteristics, such as if it contains temporal attributes or not. This paper presents a theoretical basis for automatically determining the presence of temporal data in a dataset given no prior knowledge about its attributes. We use a method to classify an attribute as temporal, non-temporal, or hidden temporal. A hidden (grouping) temporal attribute can only be treated as temporal if its values are categorized in groups. Our method uses a Ljung-Box test for autocorrelation as well as a set of metrics we proposed based on the classification statistics. Our approach detects all temporal and hidden temporal attributes in 15 datasets from various domains. 
    more » « less
  3. null (Ed.)