

Title: Detection of Fraudulent Tweets: An Empirical Investigation Using Network Analysis and Deep Learning Technique
Social media has become a powerful and efficient platform for information diffusion. Its increasing pervasiveness, however, has brought the problem of fraudulent accounts intended to diffuse misinformation or malicious content. Twitter recently released comprehensive archives of fraudulent tweets possibly connected to a propaganda effort by the Internet Research Agency (IRA) during the 2016 U.S. presidential election. To understand information diffusion in fraudulent networks, we analyze the structural properties of the IRA retweet network and develop deep neural network models to detect fraudulent tweets. The structural analysis reveals key characteristics of the fraudulent network, and the experimental results demonstrate the superior performance of the deep learning technique over a traditional classification method in detecting fraudulent tweets. The findings have potential implications for curbing online misinformation.
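As a rough illustration of the retweet-network analysis described above, the sketch below builds in- and out-degree counts from a retweet edge list. This is a minimal, hypothetical example (the edge list and account names are invented, and the paper's actual structural metrics are not specified here): in a retweet network, a high in-degree flags an account whose content is widely amplified.

```python
from collections import defaultdict

def retweet_degrees(edges):
    """Compute in- and out-degree for each account in a retweet network.

    edges: iterable of (retweeter, original_author) pairs, one per retweet.
    Returns (in_degree, out_degree) dicts keyed by account name.
    """
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    for src, dst in edges:
        out_deg[src] += 1   # src retweeted someone
        in_deg[dst] += 1    # dst was retweeted
    return dict(in_deg), dict(out_deg)

# Hypothetical toy edge list: (retweeter, original author).
edges = [("a", "troll1"), ("b", "troll1"), ("c", "troll1"), ("a", "troll2")]
in_deg, out_deg = retweet_degrees(edges)
# "troll1" is retweeted three times but never retweets anyone itself --
# the kind of asymmetry a structural analysis of amplification would surface.
```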
Award ID(s):
1912898
NSF-PAR ID:
10095445
Author(s) / Creator(s):
Date Published:
Journal Name:
IEEE International Conference on Intelligence and Security Informatics
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

Social media has been increasingly utilized to spread breaking news and risk communications during disasters of all magnitudes. Unfortunately, due to the unmoderated nature of social media platforms such as Twitter, rumors and misinformation are able to propagate widely. Given this, a surfeit of research has studied false rumor diffusion on Twitter, especially during natural disasters. Within this domain, studies have also focused on the misinformation control efforts of government organizations and other major agencies. A prodigious gap in research exists in studying the monitoring of misinformation on social media platforms in times of disasters and other crisis events. Such studies would offer organizations and agencies new tools and methods to monitor misinformation on platforms such as Twitter, and to make informed decisions on whether or not to spend their resources on debunking. In this work, we fill the research gap by developing a machine learning framework to predict the veracity of tweets spread during crisis events. The tweets are tracked based on the veracity of their content as either true, false, or neutral. We conduct four separate studies, and the results suggest that our framework is capable of tracking multiple cases of misinformation simultaneously, with scores exceeding 87%. In the case of tracking a single case of misinformation, our framework reaches a score of 83%. We collect 15,952 misinformation-related tweets from the Boston Marathon bombing (2013), the Manchester Arena bombing (2017), Hurricane Harvey (2017), Hurricane Irma (2017), and the Hawaii ballistic missile false alert (2018), and use them to train and evaluate the algorithms. This article provides novel insights on how to efficiently monitor misinformation that is spread during disasters.
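The "tracking multiple cases simultaneously" idea above can be sketched as a per-case tally of predicted veracity labels. This is a minimal illustration under invented data (the case IDs and labels are hypothetical, and the classifier that produces the labels is not shown); it shows only the aggregation step that lets several misinformation cases be monitored at once.

```python
from collections import Counter, defaultdict

def track_cases(labeled_tweets):
    """Aggregate predicted veracity labels per misinformation case.

    labeled_tweets: iterable of (case_id, label) pairs, where label is
    one of "true", "false", or "neutral" (the three classes above).
    Returns {case_id: {label: count}} for monitoring each case over time.
    """
    tally = defaultdict(Counter)
    for case_id, label in labeled_tweets:
        tally[case_id][label] += 1
    return {case_id: dict(counts) for case_id, counts in tally.items()}

# Hypothetical stream of (case, predicted label) pairs from a classifier.
stream = [("caseA", "false"), ("caseA", "false"),
          ("caseA", "true"), ("caseB", "neutral")]
summary = track_cases(stream)
```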

     
  2.
During COVID-19, misinformation on social media affected the adoption of appropriate prevention behaviors, making it urgent to suppress such misinformation to prevent negative public health consequences. Although an array of studies has proposed misinformation suppression strategies, few have investigated the role of predominant credible information during crises, and none has examined its effect quantitatively using longitudinal social media data. This research therefore investigates the temporal correlations between credible information and misinformation, and whether predominant credible information can suppress misinformation, for two prevention measures (topics), wearing masks and social distancing, using tweets collected from February 15 to June 30, 2020. We trained Support Vector Machine classifiers to retrieve relevant tweets and to classify tweets containing credible information and misinformation for each topic. Based on cross-correlation analyses of the credible-information and misinformation time series for both topics, we find that previously predominant credible information can lead to a decrease of misinformation (i.e. suppression) with a time lag. The findings provide empirical evidence for suppressing misinformation with credible information in complex online environments and suggest practical strategies for future information management during crises and emergencies.
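The lagged cross-correlation analysis above can be sketched as follows: correlate the credible-information series at time t with the misinformation series at time t + lag, for each candidate lag. This is a minimal pure-Python sketch with invented daily counts (the paper's actual series, lags, and preprocessing are not reproduced); a strongly negative coefficient at a positive lag is the pattern consistent with earlier credible information preceding a drop in misinformation.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cross_correlation(credible, misinfo, max_lag):
    """Correlate credible counts at t with misinfo counts at t + lag,
    for lags 0..max_lag. Returns {lag: correlation}."""
    out = {}
    for lag in range(max_lag + 1):
        x = credible[: len(credible) - lag] if lag else credible
        y = misinfo[lag:]
        out[lag] = pearson(x, y)
    return out

# Hypothetical daily counts: misinfo mirrors credible info two days later.
credible = [1, 2, 3, 4, 5, 6, 7, 8]
misinfo = [5, 5, 9, 8, 7, 6, 5, 4]
lagged = cross_correlation(credible, misinfo, 3)
# lagged[2] is perfectly negative: rising credible info at t aligns
# with falling misinformation at t + 2 in this toy series.
```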
  3. Social media is being increasingly utilized to spread breaking news and updates during disasters of all magnitudes. Unfortunately, due to the unmoderated nature of social media platforms such as Twitter, rumors and misinformation are able to propagate widely. Given this, a surfeit of research has studied rumor diffusion on social media, especially during natural disasters. In many studies, researchers manually code social media data to further analyze the patterns and diffusion dynamics of users and misinformation. This method requires many human hours and is prone to significant misclassification unless the work is verified by another individual. In our studies, we fill the research gap by applying seven different machine learning algorithms to automatically classify misinformation-related Twitter data spread during disaster events. Due to the unbalanced nature of the data, three different balancing algorithms are also applied and compared. We train and evaluate the classifiers with data from the Manchester Arena bombing (2017), Hurricane Harvey (2017), the Hawaiian incoming missile alert (2018), and the East Coast US tsunami alert (2018). Over 20,000 tweets are classified based on the veracity of their content as either true, false, or neutral, with overall accuracies exceeding 89%.
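One common way to handle the class imbalance mentioned above is random oversampling: duplicating minority-class examples until every class matches the majority class size. The sketch below is a minimal, hypothetical illustration of that one strategy (the abstract does not name which three balancing algorithms were compared), using invented tweet placeholders.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class examples at random until every class
    reaches the majority-class count. A sketch of one balancing strategy,
    not necessarily one of the paper's three algorithms."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    counts = Counter(labels)
    target = max(counts.values())
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        out_samples.extend(items)
        out_labels.extend([label] * len(items))
        for _ in range(target - len(items)):   # top up minority classes
            out_samples.append(rng.choice(items))
            out_labels.append(label)
    return out_samples, out_labels

# Hypothetical imbalanced set: three "true" tweets, one "false" tweet.
Xb, yb = random_oversample(["t1", "t2", "t3", "f1"],
                           ["true", "true", "true", "false"])
```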
  4. This paper introduces and presents a first analysis of a uniquely curated dataset of misinformation, disinformation, and rumors spreading on Twitter about the 2020 U.S. election. Previous research on misinformation—an umbrella term for false and misleading content—has largely focused either on broad categories, using a finite set of keywords to cover a complex topic, or on a few focused case studies, with increased precision but limited scope. Our approach, by comparison, leverages real-time reports collected from September through November 2020 to develop a comprehensive dataset of tweets connected to 456 distinct misinformation stories from the 2020 U.S. election (our ElectionMisinfo2020 dataset), 307 of which sowed doubt in the legitimacy of the election. By relying on real-time incidents and streaming data, we generate a curated dataset that not only provides more granularity than a large collection based on a finite number of search terms, but also an improved opportunity for generalization compared to a small set of case studies. Though the emphasis is on misleading content, not all of the tweets linked to a misinformation story are false: some are questions, opinions, corrections, or factual content that nonetheless contributes to misperceptions. Along with a detailed description of the data, this paper provides an analysis of a critical subset of election-delegitimizing misinformation in terms of size, content, temporal diffusion, and partisanship. We label key ideological clusters of accounts within interaction networks, describe common misinformation narratives, and identify those accounts which repeatedly spread misinformation. We document the asymmetry of misinformation spread: accounts associated with support for President Biden shared stories in ElectionMisinfo2020 far less than accounts supporting his opponent. That asymmetry remained among the accounts who were repeatedly influential in the spread of misleading content that sowed doubt in the election: all but two of the top 100 'repeat spreader' accounts were supporters of then-President Trump. These findings support the implementation and enforcement of 'strike rules' on social media platforms, directly addressing the outsized role of repeat spreaders.
  5. With the spread of SARS-CoV-2, enormous amounts of information about the pandemic are disseminated through social media platforms such as Twitter. Social media posts often leverage the trust readers have in prestigious news agencies and cite news articles as a way of gaining credibility. Nevertheless, it is not always the case that the cited article supports the claim made in the social media post. We present a cross-genre ad hoc pipeline to identify whether the information in a Twitter post (i.e., a "Tweet") is indeed supported by the cited news article. Our approach is empirically based on a corpus of over 46.86 million Tweets and is divided into two tasks: (i) development of models to detect Tweets containing claims that are worth fact-checking, and (ii) verifying whether the claims made in a Tweet are supported by the newswire article it cites. Unlike previous studies that detect unsubstantiated information by post hoc analysis of the patterns of propagation, we seek to identify reliable support (or the lack of it) before the misinformation begins to spread. We discover that nearly half of the Tweets (43.4%) are not factual and hence not worth checking – a significant filter, given the sheer volume of social media posts on a platform such as Twitter. Moreover, we find that among the Tweets that contain a seemingly factual claim while citing a news article as supporting evidence, at least 1% are not actually supported by the cited news, and are hence misleading.