Title: TROLLMAGNIFIER: Detecting State-Sponsored Troll Accounts on Reddit
Growing evidence points to recurring influence campaigns on social media, often sponsored by state actors aiming to manipulate public opinion on sensitive political topics. Typically, campaigns are performed through instrumented accounts, known as troll accounts; despite their prominence, however, little work has been done to detect these accounts in the wild. In this paper, we present TROLLMAGNIFIER, a detection system for troll accounts. Our key observation, based on analysis of known Russian-sponsored troll accounts identified by Reddit, is that they show loose coordination, often interacting with each other to further specific narratives. Therefore, troll accounts controlled by the same actor often show similarities that can be leveraged for detection. TROLLMAGNIFIER learns the typical behavior of known troll accounts and identifies other accounts that behave similarly. We train TROLLMAGNIFIER on a set of 335 known troll accounts and run it on a large dataset of Reddit accounts. Our system identifies 1,248 potential troll accounts; we then provide a multi-faceted analysis to corroborate the correctness of our classification. In particular, 66% of the detected accounts show signs of being instrumented by malicious actors (e.g., they were created on the exact same day as a known troll, they have since been suspended by Reddit, etc.). They also discuss similar topics as the known troll accounts and exhibit temporal synchronization in their activity. Overall, we show that using TROLLMAGNIFIER, one can grow the initial knowledge of potential trolls provided by Reddit by over 300%.
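The abstract does not spell out the detection pipeline; as a rough illustration of the idea, a similarity-based detector in this spirit might look like the sketch below (scikit-learn based, with toy stand-in data; the feature set and classifier are illustrative assumptions, not the paper's exact design):

```python
# Sketch of similarity-based troll detection: score accounts by how
# closely their comments resemble those of known trolls, then classify.
# Features, classifier, and data are illustrative, not the paper's.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for per-account Reddit comment histories.
known_troll_comments = [["push narrative A", "narrative A is true"],
                        ["amplify narrative A", "narrative A matters"]]
baseline_user_comments = [["weekend hiking photos", "nice trail"],
                          ["new game review", "great soundtrack"]]
candidate_accounts = {"user123": ["narrative A is true", "spread narrative A"],
                      "user456": ["recipe ideas", "baking tips"]}

vectorizer = TfidfVectorizer(stop_words="english")
troll_matrix = vectorizer.fit_transform(
    [" ".join(c) for c in known_troll_comments])

def similarity_features(comments):
    """Aggregate an account's cosine similarities to all known trolls."""
    vec = vectorizer.transform([" ".join(comments)])
    sims = cosine_similarity(vec, troll_matrix).ravel()
    return [sims.max(), sims.mean(), float((sims > 0.5).sum())]

# Label known trolls 1 and baseline users 0. (In practice each known
# troll would be held out of the reference matrix to avoid self-similarity.)
X = [similarity_features(c)
     for c in known_troll_comments + baseline_user_comments]
y = [1] * len(known_troll_comments) + [0] * len(baseline_user_comments)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

for account, comments in candidate_accounts.items():
    if clf.predict([similarity_features(comments)])[0] == 1:
        print("potential troll:", account)
```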
Award ID(s):
2046590 2114411
NSF-PAR ID:
10323428
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the IEEE Symposium on Security and Privacy
ISSN:
2375-1207
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Past work has explored various ways for online platforms to leverage crowd wisdom for misinformation detection and moderation. Yet, platforms often relegate governance to their communities, and limited research has been done from the perspective of these communities and their moderators. How is misinformation currently moderated in online communities that are heavily self-governed? What role does the crowd play in this process, and how can this process be improved? In this study, we answer these questions through semi-structured interviews with Reddit moderators, focusing on a case study of COVID-19 misinformation. First, our analysis identifies a general moderation workflow model encompassing the various processes participants use for handling COVID-19 misinformation. Further, we show that the moderation workflow revolves around three elements: content facticity, user intent, and perceived harm. Next, our interviews reveal that Reddit moderators rely on two types of crowd wisdom for misinformation detection. Almost all participants are heavily reliant on reports from crowds of ordinary users to identify potential misinformation. A second crowd, participants' own moderation teams and the expert moderators of other communities, provides support when participants encounter difficult, ambiguous cases. Finally, we use design probes to better understand how different types of crowd signals readily available on Reddit, from ordinary users and from moderators, can assist moderators with identifying misinformation. We observe that nearly half of all participants preferred these cues over labels from expert fact-checkers because these cues can help them discern user intent. Additionally, a quarter of the participants distrust professional fact-checkers, raising important concerns about misinformation moderation.
  2. Social media has become an important method for information sharing. This has also created opportunities for bad actors to easily spread disinformation and manipulate public opinion. This paper explores the possibility of applying Authorship Verification on online communities to mitigate abuse, analyzing the writing style of online accounts to identify accounts managed by the same person. We expand on our similarity-based authorship verification approach, previously applied to large fanfiction datasets, and show that it works in open-world settings and on shorter documents, and is largely topic-agnostic. Our expanded model can link Reddit accounts based on the writing style of only 40 comments with an AUC of 0.95, and the performance increases to 0.98 given more content. We apply this model to a set of suspicious Reddit accounts associated with the disinformation campaign surrounding the 2016 U.S. presidential election and show that the writing style of these accounts is inconsistent, indicating that each account was likely maintained by multiple individuals. We also apply this model to Reddit accounts that commented on the WallStreetBets subreddit around the 2021 GameStop short squeeze and show that a number of account pairs share very similar writing styles. Finally, we show that this approach can link accounts across Reddit and Twitter with an AUC of 0.91 even when training data is very limited.
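As a rough illustration of the similarity-based verification idea (not the authors' exact model), one can compare character n-gram profiles of two accounts; the vectorizer settings and toy data below are illustrative assumptions:

```python
# Sketch of similarity-based authorship verification: represent each
# account by character n-gram statistics of its comments and treat high
# cosine similarity as evidence of shared authorship. Illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.metrics.pairwise import cosine_similarity

# Toy pairs: (comments of account A, comments of account B, same author?)
pairs = [
    (["i think its fine!!", "lol ok but why tho"],
     ["its fine imo!!", "lol why tho"], 1),
    (["I believe this is correct.", "Furthermore, the data suggest..."],
     ["lol ok but why tho", "its fine!!"], 0),
]

# Character n-grams capture style (punctuation, casing, spelling habits)
# rather than topic, which keeps the comparison largely topic-agnostic.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
vectorizer.fit([" ".join(a) + " " + " ".join(b) for a, b, _ in pairs])

def similarity(comments_a, comments_b):
    """Cosine similarity between two accounts' stylistic profiles."""
    m = vectorizer.transform([" ".join(comments_a), " ".join(comments_b)])
    return cosine_similarity(m[0], m[1])[0, 0]

scores = [similarity(a, b) for a, b, _ in pairs]
labels = [label for _, _, label in pairs]
print("AUC:", roc_auc_score(labels, scores))
```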
  3. Haldorai, Anandakumar (Ed.)
    Cybersecurity affects us all in our daily lives. New knowledge on best practices, new vulnerabilities, and timely fixes for cybersecurity issues is growing super-linearly, and it is spread across numerous, heterogeneous sources. Because of that, community contribution-based question and answer sites have become clearinghouses for cybersecurity-related inquiries, as they have for many other topics. Historically, Stack Overflow has been the most popular platform for different kinds of technical questions, including cybersecurity. That has been changing, however, with the advent of Security Stack Exchange, a site specifically designed for cybersecurity-related questions and answers. More recently, some cybersecurity-related subreddits of Reddit have become hubs for cybersecurity-related questions and discussions. The availability of multiple overlapping communities has created a complex terrain to navigate for someone looking for an answer to a cybersecurity question. In this paper, we investigate how and why people choose among three prominent, overlapping question and answer communities for their cybersecurity knowledge needs. We aggregated several consecutive years of cybersecurity-related questions from Stack Overflow, Security Stack Exchange, and Reddit, and performed statistical, linguistic, and longitudinal analyses. To triangulate the results, we also conducted user surveys. We found that user behavior differs across the three communities in most cases. Likewise, the cybersecurity-related questions asked on the three sites differ: they are more technical on Security Stack Exchange and Stack Overflow, and more subjective and personal on Reddit. Moreover, there appears to have been a differentiation of the communities along the same lines, accompanied by overall popularity trends suggestive of Stack Overflow's decline and Security Stack Exchange's rise within the cybersecurity community. Reddit is addressing the more subjective, discussion-type needs of the lay community, and is growing rapidly.
  4. Following the 2016 US elections, Twitter launched their Information Operations (IO) hub, where they archive account activity connected to state-linked information operations. In June 2020, Twitter took down and released a set of accounts linked to Turkey's ruling political party (AKP). We investigate these accounts in the aftermath of the takedown to explore whether AKP-linked operations are ongoing and to understand the strategies they use to remain resilient to disruption. We collect live accounts that appear to be part of the same network, ~30% of which have been suspended by Twitter since our collection. We create a BERT-based classifier that shows similarity between these two networks, develop a taxonomy to categorize these accounts, find direct sequel accounts between the Turkish takedown and the live accounts, and find evidence that Turkish IO actors deliberately construct their network to withstand large-scale shutdown by utilizing explicit and implicit signals of coordination. We compare our findings from the Turkish operation to Russian and Chinese IO on Twitter and find that Turkey's IO utilizes a unique group structure to remain resilient. Our work highlights the fundamental imbalance between IO actors, who can quickly and easily create free accounts, and the social media platforms, which spend significant resources on detection and removal, and contributes novel findings about Turkish IO on Twitter.
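The abstract does not detail the classifier; a minimal sketch of one plausible setup, using the sentence-transformers library as a stand-in for the authors' BERT model and toy data in place of the real tweet archives, could be:

```python
# Sketch of a BERT-style account classifier: embed each account's tweets
# with a pretrained transformer, average into one vector per account,
# and train a linear classifier on takedown vs. control accounts.
# Model choice and data are illustrative assumptions, not the paper's.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for BERT

def account_embedding(tweets):
    """Average per-tweet embeddings into a single account vector."""
    return encoder.encode(tweets).mean(axis=0)

# Toy stand-ins for the takedown archive and a control sample.
takedown_accounts = [["slogan A", "support party X"], ["party X rally today"]]
control_accounts = [["nice weather today"], ["match highlights", "great goal"]]

X = np.stack([account_embedding(t)
              for t in takedown_accounts + control_accounts])
y = [1] * len(takedown_accounts) + [0] * len(control_accounts)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Score live accounts: a high probability suggests similarity to the
# taken-down network (one signal among several, as in the paper).
live_accounts = [["party X forever", "slogan A"], ["cooking tips"]]
for tweets in live_accounts:
    print(clf.predict_proba([account_embedding(tweets)])[0, 1])
```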
  5. Information manipulation is widespread in today's media environment. Online networks have disrupted the gatekeeping role of traditional media by allowing various actors to influence the public agenda; they have also allowed automated accounts (or bots) to blend with human activity in the flow of information. Here, we assess the impact that bots had on the dissemination of content during two contentious political events that evolved in real time on social media. We focus on events of heightened political tension because they are particularly susceptible to information campaigns designed to mislead or exacerbate conflict. We compare the visibility of bots with that of human accounts, verified accounts, and mainstream news outlets. Our analyses combine millions of posts from a popular microblogging platform with web-tracking data collected from two different countries and timeframes. We employ tools from network science, natural language processing, and machine learning to analyze the diffusion structure, the content of the messages diffused, and the actors behind those messages as the political events unfolded. We show that verified accounts are significantly more visible than unverified bots in the coverage of the events, but also that bots attract more attention than human accounts. Our findings highlight that social media and the web are very different news ecosystems in terms of prevalent news sources, and that both humans and bots contribute, through their activity, to the discrepancies in news visibility.
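The abstract leaves the visibility measure unspecified; as a toy illustration, one way to compare visibility across account classes is to aggregate an engagement proxy per class (retweets here; the proxy, data, and column names are assumptions):

```python
# Toy sketch: average visibility (retweets received, as a proxy) per
# account class. The proxy and column names are illustrative assumptions.
import pandas as pd

posts = pd.DataFrame({
    "account_type": ["verified", "human", "bot", "bot", "news", "human"],
    "retweets":     [520, 14, 33, 41, 610, 9],
})

# Average visibility per account class, most visible first.
visibility = (posts.groupby("account_type")["retweets"]
                   .mean()
                   .sort_values(ascending=False))
print(visibility)
```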