Title: How “Loco” Is the LOCO Corpus? Annotating the Language of Conspiracy Theories
Conspiracy theories have found a new channel on the internet and spread by bringing together like-minded people, thus functioning as an echo chamber. The new 88-million word corpus Language of Conspiracy (LOCO) was created with the intention to provide a text collection to study how the language of conspiracy differs from mainstream language. We use this corpus to develop a robust annotation scheme that will allow us to distinguish between documents containing conspiracy language and documents that do not contain any conspiracy content or that propagate conspiracy theories via misinformation (which we explicitly disregard in our work). We find that focusing on indicators of a belief in a conspiracy combined with textual cues of conspiracy language allows us to reach a substantial agreement (based on Fleiss’ kappa and Krippendorff’s alpha). We also find that the automatic retrieval methods used to collect the corpus work well in finding mainstream documents, but include some documents in the conspiracy category that would not belong there based on our definition.
Award ID(s):
2123618
PAR ID:
10412106
Author(s) / Creator(s):
; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. European Language Resources Association (Ed.)
    Conspiracy theories have found a new channel on the internet and spread by bringing together like-minded people, thus functioning as an echo chamber. The new 88-million word corpus Language of Conspiracy (LOCO) was created with the intention to provide a text collection to study how the language of conspiracy differs from mainstream language. We use this corpus to develop a robust annotation scheme that will allow us to distinguish between documents containing conspiracy language and documents that do not contain any conspiracy content or that propagate conspiracy theories via misinformation (which we explicitly disregard in our work). We find that focusing on indicators of a belief in a conspiracy combined with textual cues of conspiracy language allows us to reach a substantial agreement (based on Fleiss’ kappa and Krippendorff’s alpha). We also find that the automatic retrieval methods used to collect the corpus work well in finding mainstream documents, but include some documents in the conspiracy category that would not belong there based on our definition. 
  2. A conspiracy theory (CT) suggests that covert groups or powerful individuals secretly manipulate events. Not knowing about existing conspiracy theories could make one more likely to believe them, so this work aims to compile a list of CTs shaped as a tree that is as comprehensive as possible. We began with a manually curated ‘tree’ of CTs from academic papers and Wikipedia. Next, we examined 1769 CT-related articles from four fact-checking websites, focusing on their core content, and used a technique called Keyphrase Extraction to label the documents. This process yielded 769 identified conspiracies, each assigned a label and a family name. The second goal of this project was to detect whether an article presents a conspiracy theory, so we built a binary classifier with our labeled dataset. This model is based on RoBERTa, a transformer-based model pre-trained on a large corpus, and achieves an F1 score of 87%. This model helps to identify potential conspiracy theories in new articles. We used a combination of clustering (HDBSCAN) and a dimension reduction technique (UMAP) to assign a label from the tree to these new articles detected as conspiracy theories. We then labeled these groups accordingly to help us match them to the tree. These can lead us to detect new conspiracy theories and expand the tree using computational methods. We successfully generated a tree of conspiracy theories and built a pipeline to detect and categorize conspiracy theories within any text corpus. This pipeline can yield insights from any database formatted as text. 
  3. Online discussions frequently involve conspiracy theories, which can contribute to the proliferation of belief in them. However, not all discussions surrounding conspiracy theories promote them, as some are intended to debunk them. Existing research has relied on simple proxies or focused on a constrained set of signals to identify conspiracy theories, which limits our understanding of conspiratorial discussions across different topics and online communities. This work establishes a general scheme for classifying discussions related to conspiracy theories based on authors' perspectives on the conspiracy belief, which can be expressed explicitly through narrative elements, such as the agent, action, or objective, or implicitly through references to known theories, such as chemtrails or the New World Order. We leverage human-labeled ground truth to train a BERT-based model for classifying online CTs, which we then compared to the Generative Pre-trained Transformer machine (GPT) for detecting online conspiratorial content. Despite GPT's known strengths in its expressiveness and contextual understanding, our study revealed significant flaws in its logical reasoning, while also demonstrating comparable strengths from our classifiers. We present the first large-scale classification study using posts from the most active conspiracy-related Reddit forums and find that only one-third of the posts are classified as positive. This research sheds light on the potential applications of large language models in tasks demanding nuanced contextual comprehension. 
  4. Recent research on conspiracy theories labels conspiracism as a distinct and deficient epistemic process. However, the tendency to pathologize conspiracism obscures the fact that it is a diverse and dynamic collective sensemaking process, transacted in public on the web. Here, we adopt a narrative framework to introduce a new analytical approach for examining online conspiracism. Narrative plays an important role because it is central to human cognition as well as being domain agnostic, and so can serve as a bridge between conspiracism and other modes of knowledge production. To illustrate the utility of our approach, we use it to analyze conspiracy theories identified in conversations across three different anti-vaccination discussion forums. Our approach enables us to capture more abstract categories without hiding the underlying diversity of the raw data. We find that there are dominant narrative themes across sites, but that there is also a tremendous amount of diversity within these themes. Our initial observations raise the possibility that different communities play different roles in the collective construction of conspiracy theories online. This offers one potential route for understanding not only cross-sectional differentiation, but the longitudinal dynamics of the narrative in future work. In particular, we are interested to examine how activity within the framework of the narrative shifts in response to news events and social media platforms’ nascent efforts to control different types of misinformation. Such analysis will help us to better understand how collectively constructed conspiracy narratives adapt in a shifting media ecosystem. 
  5. Online communities play a crucial role in disseminating conspiracy theories. New theories often emerge in the aftermath of catastrophic events. Despite evidence of their widespread appeal, surprisingly little is known about who participates in these event-specific conspiratorial discussions or how these discussions evolve over time. We study r/conspiracy, an active Reddit community of more than 200,000 users dedicated to conspiratorial discussions. By focusing on four tragic events and 10 years of discussions, we find three distinct user cohorts: joiners, who had never participated in Reddit but joined r/conspiracy only after the event; converts, who were active Reddit users but joined r/conspiracy only after the event; and veterans, who are longstanding r/conspiracy members. While joiners and converts have a shorter lifespan in the community in comparison to the veterans, joiners are more active during their shorter tenure, becoming increasingly engaged over time. Finally, to investigate how these events affect users’ conspiratorial discussions, we adopted a causal inference approach to analyze user comments around the time of the events. We find that discussions happening after the event exhibit signs of emotional shock, increased language complexity, and simultaneous expressions of certainty and doubtfulness. Our work provides insight on how online communities may detect new conspiracy theories that emerge following dramatic events, and in the process stop them before they spread. 