Search for: All records

Creators/Authors contains: "Greenstadt, Rachel"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Recently, there have been significant advances in and wide-scale use of generative AI for natural language generation. Models such as OpenAI’s GPT-3 and Meta’s LLaMA are widely used in chatbots, to summarize documents, and to generate creative content. These advances raise concerns about abuse of these models, especially in social media settings, such as large-scale generation of disinformation, manipulation campaigns that use AI-generated content, and personalized scams. We used stylometry (the analysis of style in natural language text) to analyze the style of AI-generated text. Specifically, we applied an existing authorship verification (AV) model, which predicts whether two documents were written by the same author, to texts generated by GPT-2, GPT-3, ChatGPT, and LLaMA. Our AV model was trained only on human-written text and has been used effectively in social media settings to analyze cases of abuse. We generated texts by providing the language models with fanfiction snippets and prompting them to complete the rest in the same writing style as the original snippet. We then applied the AV model across the texts generated by the language models and the human-written texts to analyze the similarity of their writing styles (a simplified sketch of this kind of pairwise style comparison appears after this entry). We found that texts generated with GPT-2 had the highest similarity to the human texts. Texts generated by GPT-3 and ChatGPT were very different from the human snippets and were similar to each other. LLaMA-generated texts had some similarity to the original snippet but also had similarities with other LLaMA-generated texts and with texts from other models. We then conducted a feature analysis to identify the features that drive these similarity scores. This analysis helped us answer questions such as which features distinguish the language style of language models and humans, which features differ across models, and how these linguistic features change across language model versions. The dataset and the source code used in this analysis have been made public to allow further analysis of new language models.
    Free, publicly-accessible full text available February 25, 2026
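    The snippet below is a minimal, hypothetical sketch of the pairwise style comparison described in entry 1. It does not reproduce the paper's authorship verification model or data: the toy texts are placeholders, and a common stylometric baseline (character n-gram TF-IDF features with cosine similarity) stands in for the trained AV model purely to show how similarity scores between human and model-generated sources can be computed.

```python
# Hypothetical illustration only: this is NOT the paper's AV model.
# Character n-gram TF-IDF features plus cosine similarity serve as a
# simple stylometric stand-in for a trained verification model.
from itertools import combinations

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the fanfiction snippets and model completions.
texts = {
    "human":   ["The rain fell softly as she crossed the old bridge..."],
    "gpt2":    ["The rain kept falling while she walked over the bridge..."],
    "chatgpt": ["As the gentle rain descended, she made her way across..."],
}

# Character n-grams capture punctuation, spelling, and morphology habits
# that tend to distinguish writing styles.
vectorizer = TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 4))
vectorizer.fit([doc for docs in texts.values() for doc in docs])

def source_vector(source):
    """Average the feature vectors of all documents from one source."""
    return np.asarray(vectorizer.transform(texts[source]).mean(axis=0))

# Compare every pair of sources, analogous to running an AV model across
# human-written and model-generated texts.
for a, b in combinations(texts, 2):
    score = cosine_similarity(source_vector(a), source_vector(b))[0, 0]
    print(f"{a:>8} vs {b:<8} style similarity: {score:.3f}")
```

    In the study itself, the similarity score would come from the trained AV model rather than this cosine-similarity baseline, but the abstract describes an analogous pairwise comparison across sources.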
  2. Content moderation practices and technologies need to change over time as requirements and community expectations shift. However, attempts to restructure existing moderation practices can be difficult, especially for platforms that rely on their communities to moderate, because changes can transform participants' workflow, workload, and reward systems. By examining the extensive archival discussions around a prepublication moderation technology on Wikipedia named Flagged Revisions, complemented by seven semi-structured interviews, we identify various challenges in restructuring community-based moderation practices. We find that while a new system might sound good in theory and perform well in terms of quantitative metrics, it may conflict with existing social norms. Furthermore, our findings underscore how the relationship between platforms and self-governed communities can hinder the ability to assess the performance of any new system and can introduce considerable costs related to maintaining, overhauling, or scrapping any piece of infrastructure.
    Free, publicly-accessible full text available November 7, 2025
  3. People Search Websites, a category of data brokers, collect, catalog, monetize, and often publicly display individuals' personally identifiable information (PII). We present a study of user privacy rights on 20 such websites, assessing the usability of their data access and data removal mechanisms. We combine insights from these two processes to determine connections between sites, such as shared access mechanisms or removal effects. We find that data access requests are mostly unsuccessful; instead, sites cite a variety of legal exceptions or misinterpret the nature of the requests. By purchasing reports, we find that only one set of connected sites provided access to the same report they sell to customers. We leverage a multi-step removal process to investigate removal effects between suspected connected sites. In general, data removal is more streamlined than data access, but it is not very transparent; questions about the scope of removal and the reappearance of information remain. Confirming and expanding the connections observed in prior phases, we find that four main groups are behind 14 of the sites studied, indicating the need to further catalog these connections to simplify removal.
  4. Online harassment remains a prevalent problem for internet users. Its impact is made orders of magnitude worse when multiple harassers coordinate to conduct networked attacks. This paper presents an analysis of 231 threads on Kiwi Farms, a notorious online harassment community. We find that networked online harassment campaigns consist of three phases: target introduction, network decision, and network response. The first stage consists of the initial narrative elements, which are approved or rejected in stage two and expanded in stage three. Narrative building is a common element of all three stages. The network plays a key role in narrative building, adding elements to the narrative in at least 80% of the threads and resulting in sustained harassment. This finding is central to our model of Continuous Narrative Escalation (CNE), which has two parts: (1) narrative continuation, the action of repeatedly adding new information to the existing narrative, and (2) escalation, the aggravation of harassment that occurs as a consequence. In addition, we present insights from our analysis of 100 takedown request threads, which discuss abuse reports the community received. We find that these takedown requests are misused by the community and serve as elements that further fuel the narrative. We use our findings and framework to develop a set of recommendations that can inform harassment interventions and make online spaces safer.
  5. Deepfakes have become a dual-use technology with applications in art, science, and industry. However, the technology can also be leveraged maliciously in areas such as disinformation, identity fraud, and harassment. In response to the technology's dangerous potential, many deepfake creation communities have been deplatformed, including the technology's originating community, r/deepfakes. Opening in February 2018, just eight days after the removal of r/deepfakes, MrDeepFakes (MDF) went online as a privately owned platform to fulfill the role of community hub, and it has since grown into the largest dedicated deepfake creation and discussion platform currently online. This position of community hub is balanced against the site's other main purpose: hosting deepfake pornography depicting public figures, produced without consent. In this paper we explore the two largest deepfake communities that have existed, using a mixed-methods approach combining quantitative and qualitative analysis. We seek to identify how these platforms were and are used by their members, what opinions these deepfakers hold about the technology and how it is seen by society at large, and how deepfakes-as-disinformation is viewed by the community. We find that there is a large emphasis on technical discussion on these platforms, intermixed with potentially malicious content. Additionally, we find that the deplatforming of deepfake communities early in the technology's life has significantly impacted trust regarding alternative community platforms.
  6. This qualitative study examines the privacy challenges perceived by librarians who afford access to physical and electronic spaces and are in a unique position of safeguarding the privacy of their patrons. As internet “service providers,” librarians represent a bridge between the physical and internet world, and thus offer a unique sight line to the convergence of privacy, identity, and social disadvantage. Drawing on interviews with 16 librarians, we describe how they often interpret or define their own rules when it comes to privacy to protect patrons who face challenges that stem from structures of inequality outside their walls. We adopt the term “intersectional thinking” to describe how librarians reported thinking about privacy solutions, which is focused on identity and threats of structural discrimination (the rules, norms, and other determinants of discrimination embedded in institutions and other societal structures that present barriers to certain groups or individuals), and we examine the role that low/no-tech strategies play in ameliorating these threats. We then discuss how librarians act as privacy intermediaries for patrons, the potential analogue for this role for developers of systems, the power of low/no-tech strategies, and implications for design and research of privacy-enhancing technologies (PETs). 
  7. People Search Websites aggregate and publicize users' personally identifiable information (PII), sourced from data brokers. This paper presents a qualitative study of the perceptions and experiences of 18 participants who sought information removal, either by hiring a removal service or by requesting removal from the sites themselves. The users we interviewed were highly motivated and had sophisticated risk perceptions. We found that they encountered obstacles during the removal process, resulting in a high cost of removal, whether they requested it themselves or hired a service. Participants perceived that the successful monetization of users' PII motivates data aggregators to make removal more difficult. Overall, self-management of privacy by attempting to keep information off the internet is difficult, and its success is hard to evaluate. We provide recommendations to users, third parties, removal services, and researchers aiming to improve the removal process.
  8. Many online communities rely on postpublication moderation, where contributors, even those perceived as risky, are allowed to publish material immediately and moderation takes place after the fact. An alternative arrangement involves moderating content before publication. A range of communities have argued against prepublication moderation, suggesting that it makes contributing less enjoyable for new members and that it distracts established community members with extra moderation work. We present an empirical analysis of the effects of a prepublication moderation system called FlaggedRevs that was deployed by several Wikipedia language editions. We used panel data from 17 large Wikipedia editions to test a series of hypotheses related to the effect of the system on activity levels and contribution quality (a simplified sketch of such a panel analysis follows this entry). We found that the system was very effective at keeping low-quality contributions from ever becoming visible. Although there is some evidence that the system discouraged participation among users without accounts, our analysis suggests that the system's effects on contribution volume and quality were moderate at most. Our findings imply that concerns regarding major negative effects of prepublication moderation systems on contribution quality and project productivity may be overstated.
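    Below is a minimal, hypothetical sketch of the kind of panel analysis entry 8 describes. The paper's actual model specification, variables, and measurements are not shown here; the synthetic data, column names, and the two-way fixed-effects regression with clustered standard errors are illustrative assumptions only.

```python
# Hypothetical sketch of a panel analysis like the one described in entry 8.
# Variable names and the synthetic data are illustrative placeholders, not the
# paper's actual specification or measurements.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic panel: one row per (Wikipedia edition, month), with an indicator
# for whether FlaggedRevs was active, monthly edit counts, and the share of
# contributions later reverted (a rough quality proxy).
rows = []
for wiki in [f"wiki_{i}" for i in range(17)]:
    adoption_month = rng.integers(12, 36)  # month the system switched on
    for month in range(48):
        post = int(month >= adoption_month)
        edits = 1000 + 50 * post + rng.normal(0, 100)
        reverted_share = 0.10 - 0.02 * post + rng.normal(0, 0.01)
        rows.append((wiki, month, post, edits, reverted_share))
panel = pd.DataFrame(
    rows, columns=["wiki", "month", "post_flaggedrevs", "edits", "reverted_share"]
)

# Two-way fixed effects: edition dummies absorb stable differences between
# wikis; month dummies absorb platform-wide time trends. Standard errors are
# clustered by edition.
model = smf.ols(
    "edits ~ post_flaggedrevs + C(wiki) + C(month)", data=panel
).fit(cov_type="cluster", cov_kwds={"groups": panel["wiki"]})

print(model.params["post_flaggedrevs"])  # estimated change in monthly edits
```

    The same specification can be refit with a quality outcome (here, the placeholder reverted_share) to mirror the activity-versus-quality comparison the abstract mentions.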