This content will become publicly available on March 12, 2026

Title: Boundaries of data journalism in U.S. public radio newsrooms
The recent addition of data journalists to several dozen U.S. public radio newsrooms has created multiple new hybridities in the form. No longer are numbers and large datasets “audio poison.” Instead, they are an essential tool for these journalists, who prize journalism’s interpretive function, expressing information in new ways and challenging conventions of broadcast newsroom employment. This study, which relies on semi-structured interviews with 13 public radio data journalists, uses Carlson’s boundary work typology to analyze the ways in which data journalists are expanding the boundaries of U.S. public radio journalism, as well as ways in which they have pushed back against expulsionary pressures. This study’s findings problematize the idea that the results of boundary work must be expressed as an in-or-out proposition. Rather, U.S. public radio data journalists suggest their boundaries are a continuum where they may be conditionally accepted by their colleagues, depending on deadlines and on the skills possessed by non-data journalists.
Award ID(s):
2129047
PAR ID:
10594471
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Sage
Date Published:
Journal Name:
Journalism
ISSN:
1464-8849
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like This
  1. Public records requests are a central mechanism for government transparency. In practice, they are slow, complex processes that require analyzing large amounts of messy, unstructured data. In this paper, we introduce RequestAtlas, a system that helps investigative journalists review large quantities of unstructured data that result from submitting many public records requests. RequestAtlas was developed through a year-long participatory design collaboration with the California Reporting Project (CRP), a journalistic collective researching police use of force and police misconduct in California. RequestAtlas helps journalists evaluate the results of public records requests for completeness and negotiate with agencies for additional information. RequestAtlas has had significant real-world impact. It has been deployed for more than a year to identify missing data in response to public records requests and to facilitate negotiation with public records request officers. Through the process of designing and observing the use of RequestAtlas, we explore the technical challenges associated with the public records request process and the design needs of investigative journalists more generally. We argue that public records requests represent an instance of an adversarial technical relationship, in which two entities engage in a prolonged, iterative, often adversarial exchange of information. Technologists can support information-gathering efforts within these adversarial technical relationships by building flexible local solutions that help both entities account for the state of the ongoing information exchange. Additionally, we offer insights on ways to design applications that can assist investigative journalists in the inevitably significant data cleaning phase of processing large documents while supporting journalistic norms of verification and human review. Finally, we reflect on the ways that this participatory design process, despite its success, lays bare some of the limitations inherent in the public records request process and in the "request and respond" model of transparency more generally.
  2. String matching is at the core of data cleaning, record matching, and information retrieval. String matching relies on a similarity measure that evaluates the similarity of two strings, regarding the two as a match if their similarity is larger than a user-defined threshold. In our collaboration with journalists and public defenders, we found that real-world datasets, such as police rosters that journalists and public defenders work with, often contain acronyms, abbreviations, and typos, thanks to errors during manual entry into, say, a spreadsheet or a form. Unfortunately, traditional similarity measures lead to low accuracy since they do not consider all three aspects together. Some recent work proposes leveraging synonym rules to improve matching, but either requires these rules to be provided upfront or generates them prior to matching, which leads to low accuracy in our setting and similar ones. To address these limitations, we propose Smash, a simple yet effective measure to assess the similarity of two strings with acronyms, abbreviations, and typos, all without relying on synonym rules. We design a dynamic programming algorithm to efficiently compute this measure, along with two optimizations that improve accuracy. We show that compared to the best baselines, including one based on ChatGPT with GPT-4, Smash improves the max and mean F-score by 23.5% and 110.8%, respectively. We implement Smash in OpenRefine, a graphical data cleaning tool, to facilitate its use by journalists, public defenders, and other non-programmers for data cleaning.
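    As a rough illustration of the threshold-based matching this abstract describes, the Python sketch below combines a dynamic-programming edit-distance similarity (for typos) with a crude acronym check. It is not the Smash measure itself, whose definition is not given here; the threshold value and the acronym heuristic are illustrative assumptions.

    # Illustrative sketch only, NOT the Smash measure: a threshold-based
    # matcher that handles typos via edit distance and acronyms via a
    # crude first-letter heuristic. Smash handles typos, acronyms, and
    # abbreviations jointly in one dynamic program.

    def edit_distance(a: str, b: str) -> int:
        """Classic Levenshtein distance via dynamic programming."""
        m, n = len(a), len(b)
        dp = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            dp[i][0] = i
        for j in range(n + 1):
            dp[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                               dp[i][j - 1] + 1,          # insertion
                               dp[i - 1][j - 1] + cost)   # substitution
        return dp[m][n]

    def similarity(a: str, b: str) -> float:
        """Similarity in [0, 1]: 1 means identical strings."""
        a, b = a.lower().strip(), b.lower().strip()
        if not a and not b:
            return 1.0
        # Crude acronym check, e.g. "lapd" vs "los angeles police department".
        words = b.split()
        if len(words) > 1 and a == "".join(w[0] for w in words):
            return 1.0
        return 1.0 - edit_distance(a, b) / max(len(a), len(b))

    def is_match(a: str, b: str, threshold: float = 0.8) -> bool:
        """Declare a match when similarity exceeds a user-defined threshold."""
        return similarity(a, b) >= threshold or similarity(b, a) >= threshold

    print(is_match("Jon Smith", "John Smith"))                # True (typo)
    print(is_match("LAPD", "Los Angeles Police Department"))  # True (acronym)

    Note that a plain edit-distance similarity alone would reject the acronym pair outright, which is the kind of failure mode the abstract attributes to traditional measures.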
  3. Government use of algorithmic decision-making (ADM) systems is widespread and diverse, and holding these increasingly high-impact, often opaque government algorithms accountable presents a number of challenges. Some European governments have launched registries of ADM systems used in public services, and some transparency initiatives exist for algorithms in specific areas of the United States government; however, the U.S. lacks an overarching registry that catalogs algorithms in use for public-service delivery throughout the government. This paper conducts an inductive thematic analysis of over 700 government ADM systems cataloged by the Algorithm Tips database in an effort to describe the various ways government algorithms might be understood and inform downstream uses of such an algorithmic catalog. We describe the challenge of government algorithm accountability, the Algorithm Tips database and method for conducting a thematic analysis, and the themes of topics and issues, levels of sophistication, interfaces, and utilities of U.S. government algorithms that emerge. Through these themes, we contribute several different descriptions of government algorithm use across the U.S. and at federal, state, and local levels, which can inform stakeholders such as journalists, members of civil society, or government policymakers.
  4. We define big data as large amounts of information, collected about many people, over multiple devices. We define critical big data research as efforts to demonstrate how flaws, ethical or methodological, in the collection and use of big data have implications for social inequality. There are many critical and creative big data research endeavors around the world. Here we present an annotated catalog of projects that: are both critical and creative in their analysis of big data; have a distinct Principal Investigator (PI) or clear team; and are producing an identifiable body of public essays, original research, or civic engagement projects. We have catalogued these endeavors with as much descriptive information as possible, and organized projects by the domains of big data critique and creativity in which they are having an impact. We identify some 35 distinct projects, and several dozen individual researchers, artists, and civic leaders, operating in 16 domains of inquiry. We recommend expanding critical and creative work in several domains: expanding work in China; supporting policy initiatives in Latin America’s young democracies; expanding work on algorithmic manipulation originating in authoritarian countries; and identifying best practices for how public agencies in the United States should develop big data initiatives. We recommend that the next stage of support for these lines of inquiry be to help publicize the output of these projects, many of which are of interest to a handful of specialists but should be made accessible to policy makers, journalists, and the interested public.
  5. Chaudhuri, Kamalika; Jegelka, Stefanie; Song, Le; Szepesvari, Csaba; Niu, Gang; Sabato, Sivan (Ed.)
    Recent work has found that adversarially-robust deep networks used for image classification are more interpretable: their feature attributions tend to be sharper, and are more concentrated on the objects associated with the image’s ground-truth class. We show that smooth decision boundaries play an important role in this enhanced interpretability, as the model’s input gradients around data points will more closely align with boundaries’ normal vectors when they are smooth. Thus, because robust models have smoother boundaries, the results of gradient-based attribution methods, like Integrated Gradients and DeepLift, will capture more accurate information about nearby decision boundaries. This understanding of robust interpretability leads to our second contribution: boundary attributions, which aggregate information about the normal vectors of local decision boundaries to explain a classification outcome. We show that by leveraging the key factors underpinning robust interpretability, boundary attributions produce sharper, more concentrated visual explanations, even on non-robust models.
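    For readers unfamiliar with the gradient-based attribution methods this abstract builds on, here is a minimal Python/PyTorch sketch of plain input gradients (whose alignment with boundary normals is the abstract's premise) and of Integrated Gradients. It is not the paper's boundary-attribution method, and the toy model, shapes, and step count are assumptions.

    # Illustrative sketch only: input gradients and Integrated Gradients,
    # the standard gradient-based attribution methods the abstract refers to.
    import torch

    def input_gradient(model, x, target_class):
        """Saliency: gradient of the target logit w.r.t. the input. Near a
        data point, this direction tends to align with the normal of the
        local decision boundary when boundaries are smooth."""
        x = x.clone().detach().requires_grad_(True)
        logit = model(x)[0, target_class]
        logit.backward()
        return x.grad.detach()

    def integrated_gradients(model, x, target_class, baseline=None, steps=50):
        """Integrated Gradients: average the gradients along the straight
        path from a baseline (here zeros) to the input, then scale by
        (input - baseline)."""
        if baseline is None:
            baseline = torch.zeros_like(x)
        total = torch.zeros_like(x)
        for alpha in torch.linspace(0.0, 1.0, steps):
            point = (baseline + alpha * (x - baseline)).detach().requires_grad_(True)
            logit = model(point)[0, target_class]
            grad, = torch.autograd.grad(logit, point)
            total += grad
        return (x - baseline) * total / steps

    # Usage on a toy 2D classifier (shapes are illustrative, not the paper's):
    model = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(),
                                torch.nn.Linear(16, 3))
    x = torch.randn(1, 2)
    print(input_gradient(model, x, target_class=0))
    print(integrated_gradients(model, x, target_class=0))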