Investigative data journalists work with a variety of data sources to tell a story. Prior work has indicated that there is a close relationship between journalists' data work practices and those of data scientists. However, these relationships and data work practices have not been empirically examined, and understanding them is crucial to inform the design of tools that are used by different groups of people, including data scientists and data journalists. To bridge this gap, we studied investigative reporters' data work practices with one non-profit investigative newsroom. Our study design includes two activities: 1) semi-structured interviews with journalists, and 2) a sketching activity allowing journalists to depict examples of their work practices. By analyzing these data and synthesizing them across related prior work, we propose the major phases in the data-driven investigative journalism story idea generation process. Our study findings show that the journalists employ a collection of multiple, iterative, cyclic processes to identify journalistically "interesting" story ideas. These processes both significantly resemble and show subtle, nuanced differences from data science work practices identified in prior research. We further verified our proposal through a member check with key informants. This work offers three primary contributions. First, it provides a close glimpse into the main phases of investigative journalists' data-driven story idea generation technique. Second, it complements prior work studying formal data science practices by examining data-driven investigative journalists, whose primary expertise lies outside computing. Third, it identifies particular points in the data exploration processes that would benefit from design interventions and suggests future research directions.
This content will become publicly available on May 2, 2026
RequestAtlas: Supporting the Slow and Iterative Process of Requesting Public Records
Public records requests are a central mechanism for government transparency. In practice, they are slow, complex processes that require analyzing large amounts of messy, unstructured data. In this paper, we introduce RequestAtlas, a system that helps investigative journalists review large quantities of unstructured data that result from submitting many public records requests. RequestAtlas was developed through a year-long participatory design collaboration with the California Reporting Project (CRP), a journalistic collective researching police use of force and police misconduct in California. RequestAtlas helps journalists evaluate the results of public records requests for completeness and negotiate with agencies for additional information. RequestAtlas has had significant real-world impact. It has been deployed for more than a year to identify missing data in response to public records requests and to facilitate negotiation with public records request officers. Through the process of designing and observing the use of RequestAtlas, we explore the technical challenges associated with the public records request process and the design needs of investigative journalists more generally. We argue that public records requests represent an instance of an adversarial technical relationship in which two entities engage in a prolonged, iterative, often adversarial exchange of information. Technologists can support information-gathering efforts within these adversarial technical relationships by building flexible local solutions that help both entities account for the state of the ongoing information exchange. Additionally, we offer insights on ways to design applications that can assist investigative journalists in the inevitably significant data cleaning phase of processing large documents while supporting journalistic norms of verification and human review. Finally, we reflect on the ways that this participatory design process, despite its success, lays bare some of the limitations inherent in the public records request process and in the "request and respond" model of transparency more generally.
- Award ID(s): 2243822
- PAR ID: 10646611
- Publisher / Repository: ACM
- Date Published:
- Journal Name: Proceedings of the ACM on Human-Computer Interaction
- Volume: 9
- Issue: 2
- ISSN: 2573-0142
- Page Range / eLocation ID: 1 to 35
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Open source software is commonly portrayed as a meritocracy, where decisions are based solely on their technical merit. However, literature on open source suggests a complex social structure underlying the meritocracy. Social work environments such as GitHub make the relationships between users and between users and work artifacts transparent. This transparency enables developers to better use information such as technical value and social connections when making work decisions. We present a study on open source software contribution in GitHub that focuses on the task of evaluating pull requests, which are one of the primary methods for contributing code in GitHub. We analyzed the association of various technical and social measures with the likelihood of contribution acceptance. We found that project managers made use of information signaling both good technical contribution practices for a pull request and the strength of the social connection between the submitter and project manager when evaluating pull requests. Pull requests with many comments were much less likely to be accepted, moderated by the submitter's prior interaction in the project. Well-established projects were more conservative in accepting pull requests. These findings provide evidence that developers use both technical and social information when evaluating potential contributions to open source software projects.
- The recent addition of data journalists to several dozen U.S. public radio newsrooms has created multiple new hybridities in the form. No longer are numbers and large datasets “audio poison.” Instead, they are an essential tool for these journalists, who prize journalism’s interpretive function, expressing information in new ways and challenging conventions of broadcast newsroom employment. This study, which relies on semi-structured interviews with 13 public radio data journalists, uses Carlson’s boundary work typology to analyze the ways in which data journalists are expanding the boundaries of U.S. public radio journalism, as well as ways in which they have pushed back against expulsionary pressures. This study’s findings problematize the idea that the results of boundary work must be expressed as an in-or-out proposition. Rather, U.S. public radio data journalists suggest their boundaries are a continuum where they may be conditionally accepted by their colleagues, depending on deadlines and on the skills possessed by non-data journalists.
- Software bots automate tasks within Open Source Software (OSS) projects' pull requests and save reviewing time and effort ("the good"). However, their interactions can be disruptive and noisy and lead to information overload ("the bad"). To identify strategies to overcome such problems, we applied Design Fiction as a participatory method with 32 practitioners. We elicited 22 design strategies for a bot mediator or the pull request user interface ("the promising"). Participants envisioned a separate place in the pull request interface for bot interactions and a bot mediator that can summarize and customize other bots' actions to mitigate noise. We also collected participants' perceptions about a prototype implementing the envisioned strategies. Our design strategies can guide the development of future bots and social coding platforms.
- Many journalists and newsrooms now incorporate audience contributions in their sourcing practices by leveraging user-generated content (UGC). However, their sourcing needs and practices as they seek information from UGCs are still not deeply understood by researchers or well-supported in tools. This paper first reports the results of a qualitative interview study with nine professional journalists about their UGC sourcing practices, detailing what journalists typically look for in UGCs and elaborating on two UGC sourcing approaches: deep reporting and wide reporting. These findings then inform a human-centered design approach to prototype a UGC sourcing tool for journalists, which enables journalists to interactively filter and rank UGCs based on users’ example content. We evaluate the prototype with nine professional journalists who source UGCs in their daily routines to understand how UGC sourcing practices are enabled and transformed, while also uncovering opportunities for future research and design to support journalistic sourcing practices and sensemaking processes.
