Title: Where Do Stories Come From? Examining the Exploration Process in Investigative Data Journalism
Investigative data journalists work with a variety of data sources to tell a story. Prior work has indicated a close relationship between journalists' data work practices and those of data scientists. However, these relationships and data work practices have not been empirically examined, and understanding them is crucial to inform the design of tools used by different groups of people, including data scientists and data journalists. To bridge this gap, we studied investigative reporters' data work practices at one non-profit investigative newsroom. Our study design includes two activities: 1) semi-structured interviews with journalists, and 2) a sketching activity allowing journalists to depict examples of their work practices. By analyzing these data and synthesizing them with related prior work, we propose the major phases of the data-driven investigative journalism story idea generation process. Our findings show that the journalists employ a collection of multiple, iterative, cyclic processes to identify journalistically "interesting" story ideas. These processes both significantly resemble and subtly differ from data science work practices identified in prior research. We further verified our proposal through a member check with key informants. This work offers three primary contributions. First, it provides a close look at the main phases of investigative journalists' data-driven story idea generation process. Second, it complements prior work studying formal data science practices by examining data-driven investigative journalists, whose primary expertise lies outside computing. Third, it identifies particular points in the data exploration process that would benefit from design interventions and suggests future research directions.
Award ID(s):
1844901
PAR ID:
10333951
Journal Name:
Proceedings of the ACM on Human-Computer Interaction
Volume:
5
Issue:
CSCW2
ISSN:
2573-0142
Page Range / eLocation ID:
1 to 31
Sponsoring Org:
National Science Foundation
More Like this
  1. Public records requests are a central mechanism for government transparency. In practice, they are slow, complex processes that require analyzing large amounts of messy, unstructured data. In this paper, we introduce RequestAtlas, a system that helps investigative journalists review large quantities of unstructured data that result from submitting many public records requests. RequestAtlas was developed through a year-long participatory design collaboration with the California Reporting Project (CRP), a journalistic collective researching police use of force and police misconduct in California. RequestAtlas helps journalists evaluate the results of public records requests for completeness and negotiate with agencies for additional information. RequestAtlas has had significant real-world impact. It has been deployed for more than a year to identify missing data in response to public records requests and to facilitate negotiation with public records request officers. Through the process of designing and observing the use of RequestAtlas, we explore the technical challenges associated with the public records request process and the design needs of investigative journalists more generally. We argue that public records requests represent an instance of an adversarial technical relationship in which two entities engage in a prolonged, iterative, often adversarial exchange of information. Technologists can support information-gathering efforts within these adversarial technical relationships by building flexible local solutions that help both entities account for the state of the ongoing information exchange. Additionally, we offer insights on ways to design applications that can assist investigative journalists in the inevitably significant data cleaning phase of processing large documents while supporting journalistic norms of verification and human review. Finally, we reflect on the ways that this participatory design process, despite its success, lays bare some of the limitations inherent in the public records request process and in the "request and respond" model of transparency more generally.
  2. Gwizdka, Jacek; Rieh, Soo Young (Ed.)
    Keeping up with the research literature plays an important role in the workflow of scientists – allowing them to understand a field, formulate the problems they focus on, and develop the solutions that they contribute, which in turn shape the nature of the discipline. In this paper, we examine the literature review practices of data scientists. Data science is a field seeing an exponential rise in papers, and it increasingly draws on and is applied in numerous diverse disciplines. Recent efforts have seen the development of several tools intended to help data scientists cope with a deluge of research, as well as coordinated efforts to develop AI tools intended to uncover the research frontier. Despite these trends, which indicate the information overload faced by data scientists, no prior work has examined the specific practices and challenges faced by these scientists in an interdisciplinary field with evolving scholarly norms. In this paper, we close this gap through a set of semi-structured interviews and think-aloud protocols with industry and academic data scientists (N = 20). Our results, while corroborating other knowledge workers’ practices, uncover several novel findings: individuals (1) are challenged in seeking and making sense of papers beyond their disciplinary bubbles, (2) struggle to understand papers in the face of missing details and mathematical content, (3) grapple with the deluge by leveraging the knowledge context in code, blogs, and talks, and (4) lean on their peers online and in person. Furthermore, we outline future directions likely to help data scientists cope with the burgeoning research literature.
  3. The evolving landscape of manipulated media, including the threat of deepfakes, has made information verification a daunting challenge for journalists. Technologists have developed tools to detect deepfakes, but these tools can sometimes yield inaccurate results, raising concerns about inadvertently disseminating manipulated content as authentic news. This study examines the impact of unreliable deepfake detection tools on information verification. We conducted role-playing exercises with 24 US journalists, immersing them in complex breaking-news scenarios where determining authenticity was challenging. Through these exercises, we explored questions regarding journalists’ investigative processes, use of a deepfake detection tool, and decisions on when and what to publish. Our findings reveal that journalists are diligent in verifying information, but sometimes rely too heavily on results from deepfake detection tools. We argue for more cautious release of such tools, accompanied by proper training for users to mitigate the risk of unintentionally propagating manipulated content as real news. 
  4. With a call for schools to infuse data across the curriculum, many are creating curricula and examining students’ thinking in data-intensive problems. As the discipline of statistics education broadens to data science education, there is a need to examine how practices in data science can inform work in K-12. We synthesize literature about statistics investigation processes, data science as a field and practices of data scientists. Further, we provide results from an ethnographic and interview study of the work of data scientists. Together, these inform a new framework to support data investigation processes. We explicate the practices and dispositions needed and offer a glimpse of how the framework can be used to move the discipline of data science education forward. 
  5. Science and technology journalists today face challenges in finding newsworthy leads due to increased workloads, reduced resources, and expanding scientific publishing ecosystems. Given this context, we explore computational methods to aid these journalists' news discovery in terms of their agency and time-efficiency. We prototyped three computational information subsidies into an interactive tool that we used as a probe to better understand how such a tool may offer utility or more broadly shape the practices of professional science journalists. Our findings highlight central considerations around science journalists' user agency, contexts of use, and professional responsibility that such tools can influence and could account for in design. Based on this, we suggest design opportunities for enhancing and extending user agency over the longer term; incorporating contextual, personal, and collaborative notions of newsworthiness; and leveraging flexible interfaces and generative models. Overall, our findings contribute a richer view of the sociotechnical system around computational news discovery tools and suggest ways to improve such tools to better support the practices of science journalists.