
Search for: All records

Creators/Authors contains: "Diakopoulos, Nicholas"


  1. The scale of scientific publishing continues to grow, creating an information overload for science journalists, who are inundated with choices about what is most interesting, important, and newsworthy to cover in their reporting. Our work addresses this problem by considering the viability of a predictive model of the newsworthiness of scientific articles trained on crowdsourced evaluations of newsworthiness. We first assess the potential of crowdsourced newsworthiness evaluations by measuring their alignment with expert ratings, analyzing both quantitative correlations and qualitative rating rationales to understand limitations. We then demonstrate and evaluate a predictive model trained on these crowd ratings together with arXiv article metadata, text, and other computed features. Based on the crowdsourcing protocol we developed, we find that while crowdsourced ratings of newsworthiness often align moderately with expert ratings, there are also notable differences and divergences that limit the approach. Despite these limitations, we find that the predictive model provides a reasonably precise set of rankings when validated against expert evaluations (P@10 = 0.8, P@15 = 0.67), suggesting that a viable signal can be learned from crowdsourced evaluations of newsworthiness. Based on these findings, we discuss opportunities for future work to leverage crowdsourcing and predictive approaches to support journalistic work in discovering and filtering newsworthy information.
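The reported precision-at-k figures can be read as the fraction of the model's top-k ranked articles that experts also judged newsworthy. A minimal sketch of that metric (article IDs and the expert-relevant set below are illustrative, not from the study):

```python
def precision_at_k(ranked_ids, expert_relevant, k):
    """Fraction of the top-k ranked items that experts judged newsworthy."""
    top_k = ranked_ids[:k]
    return sum(1 for article in top_k if article in expert_relevant) / k

# Illustrative data chosen so that 8 of the top 10 and 10 of the top 15
# ranked articles are expert-rated newsworthy, matching the reported figures.
ranked = [f"arxiv-{i}" for i in range(1, 16)]
relevant = {"arxiv-1", "arxiv-2", "arxiv-3", "arxiv-5", "arxiv-6",
            "arxiv-7", "arxiv-9", "arxiv-10", "arxiv-12", "arxiv-14"}
print(precision_at_k(ranked, relevant, 10))  # 0.8
```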
  2. We conducted a longitudinal study during the 2022 U.S. midterm elections, investigating the real-world impacts of uncertainty visualizations. Using our forecast model of the governor elections in 33 states, we created a website and deployed four uncertainty visualizations for the election forecasts: single quantile dotplot (1-Dotplot), dual quantile dotplots (2-Dotplot), dual histogram intervals (2-Interval), and Plinko quantile dotplot (Plinko), an animated design with a physical and probabilistic analogy. Our online experiment ran from Oct. 18, 2022, to Nov. 23, 2022, involving 1,327 participants from 15 states. We use Bayesian multilevel modeling and post-stratification to produce demographically representative estimates of people's emotions, trust in forecasts, and political participation intention. We find that election forecast visualizations can heighten emotions, increase trust, and slightly affect people's intentions to participate in elections. 2-Interval shows the strongest effects across all measures; 1-Dotplot increases trust the most after elections. Both visualizations create emotional and trust gaps between different partisan identities, especially when a Republican candidate is predicted to win. Our qualitative analysis uncovers the complex political and social contexts of election forecast visualizations, showcasing that visualizations may provoke polarization. This intriguing interplay between visualization types, partisanship, and trust exemplifies the fundamental challenge of disentangling visualization from its context, underscoring a need for deeper investigation into the real-world impacts of visualizations. Our preprint and supplements are available at .
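Post-stratification reweights subgroup estimates by known population shares to obtain a representative population-level estimate. A minimal sketch of that reweighting step (the cells, estimates, and shares below are illustrative; in the study, the per-cell estimates come from a Bayesian multilevel model):

```python
# Minimal sketch of the post-stratification step: reweight per-subgroup
# estimates by population shares to get a representative overall estimate.
# Cells, estimates, and shares are illustrative, not from the study.
def poststratify(cell_estimates, cell_shares):
    assert abs(sum(cell_shares.values()) - 1.0) < 1e-9  # shares must sum to 1
    return sum(cell_estimates[c] * cell_shares[c] for c in cell_estimates)

est = {"18-34": 0.62, "35-54": 0.55, "55+": 0.48}  # e.g., mean trust per age cell
pop = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}  # population shares (illustrative)
print(round(poststratify(est, pop), 4))  # 0.5465
```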
  3. Government use of algorithmic decision-making (ADM) systems is widespread and diverse, and holding these increasingly high-impact, often opaque government algorithms accountable presents a number of challenges. Some European governments have launched registries of ADM systems used in public services, and some transparency initiatives exist for algorithms in specific areas of the United States government; however, the U.S. lacks an overarching registry that catalogs algorithms in use for public-service delivery throughout the government. This paper conducts an inductive thematic analysis of over 700 government ADM systems cataloged by the Algorithm Tips database in an effort to describe the various ways government algorithms might be understood and to inform downstream uses of such an algorithmic catalog. We describe the challenge of government algorithm accountability, the Algorithm Tips database and method for conducting a thematic analysis, and the themes of topics and issues, levels of sophistication, interfaces, and utilities of U.S. government algorithms that emerge. Through these themes, we contribute several different descriptions of government algorithm use across the U.S. at the federal, state, and local levels, which can inform stakeholders such as journalists, members of civil society, and government policymakers.
  4. Headlines play an important role both in news audiences' attention decisions online and in news organizations' efforts to attract that attention. A large body of research focuses on developing generally applicable heuristics for more effective headline writing. In this work, we measure the importance of a number of theoretically motivated textual features to headline performance. Using a corpus of hundreds of thousands of headline A/B tests run by hundreds of news publishers, we develop and evaluate a machine-learned model to predict headline testing outcomes. We find that the model exhibits modest performance above baseline and further estimate an empirical upper bound for such content-based prediction in this domain, indicating an important role for non-content-based factors in test outcomes. Together, these results suggest that any particular headline writing approach has only a marginal impact, and that understanding reader behavior and headline context are key to predicting news attention decisions.
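A headline A/B test of this kind ultimately compares engagement across headline variants. The toy readout below computes click-through rate (CTR) per variant with made-up counts; a real analysis, as the abstract notes, would also need to model headline context and check statistical significance:

```python
# Illustrative headline A/B test readout: click-through rate per variant.
# Click and impression counts are fabricated for illustration only.
def ctr(clicks, impressions):
    return clicks / impressions

variants = {"A": (120, 4000), "B": (150, 4100)}  # headline -> (clicks, impressions)
rates = {name: ctr(c, i) for name, (c, i) in variants.items()}
winner = max(rates, key=rates.get)
print(winner, round(rates[winner], 4))  # B 0.0366
```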
  5. This article explores how Twitter’s algorithmic timeline influences exposure to different types of external media. We use an agent-based testing method to compare chronological timelines and algorithmic timelines for a group of Twitter agents that emulated real-world archetypal users. We first find that algorithmic timelines exposed agents to external links at roughly half the rate of chronological timelines. Despite the reduced exposure, the proportional makeup of external links remained fairly stable in terms of source categories (major news brands, local news, new media, etc.). Notably, however, algorithmic timelines slightly increased the proportion of “junk news” websites in the external link exposures. While our descriptive evidence does not fully exonerate Twitter’s algorithm, it does characterize the algorithm as playing a fairly minor, supporting role in shifting media exposure for end users, especially considering upstream factors that create the algorithm’s input—factors such as human behavior, platform incentives, and content moderation. We conclude by contextualizing the algorithm within a complex system consisting of many factors that deserve future research attention.
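The agent-based comparison can be summarized with two descriptive statistics per timeline: the rate of posts containing external links, and the category breakdown of those links. A toy sketch of that tabulation (timeline entries and categories are fabricated for illustration, not drawn from the study's data):

```python
from collections import Counter

# Toy per-timeline summary: external-link exposure rate plus the
# proportional makeup of link source categories. Data is fabricated.
def link_stats(timeline):
    links = [post["link_category"] for post in timeline if post.get("link_category")]
    rate = len(links) / len(timeline)
    proportions = {cat: n / len(links) for cat, n in Counter(links).items()}
    return rate, proportions

chrono = [{"link_category": "major_news"}, {}, {"link_category": "junk"}, {}]
algo = [{"link_category": "major_news"}, {}, {}, {}]
print(link_stats(chrono)[0], link_stats(algo)[0])  # 0.5 0.25
```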
  6. Many journalists and newsrooms now incorporate audience contributions in their sourcing practices by leveraging user-generated content (UGC). However, their sourcing needs and practices as they seek information from UGCs are still not deeply understood by researchers or well-supported in tools. This paper first reports the results of a qualitative interview study with nine professional journalists about their UGC sourcing practices, detailing what journalists typically look for in UGCs and elaborating on two UGC sourcing approaches: deep reporting and wide reporting. These findings then inform a human-centered design approach to prototype a UGC sourcing tool for journalists, which enables journalists to interactively filter and rank UGCs based on users’ example content. We evaluate the prototype with nine professional journalists who source UGCs in their daily routines to understand how UGC sourcing practices are enabled and transformed, while also uncovering opportunities for future research and design to support journalistic sourcing practices and sensemaking processes. 
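Ranking UGCs by a journalist-supplied example can be sketched as a similarity-scoring problem. The paper does not specify the prototype's ranking method; the stand-in below uses simple token overlap (Jaccard similarity), and the posts and example query are illustrative:

```python
# Hedged sketch of example-based UGC ranking: score each post by token
# overlap (Jaccard similarity) with an example supplied by the journalist.
def jaccard(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def rank_by_example(posts, example):
    return sorted(posts, key=lambda p: jaccard(p, example), reverse=True)

posts = ["flood water rising downtown", "concert tickets for sale",
         "downtown flood photos"]
print(rank_by_example(posts, "downtown flood")[0])  # downtown flood photos
```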