Automated journalism technology is transforming news production and changing how audiences perceive the news. As automated text-generation models advance, it is important to understand how readers perceive human-written and machine-generated content. This study used OpenAI’s GPT-2 text-generation model (May 2019 release) and articles from news organizations across the political spectrum to study participants’ reactions to human- and machine-generated articles. As participants read the articles, we collected their facial expression and galvanic skin response (GSR) data together with self-reported perceptions of article source and content credibility. We also asked participants to identify their political affinity and assess the articles’ political tone to gain insight into the relationship between political leaning and article perception. Our results indicate that the May 2019 release of OpenAI’s GPT-2 model generated articles that were misidentified as written by a human close to half the time, while human-written articles were identified correctly as written by a human about 70 percent of the time.
Understanding Reader Backtracking Behavior in Online News Articles
Rich engagement data can shed light on how people interact with online content and how such interactions may be determined by the content of the page. In this work, we investigate a specific type of interaction, backtracking, which refers to the action of scrolling back in a browser while reading an online news article. We leverage a dataset of close to 700K instances of more than 15K readers interacting with online news articles, in order to characterize and predict backtracking behavior. We first define different types of backtracking actions. We then show that “full” backtracks, where the readers eventually return to the spot at which they left the text, can be predicted by using features that were previously shown to relate to text readability. This finding highlights the relationship between backtracking and readability and suggests that backtracking could help assess readability of content at scale.
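The notion of a "full" backtrack described above can be illustrated with a minimal sketch. It is not the authors' method: it assumes a reader's session is available as a list of vertical scroll offsets sampled over time, and the names `find_backtracks`, `Backtrack`, and the `min_jump` threshold are hypothetical choices made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Backtrack:
    start_index: int  # index of the sample just before the reader scrolled back
    departure: float  # scroll offset the reader left
    lowest: float     # smallest offset reached while scrolled back
    full: bool        # True if the reader later reached `departure` again


def find_backtracks(positions: List[float], min_jump: float = 50.0) -> List[Backtrack]:
    """Find upward scroll jumps and flag those where the reader returns.

    Assumption: a backtrack starts when two consecutive samples differ by at
    least `min_jump` in the upward direction. Gradual upward scrolling in
    smaller steps is ignored by this simplified rule.
    """
    events: List[Backtrack] = []
    n = len(positions)
    i = 1
    while i < n:
        drop = positions[i - 1] - positions[i]
        if drop >= min_jump:
            departure = positions[i - 1]
            lowest = positions[i]
            j = i
            # Follow the trace until the reader is back at the departure
            # point, or until the recorded session ends.
            while j < n and positions[j] < departure:
                lowest = min(lowest, positions[j])
                j += 1
            events.append(Backtrack(i - 1, departure, lowest, full=j < n))
            i = max(j, i + 1)
        else:
            i += 1
    return events


if __name__ == "__main__":
    trace = [0, 200, 400, 150, 300, 450, 600, 300]
    for event in find_backtracks(trace):
        print(event)
```

In this toy trace the first backtrack (leaving offset 400) is "full" because the reader later scrolls past 400 again, while the second (leaving offset 600) is not, because the session ends before the reader returns.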
- Award ID(s): 1840751
- PAR ID: 10097546
- Date Published:
- Journal Name: The World Wide Web Conference 2019
- Page Range / eLocation ID: 3237 to 3243
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Research has explored using Automatic Text Simplification for reading assistance, with prior work identifying benefits and interests from Deaf and Hard-of-Hearing (DHH) adults. While the evaluation of these technologies remains a crucial aspect of research in the area, researchers lack guidance in terms of how to evaluate text complexity with DHH readers. Thus, in this work we conduct methodological research to evaluate metrics identified from prior work (including reading speed, comprehension questions, and subjective judgements of understandability and readability) in terms of their effectiveness for evaluating texts modified to be at various complexity levels with DHH adults at different literacy levels. Subjective metrics and low-linguistic-complexity comprehension questions distinguished certain text complexity levels with participants with lower literacy. Among participants with higher literacy, only subjective judgements of text readability distinguished certain text complexity levels. For all metrics, participants with higher literacy scored higher or provided more positive subjective judgements overall.
-
This study analyzes and compares how the digital semantic infrastructure of U.S.-based digital news varies according to certain characteristics of the media outlet, including the community it serves, the content management system (CMS) it uses, and its institutional affiliation (or lack thereof). Through a multi-stage analysis of the actual markup found on news outlets’ online text articles, we reveal how multiple factors may be limiting the discoverability and reach of online media organizations focused on serving specific communities. Conceptually, we identify markup and metadata as aspects of the semantic infrastructure underpinning platforms’ mechanisms of distributing online news. Given the significant role that these platforms play in shaping the broader visibility of news content, we further contend that this markup therefore constitutes a kind of infrastructure of visibility by which news sources and voices are rendered accessible or, conversely, invisible in the wider platform economy of journalism. We accomplish our analysis by first identifying key forms of digital markup whose structured data is designed to make online news articles more readily discoverable by search engines and social media platforms. We then examine 2,226 digital news stories gathered from the main pages of 742 national, local, Black, and other identity-based news organizations in mid-2021, checking each for the presence of specific tags reflecting the Schema.org, OpenGraph, and Twitter metadata structures. We then evaluate the relationship between audience focus and the robustness of this digital semantic infrastructure. While we find only a weak relationship between the markup and the community served, additional analysis revealed a much stronger association between these metadata tags and content management system (CMS), in which 80% of the attributes appearing on an article were the same for a given CMS, regardless of publisher, market, or audience focus.
Based on this finding, we identify the organizational characteristics that may influence the specific CMS used for digital publishing, and, therefore, the robustness of the digital semantic infrastructure deployed by the organization. Finally, we reflect on the potential implications of the highly disparate tag use we observe, particularly with respect to the broader visibility of online news designed to serve particular US communities.
-
How does presenting comments in a news article affect the ways that readers engage with and retain information about news? This paper presents results from a controlled experiment investigating effects related to different strategies for promoting discussion at news websites (N=336 participants). The strategies include highlighting specific comments about a data visualization, providing prompts with the comments, and annotating prompts on the visualization. Compared with a simple list of comments (baseline), our analysis found that annotations contributed to higher levels of participant engagement in the discussion, yet lower levels of knowledge retention related to the article. These findings raise new considerations about whether and how to integrate discussion content into news and point toward future content moderation systems that assist in representing and eliciting discussion at news websites.
-
The news arguably serves to inform the quantitative reasoning (QR) of news audiences. Before one can contemplate how well the news serves this function, we first need to determine how much QR typical news stories require from readers. This paper assesses the amount of quantitative content present in a wide array of media sources, and the types of QR required for audiences to make sense of the information presented. We build a corpus of 230 US news reports across four topic areas (health, science, economy, and politics) in February 2020. After classifying reports for QR required at both the conceptual and phrase levels, we find that the news stories in our sample can largely be classified along a single dimension: The amount of quantitative information they contain. There were two main types of quantitative clauses: those reporting on magnitude and those reporting on comparisons. While economy and health reporting required significantly more QR than science or politics reporting, we could not reliably differentiate the topic area based on story-level requirements for quantitative knowledge and clause-level quantitative content. Instead, we find three reliable clusters of stories based on the amounts and types of quantitative information in the news stories.

