Title: Automatic News Article Generation from Legislative Proceedings: A Phenom-based Approach
Algorithmic journalism refers to the automatic, AI-driven construction of news stories. Successful commercial implementations exist for sports, weather, financial reporting, and similar domains with highly structured, well-defined tabular data sources. Other domains, such as local reporting, have not adopted algorithmic journalism, so no automated reporting systems are available for them, a gap with important implications for the industry. In this paper, we demonstrate a novel approach for producing news stories on government legislative activity, an area that has not widely adopted algorithmic journalism. Our data source is state legislative proceedings, primarily the transcribed speeches and dialogue from floor sessions and committee hearings in US state legislatures. Specifically, we create a library of potential events called phenoms and systematically analyze the transcripts for their presence using a custom partial order planner. Each phenom, if present, contributes some natural language text to the generated article: stating facts, quoting individuals, or summarizing some aspect of the discussion. We evaluate two randomly chosen articles in a user study on Amazon Mechanical Turk, using mostly Likert-scale questions. Our results indicate a high degree of factual accuracy and readability in the final content: 13 of 22 participants for the first article and 19 of 20 for the second agreed or strongly agreed that the articles included the most important facts of the hearings. Other results strengthen this finding in terms of accuracy, focus, and writing quality.
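The phenom idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Utterance`, `Phenom`, `detect_vote`, `detect_longest_quote`, and `generate_article` are hypothetical, the detection rules are toy string checks, and a plain loop over the library stands in for the custom partial order planner mentioned above. It only shows the general shape: each phenom inspects the transcript and, if present, contributes one piece of text to the article.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Utterance:
    """One transcribed turn in a hearing (hypothetical structure)."""
    speaker: str
    text: str


@dataclass
class Phenom:
    """A potential event; detect returns article text if present, else None."""
    name: str
    detect: Callable[[List[Utterance]], Optional[str]]


def detect_vote(transcript: List[Utterance]) -> Optional[str]:
    # Toy rule (assumption): a passed motion is announced with a fixed phrase.
    for utterance in transcript:
        if "motion passes" in utterance.text.lower():
            return "The committee approved the motion."
    return None


def detect_longest_quote(transcript: List[Utterance]) -> Optional[str]:
    # Toy rule (assumption): quote the single longest utterance.
    if not transcript:
        return None
    longest = max(transcript, key=lambda u: len(u.text))
    return f'{longest.speaker} said: "{longest.text}"'


def generate_article(transcript: List[Utterance], library: List[Phenom]) -> str:
    # Check every phenom in library order; keep the text of those that fire.
    sentences = []
    for phenom in library:
        text = phenom.detect(transcript)
        if text is not None:
            sentences.append(text)
    return " ".join(sentences)


if __name__ == "__main__":
    demo = [
        Utterance("Sen. A", "I support this bill because it helps rural schools."),
        Utterance("Chair", "The motion passes."),
    ]
    demo_library = [Phenom("vote", detect_vote), Phenom("quote", detect_longest_quote)]
    print(generate_article(demo, demo_library))
```

In this sketch the article order simply follows the library order; a planner, as the abstract describes, would presumably decide ordering and dependencies between phenoms rather than rely on a fixed list.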
Award ID(s):
1924008
NSF-PAR ID:
10385702
Editor(s):
Espinosa-Anke, Luis; Martín-Vide, Carlos; Spasić, Irena
Journal Name:
Statistical Language and Speech Processing
Sponsoring Org:
National Science Foundation
More Like this
  1. This study analyzes and compares how the digital semantic infrastructure of U.S. based digital news varies according to certain characteristics of the media outlet, including the community it serves, the content management system (CMS) it uses, and its institutional affiliation (or lack thereof). Through a multi-stage analysis of the actual markup found on news outlets’ online text articles, we reveal how multiple factors may be limiting the discoverability and reach of online media organizations focused on serving specific communities. Conceptually, we identify markup and metadata as aspects of the semantic infrastructure underpinning platforms’ mechanisms of distributing online news. Given the significant role that these platforms play in shaping the broader visibility of news content, we further contend that this markup therefore constitutes a kind of infrastructure of visibility by which news sources and voices are rendered accessible—or, conversely—invisible in the wider platform economy of journalism. We accomplish our analysis by first identifying key forms of digital markup whose structured data is designed to make online news articles more readily discoverable by search engines and social media platforms. We then analyze 2,226 digital news stories gathered from the main pages of 742 national, local, Black, and other identity-based news organizations in mid-2021, and analyze each for the presence of specific tags reflecting the Schema.org, OpenGraph, and Twitter metadata structures. We then evaluate the relationship between audience focus and the robustness of this digital semantic infrastructure. While we find only a weak relationship between the markup and the community served, additional analysis revealed a much stronger association between these metadata tags and content management system (CMS), in which 80% of the attributes appearing on an article were the same for a given CMS, regardless of publisher, market, or audience focus. 
Based on this finding, we identify the organizational characteristics that may influence the specific CMS used for digital publishing, and, therefore, the robustness of the digital semantic infrastructure deployed by the organization. Finally, we reflect on the potential implications of the highly disparate tag use we observe, particularly with respect to the broader visibility of online news designed to serve particular US communities. 
  2. In the artificial intelligence era, algorithmic journalists can produce news reports in natural language from structured data thanks to natural language generation (NLG) algorithms. This paper presents several algorithmic content generation models and discusses the impacts of algorithmic journalism on work within a framework consisting of three levels: replacing tasks of journalists, increasing efficiency, and developing new capabilities within journalism. The findings indicate that algorithmic journalism technology may lead to some changes in journalism by enabling individual users to produce their own stories. This paper may contribute to an understanding of how algorithmic news is created and how algorithmic journalism technology impacts work.
  3. Automated journalism technology is transforming news production and changing how audiences perceive the news. As automated text-generation models advance, it is important to understand how readers perceive human-written and machine-generated content. This study used OpenAI’s GPT-2 text-generation model (May 2019 release) and articles from news organizations across the political spectrum to study participants’ reactions to human- and machine-generated articles. As participants read the articles, we collected their facial expression and galvanic skin response (GSR) data together with self-reported perceptions of article source and content credibility. We also asked participants to identify their political affinity and assess the articles’ political tone to gain insight into the relationship between political leaning and article perception. Our results indicate that the May 2019 release of OpenAI’s GPT-2 model generated articles that were misidentified as written by a human close to half the time, while human-written articles were identified correctly as written by a human about 70 percent of the time. 
  4. Different news articles about the same topic often offer a variety of perspectives: an article written about gun violence might emphasize gun control, while another might promote 2nd Amendment rights, and yet a third might focus on mental health issues. In communication research, these different perspectives are known as “frames”, which, when used in news media, influence the opinion of their readers in multiple ways. In this paper, we present a method for effectively detecting frames in news headlines. Our training and performance evaluation is based on a new dataset of news headlines related to the issue of gun violence in the United States. This Gun Violence Frame Corpus (GVFC) was curated and annotated by journalism and communication experts. Our proposed approach sets a new state-of-the-art performance for multiclass news frame detection, significantly outperforming a recent baseline by 35.9% absolute difference in accuracy. We apply our frame detection approach in a large scale study of 88k news headlines about the coverage of gun violence in the U.S. between 2016 and 2018.
  5. Nicewonger, Todd E.; McNair, Lisa D.; Fritz, Stacey (Ed.)
    https://pressbooks.lib.vt.edu/alaskanative/ At the start of the pandemic, the editors of this annotated bibliography initiated a remote (i.e., largely virtual) ethnographic research project that investigated how COVID-19 was impacting off-site modular construction practices in Alaska Native communities. Many of these communities are located off the road system and thus face not only dramatically higher costs but multiple logistical challenges in securing licensed tradesmen and construction crews and in shipping building supplies and equipment to their communities. These barriers, as well as the region’s long winters and short building seasons, complicate the construction of homes and related infrastructure projects. Historically, these communities have also grappled with inadequate housing, including severe overcrowding and poor-quality building stock that is rarely designed for northern Alaska’s climate (Marino 2015). Moreover, state and federal bureaucracies and their associated funding opportunities often further complicate home building by failing to accommodate the digital divide in rural Alaska and the cultural values and practices of Native communities.[1] It is not surprising, then, that as we were conducting fieldwork for this project, we began hearing stories about these issues and about how the restrictions caused by the pandemic were further exacerbating them. Amidst these stories, we learned about how modular home construction was being imagined as a possible means for addressing both the complications caused by the pandemic and the need for housing in the region (McKinstry 2021). As a result, we began to investigate how modular construction practices were figuring into emergent responses to housing needs in Alaska communities. 
We soon realized that we needed to broaden our focus to capture a variety of prefabricated building methods that are often colloquially or idiomatically referred to as “modular.” This included a range of prefabricated building systems (e.g., manufactured, volumetric modular, system-built, and Quonset huts and other reused military buildings[2]). Our further questions about prefabricated housing in the region became the basis for this annotated bibliography. Thus, while this bibliography is one of multiple methods used to investigate these issues, it played a significant role in guiding our research and helped us bring together the diverse perspectives we were hearing from our interviews with building experts in the region and the wider debates that were circulating in the media and, to a lesser degree, in academia. The actual research for each of three sections was carried out by graduate students Lauren Criss-Carboy and Laura Supple.[3] They worked with us to identify source materials and their hard work led to the team identifying three themes that cover intersecting topics related to housing security in Alaska during the pandemic. The source materials collected in these sections can be used in a variety of ways depending on what readers are interested in exploring, including insights into debates on housing security in the region as the pandemic was unfolding (2021-2022). The bibliography can also be used as a tool for thinking about the relational aspects of these themes or the diversity of ways in which information on housing was circulating during the pandemic (and the implications that may have had on community well-being and preparedness). That said, this bibliography is not a comprehensive analysis. Instead, by bringing these three sections together with one another to provide a snapshot of what was happening at that time, it provides a critical jumping off point for scholars working on these issues. 
The first section focuses on how modular housing figured into pandemic responses to housing needs. In exploring this issue, author Laura Supple attends to both state and national perspectives as part of a broader effort to situate Alaska issues with modular housing in relation to wider national trends. This led to the identification of multiple kinds of literature, ranging from published articles to publicly circulated memos, blog posts, and presentations. These materials are important source materials that will likely fade in the vastness of the Internet and thus may help provide researchers with specific insights into how off-site modular construction was used – and perhaps hyped – to address pandemic concerns over housing, which in turn may raise wider questions about how networks, institutions, and historical experiences with modular construction are organized and positioned to respond to major societal disruptions like the pandemic. As Supple pointed out, most of the material identified in this review speaks to national issues and only a scattering of examples was identified that reflect on the Alaskan context. The second section gathers a diverse set of communications exploring housing security and homelessness in the region. The lack of adequate, healthy housing in remote Alaska communities, often referred to as Alaska’s housing crisis, is well-documented and preceded the pandemic (Guy 2020). As the pandemic unfolded, journalists and other writers reported on the immense stress that was placed on already taxed housing resources in these communities (Smith 2020; Lerner 2021). The resulting picture led the editors to describe in their work how housing security in the region exists along a spectrum that includes poor quality housing as well as various forms of houselessness including, particularly relevant for the context, “hidden homelessness” (Hope 2020; Rogers 2020). 
The term houseless is a revised notion of homelessness because it captures a richer array of both permanent and temporary forms of housing precarity that people may experience in a region (Christensen et al. 2017). By identifying sources that reflect on the multiple forms of housing insecurity that people were facing, this section highlights the forms of disparity that complicated pandemic responses. Moreover, this section underscores the ingenuity (Graham 2019; Smith 2020; Jason and Fashant 2021) that people on the ground used to address the needs of their communities. The third section provides a snapshot from the first year of the pandemic into how CARES Act funds were allocated to Native Alaska communities and used to address housing security. This subject was extremely complicated in Alaska due to the existence of for-profit Alaska Native Corporations, and disputes over eligibility for the funds impacted disbursements nationwide. The resources in this section cover that dispute, impacts of the pandemic on housing security, and efforts to use the funds for housing as well as barriers Alaska communities faced trying to secure and use the funds. In summary, this annotated bibliography provides an overview of what was happening, in real time, during the pandemic around a specific topic: housing security in largely remote Alaska Native communities. The media used by housing specialists to communicate the issues discussed here are diverse, ranging from news reports to podcasts and from blogs to journal articles. This diversity speaks to the multiple ways in which information was circulating on housing at a time when the nightly news and radio broadcasts focused heavily on national and state health updates and policy developments. Finding these materials took time, and we share them here because they illustrate why attention to housing security issues is critical for addressing crises like the pandemic.
For instance, one theme that emerged out of a recent National Science Foundation workshop on COVID research in the North[4] was that Indigenous communities are not only recovering from the pandemic but also evaluating lessons learned to better prepare for the next one, and resilience will depend significantly on more, and more adaptable, infrastructure and greater housing security.