Title: 3DLNews: A Three-decade Dataset of US Local News Articles
We present 3DLNews, a novel dataset with local news articles from the United States spanning the period from 1996 to 2024. It contains almost 1 million URLs (with HTML text) from over 14,000 local newspapers, TV, and radio stations across all 50 states, and provides a broad snapshot of the US local news landscape. The dataset was collected by scraping Google and Twitter search results. We employed a multi-step filtering process to remove non-news article links and enriched the dataset with metadata such as the names and geo-coordinates of the source news media organizations, article publication dates, etc. Furthermore, we demonstrated the utility of 3DLNews by outlining four applications.
Award ID(s):
2245508
PAR ID:
10608024
Author(s) / Creator(s):
;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400704369
Page Range / eLocation ID:
5328 to 5332
Subject(s) / Keyword(s):
local news US news dataset news news media
Format(s):
Medium: X
Location:
Boise, ID, USA
Sponsoring Org:
National Science Foundation
More Like this
  1. This project explores how children and youth below the age of 18 sought to help others during the COVID-19 pandemic. We used the data included in this publication to answer research questions such as “How did children in the U.S. help others and themselves during the COVID-19 pandemic?” and “What issues were children in the U.S. concerned about during the COVID-19 pandemic?” This project includes a data dictionary and a dataset that summarizes a unique collection of 115 news articles focused on the helping behaviors and key concerns of children in the U.S. during the pandemic. The articles appeared in print or online news sources between 2020 and 2023. We searched for media coverage using terms such as “kids,” “help,” “volunteer,” “actions,” “pandemic,” and “COVID-19.” Over time we refined and added additional search terms based on emergent themes such as “raising money,” “making personal protective equipment,” and “helping with homework.” We limited our searches by language (English), geography (the United States), and time (an article had to be published between January 2020, when the virus was first detected in the U.S., and November 2023, when we ended our searches for the dataset). When we identified news coverage that fit our definition of helping behaviors, we saved a PDF of the article (all PDFs are available upon request from the PI). Information included in this dataset is summarized as follows: (1) article citation and link; (2) article synopsis; (3) information on the child or children featured in the article; (4) summary of key helping behaviors or other actions taken by children during the pandemic; (5) information on who children were trying to help or what type of change they were attempting to influence; (6) quotes from children or youth; and (7) notations of photos, videos, or links to additional resources. 
The envisioned audience for this data includes social science and public health researchers, journalists, and policy makers with an interest in children and the pandemic, specifically, or disasters and altruism, more broadly. 
  2. Different news articles about the same topic often offer a variety of perspectives: an article written about gun violence might emphasize gun control, while another might promote 2nd Amendment rights, and yet a third might focus on mental health issues. In communication research, these different perspectives are known as “frames”, which, when used in news media, will influence the opinion of their readers in multiple ways. In this paper, we present a method for effectively detecting frames in news headlines. Our training and performance evaluation are based on a new dataset of news headlines related to the issue of gun violence in the United States. This Gun Violence Frame Corpus (GVFC) was curated and annotated by journalism and communication experts. Our proposed approach sets a new state-of-the-art performance for multiclass news frame detection, significantly outperforming a recent baseline by 35.9% absolute difference in accuracy. We apply our frame detection approach in a large-scale study of 88k news headlines about the coverage of gun violence in the U.S. between 2016 and 2018.
  3. News media structure their reporting of events or issues using certain perspectives. When describing an incident involving gun violence, for example, some journalists may focus on mental health or gun regulation, while others may emphasize the discussion of gun rights. Such perspectives are called “frames” in communication research. We study, for the first time, the value of combining lead images and their contextual information with text to identify the frame of a given news article. We observe that using multiple modes of information (article- and image-derived features) improves prediction of news frames over any single mode of information when the images are relevant to the frames of the headlines. We also observe that frame image relevance is related to the ease of conveying frames via images, which we call frame concreteness. Additionally, we release the first multimodal news framing dataset related to gun violence in the U.S., curated and annotated by communication researchers. The dataset will allow researchers to further examine the use of multiple information modalities for studying media framing.
  4. In response to the COVID-19 crisis, many local television (TV) newsrooms decided to have employees work from home (WFH) or from the field rather than from the newsroom, creating a kind of hybrid work characterized by a mix of work locations. From a review of research on telework and WFH, we identified possible impacts of WFH on work and on workers, with a particular focus on news work and news workers. Data on the impacts of hybrid work are drawn from interviews with local television news directors and journalists in the United States and observations of WFH. We found that through the creative application of technology, WFH news workers could successfully create a newscast, albeit with some concerns about story quality. However, WFH did not seem to satisfy workers’ needs for socialization or learning individually or as a group and created some problems coordinating work. Lifted restrictions on gatherings have mitigated some of the experienced problems, but we expect to see continued challenges to news workers’ informal learning in hybrid work settings. 
  5. We aim to develop methods for understanding how multimedia news exposure can affect people’s emotional responses, and we especially focus on news content related to gun violence, a very important yet polarizing issue in the U.S. We created the dataset NEmo+ by significantly extending the U.S. gun violence news-to-emotions dataset, BU-NEmo, from 320 to 1,297 news headline and lead image pairings and collecting 38,910 annotations in a large crowdsourcing experiment. In curating the NEmo+ dataset, we developed methods to identify news items that will trigger similar versus divergent emotional responses. For news items that trigger similar emotional responses, we compiled them into the NEmo+-Consensus dataset. We benchmark models on this dataset that predict a person’s dominant emotional response toward the target news item (single-label prediction). On the full NEmo+ dataset, containing news items that would lead to both differing and similar emotional responses, we also benchmark models for the novel task of predicting the distribution of evoked emotional responses in humans when presented with multi-modal news content. Our single-label and multi-label prediction models outperform baselines by large margins across several metrics. 