Title: SCStory: Self-supervised and Continual Online Story Discovery
We present SCStory, a framework for online story discovery that helps people digest rapidly published news article streams in real time without human annotations. To organize news article streams into stories, existing approaches directly encode the articles and cluster them based on representation similarity. However, these methods yield noisy and inaccurate story discovery results because generic article embeddings do not effectively reflect the story-indicative semantics in an article and cannot adapt to rapidly evolving news article streams. SCStory employs self-supervised and continual learning with a novel idea of story-indicative adaptive modeling of news article streams. With a lightweight hierarchical embedding module that first learns sentence representations and then article representations, SCStory identifies the story-relevant information in news articles and uses it to discover stories. The embedding module is continuously updated to adapt to evolving news streams with a contrastive learning objective, backed by two unique techniques, confidence-aware memory replay and prioritized augmentation, which address the label absence and data scarcity problems. Thorough experiments on real, up-to-date news data sets demonstrate that SCStory outperforms existing state-of-the-art algorithms for unsupervised online story discovery.
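The similarity-based clustering that the abstract attributes to existing approaches can be illustrated with a minimal sketch. This is not SCStory itself and not code from the paper: the function names, the toy two-dimensional vectors, and the threshold of 0.8 are all assumptions made for illustration. Each incoming article embedding joins the most similar existing story if the cosine similarity clears the threshold, and otherwise starts a new story.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def discover_stories(article_vectors, threshold=0.8):
    """Assign each article in a stream to a story id, one article at a time.

    Hypothetical baseline: the threshold value and the representation of a
    story as the sum of its article vectors are illustrative assumptions.
    """
    story_sums = []  # one running vector sum per story
    labels = []
    for vec in article_vectors:
        best_story, best_sim = None, threshold
        for story_id, total in enumerate(story_sums):
            # Cosine is scale-invariant, so comparing against the sum is
            # equivalent to comparing against the story's mean vector.
            sim = cosine(vec, total)
            if sim >= best_sim:
                best_story, best_sim = story_id, sim
        if best_story is None:
            # No story is similar enough: open a new one.
            story_sums.append(list(vec))
            best_story = len(story_sums) - 1
        else:
            # Fold the article into the matched story's running sum.
            story_sums[best_story] = [
                s + x for s, x in zip(story_sums[best_story], vec)
            ]
        labels.append(best_story)
    return labels


# Toy stream: two articles near one direction, then two near another.
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(discover_stories(stream))
```

In a sketch like this the article vectors are fixed inputs, which is the limitation the abstract points to: SCStory instead keeps updating its embedding module as the stream evolves.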
Award ID(s):
1956151 1741317 1704532
PAR ID:
10467094
Author(s) / Creator(s):
Corporate Creator(s):
Editor(s):
Proc. 2023 The Web Conf. 
Publisher / Repository:
ACM
Date Published:
Edition / Version:
1
ISBN:
9781450394161
Page Range / eLocation ID:
1853 to 1864
Subject(s) / Keyword(s):
text mining, Self-supervised and Continual Online Story Discovery, stream mining
Format(s):
Medium: X
Location:
Austin TX USA
Sponsoring Org:
National Science Foundation
More Like this
  1. Proc. 2023 ACM SIGIR Int. Conf. on Research and Development in Information Retrieval (Ed.)
    Unsupervised discovery of stories from correlated news articles in real time helps people digest massive news streams without expensive human annotations. A common approach in existing studies of unsupervised online story discovery is to represent news articles with symbolic- or graph-based embeddings and incrementally cluster them into stories. Recent large language models are expected to improve the embeddings further, but a straightforward adoption of the models that indiscriminately encodes all information in articles is ineffective for dealing with text-rich and evolving news streams. In this work, we propose a novel thematic embedding with an off-the-shelf pretrained sentence encoder to dynamically represent articles and stories by considering their shared temporal themes. To realize the idea for unsupervised online story discovery, a scalable framework, USTORY, is introduced with two main techniques, theme- and time-aware dynamic embedding and novelty-aware adaptive clustering, fueled by lightweight story summaries. A thorough evaluation with real news data sets demonstrates that USTORY achieves higher story discovery performance than baselines while being robust and scalable to various streaming settings.
  2. Sparked by a collaboration between academic researchers and science media professionals, this study sought to test three commonly used headline formats that vary based on whether (and, if so, how) important information is left out of a headline to encourage participants to read the corresponding article; these formats are traditionally formatted headlines, forward-referencing headlines, and question-based headlines. Although headline format did not influence story selection or engagement, it did influence participants' evaluations of both the headline's and the story's credibility (question-based headlines were viewed as the least credible). Moreover, individuals' science curiosity and political views predicted their engagement with environmental stories as well as their views about the credibility of the headline and story. Thus, headline formats appear to play a significant role in audiences' perceptions of online news stories, and science news professionals ought to consider the effects different formats have on readers.
  3. Choosing the political party nominees who will appear on the ballot for the US presidency is a long process that starts two years before the general election. The news media plays a particular role in this process by continuously covering the state of the race. How can this news coverage be characterized? Given that there are thousands of news organizations, but each of us is exposed to only a few of them, we might be missing most of it. Online news aggregators, which aggregate news stories from a multitude of news sources and perspectives, could provide an important lens for the analysis. One such aggregator is Google's Top stories, a recent addition to Google's search result page. For the duration of 2019, we collected the news headlines that Google Top stories displayed for 30 candidates of both US political parties. Our dataset contains 79,903 news story URLs published by 2,168 unique news sources. Our analysis indicates that despite this large number of news sources, the distribution of where the Top stories originate is very skewed, with a very small number of sources contributing the majority of stories. We are sharing our dataset so that other researchers can answer questions related to algorithmic curation of news as well as media agenda setting in the context of political elections.
  4. Every day people share personal stories online, reaching millions of users around the world through blogs, social media and news websites. Why are some of these stories more attractive to readers than others? What features of these personal narratives make readers empathize with the storyteller? Do the readers' personal characteristics and experiences play a role in feeling a connection to the story they read? Experimental studies in psychology show that there are several factors that increase empathy in the aggregate, but there is a need for a deeper understanding of empathetic feelings at the individual level of storyteller, story, and reader. Here, we present the design and analysis of a survey that studied the impact of story features and reader predispositions and perceptions on the empathy readers feel when reading online stories. We use causal trees to find the individual-level causal factors for empathy and to understand the heterogeneity in the treatment effects. One of our main findings is that empathy is contextual and, while reader personality plays a significant role in evoking empathy, the mood of the reader prior to reading the story and linguistic story features have an impact as well. The results of our analyses can be used to help people create content that others care about and to help them communicate more effectively.
  5. What if we used the stories that researchers and practitioners tell each other as tools to advance interdisciplinary disaster research? This article hypothesizes that doing so could foster a new mode of collaborative learning and discovery. People, including researchers, regularly tell stories to relate “what happened” based on their experience, often in ways that augment or contradict existing understandings. These stories provide naturalistic descriptions of context, complexity, and dynamic relationships in ways that formal theories, static data, and interpretations of findings can miss. They often do so memorably and engagingly, which makes them beneficial to researchers across disciplines and allows them to be integrated into their own work. Seeking out, actively inviting, sharing, and discussing these stories in interdisciplinary teams that have developed a strong sense of trust can therefore provide partial escape from discipline‐specific reasoning and frameworks that are so often unconsciously employed. To develop and test this possibility, this article argues that the diverse and rapidly growing hazards and disaster field needs to incorporate a basic theoretical understanding of stories, building from folkloristics and other sources. It would also need strategies to draw out and build from stories in suitable interdisciplinary research forums and, in turn, to find ways to incorporate the discussions that emanate from stories into ongoing analyses, interpretations, and future lines of interdisciplinary inquiry.