Title: Tracking Disaster Footprints with Social Streaming Data
Social media has become an indispensable tool in the face of natural disasters due to its broad appeal and ability to quickly disseminate information. For instance, Twitter is an important source for disaster responders to search for (1) topics that have been identified as being of particular interest over time, i.e., common topics such as “disaster rescue”; and (2) newly emerging themes of disaster-related discussion that are fast gathering in social media streams (Saha and Sindhwani 2012), i.e., distinct topics such as “the latest tsunami destruction”. To understand the status quo and allocate limited resources to the most urgent areas, emergency managers need to quickly sift through relevant topics generated over time and investigate their commonness and distinctiveness. A major obstacle to the effective use of social media, however, is its massive amount of noisy and undesired data. Hence, a naive method, such as set intersection/difference to find common/distinct topics, is often not practical. To address this challenge, this paper studies a new topic tracking problem that seeks to effectively identify the common and distinct topics in social streaming data. The problem is important as it presents a promising new way to efficiently search for accurate information during emergency response. The proposed framework combines an online Nonnegative Matrix Factorization (NMF) scheme that conducts a faster update of latent factors with a joint NMF technique that seeks a balance between the reconstruction error of topic identification and the losses induced by discovering common and distinct topics. Extensive experimental results on real-world datasets collected during Hurricanes Harvey and Florence demonstrate the effectiveness of our framework.
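The abstract names NMF as the core building block. As a rough illustration only, the sketch below factorizes two small document-term matrices (one per time window) with the classical batch multiplicative-update rule, then labels each topic of the newer window as common or distinct by cosine similarity against the older window's topics. This is not the paper's online or joint NMF: the post-hoc similarity matching, the `threshold` value, the toy vocabulary, and all function names are assumptions made for the example.

```python
import math
import random

def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH (the reconstruction error)."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Batch NMF, V (docs x terms) ~= W (docs x k) H (k x terms).

    Uses the standard multiplicative updates; rows of H are the topics.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def split_common_distinct(h_old, h_new, threshold=0.8):
    """Label each new topic 'common' if it resembles some old topic.

    The threshold is an arbitrary illustrative choice, not from the paper.
    """
    common, distinct = [], []
    for idx, topic in enumerate(h_new):
        best = max(cosine(topic, old) for old in h_old)
        (common if best >= threshold else distinct).append(idx)
    return common, distinct

# Hypothetical toy data: rows are tweets, columns are term counts.
VOCAB = ["rescue", "shelter", "flood", "levee", "power", "outage"]
WINDOW_1 = [[2, 2, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0],
            [0, 0, 2, 2, 0, 0], [0, 0, 1, 1, 0, 0]]
WINDOW_2 = [[2, 2, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0],
            [0, 0, 0, 0, 2, 2], [0, 0, 0, 0, 1, 1]]

_, H_OLD = nmf(WINDOW_1, k=2)
_, H_NEW = nmf(WINDOW_2, k=2)
COMMON, DISTINCT = split_common_distinct(H_OLD, H_NEW)
print("common topic indices:", COMMON, "distinct topic indices:", DISTINCT)
```

Matching topics after two independent factorizations is sensitive to noise and to the chosen threshold, which is one way to see why the abstract argues for handling commonness and distinctiveness inside the factorization objective rather than as a separate comparison step.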
Award ID(s):
2026860 1633381
PAR ID:
10213596
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the AAAI Conference on Artificial Intelligence
Volume:
34
Issue:
01
ISSN:
2159-5399
Page Range / eLocation ID:
370 to 377
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. This article seeks to go beyond traditional GIS methods for creating disaster response maps, which commonly look at the disaster extent. Instead, it uses social media data collected from Twitter to explore how people communicate during disaster events, how online communities form and evolve, and how communication methods can improve. This study collected Twitter data during the 2015 Nepal earthquake and applied a spatiotemporal analysis to find patterns that reveal shadows or gaps in local communities' communication channels. Linkages in social media can be used to understand how people communicate, how quickly they diffuse information, and how social networks form online during disasters. These insights can improve communication throughout disaster phases. This study offers a deeper understanding of the kinds of spatiotemporal patterns and spatial social networks that can be observed during disaster events. Better communication during disaster events is imperative for better disaster management, increasing community resilience, and saving lives.
  2. Researchers using social media data want to understand the discussions occurring in and about their respective fields. These domain experts often turn to topic models to help them see the entire landscape of the conversation, but unsupervised topic models often produce topic sets that miss topics experts expect or want to see. To solve this problem, we propose Guided Topic-Noise Model (GTM), a semi-supervised topic model designed with large domain-specific social media data sets in mind. The input to GTM is a set of topics that are of interest to the user and a small number of words or phrases that belong to those topics. These seed topics are used to guide the topic generation process, and can be augmented interactively, expanding the seed word list as the model provides new relevant words for different topics. GTM uses a novel initialization and a new sampling algorithm called Generalized Polya Urn (GPU) seed word sampling to produce a topic set that includes expanded seed topics, as well as new unsupervised topics. We demonstrate the robustness of GTM on open-ended responses from a public opinion survey and four domain-specific Twitter data sets. 
  3. The objective of this paper is to propose and test a system analytics framework based on social sensing and text mining to detect topic evolution associated with the performance of infrastructure systems in disasters. Social media platforms such as Twitter, as active channels of communication and information dissemination, provide real-time information and first-hand experience from affected areas in mass emergencies. While existing studies show the importance of social sensing in improving situational awareness and emergency response in disasters, the use of social sensing for detection and analysis of infrastructure systems and their resilience performance has been rather limited. This limitation is due to the lack of frameworks to model the evolution of events and topics (e.g., grid interruption and road closure) associated with infrastructure systems (e.g., power, highway, airport, and oil) in times of disaster. The proposed framework detects infrastructure-related topics in tweets posted during disasters, and their evolution, by integrating relevant-keyword search, text lemmatization, Part-of-Speech (POS) tagging, TF-IDF vectorization, topic modeling using Latent Dirichlet Allocation (LDA), and K-Means clustering. The application of the proposed framework was demonstrated in a study of infrastructure systems in Houston during Hurricane Harvey. In this case study, more than sixty thousand tweets were retrieved from a 150-mile radius around Houston over 39 days. Topic detection and evolution analysis was conducted on the user-generated data, and the clusters of tweets pertaining to certain topics were mapped in networks over time. The results show that the proposed framework makes it possible to summarize topics and track the movement of situations in different disaster phases. The analytics elements of the proposed framework can improve the recognition of infrastructure performance through text-based representation and provide evidence for decision-makers to take actionable measures.
  4. Nonnegative matrix factorization (NMF) has become increasingly popular for topic modeling of large-scale document collections. However, the resulting topics often represent only general, and thus redundant, information about the data rather than minor but potentially meaningful information for users. To tackle this problem, we propose a novel ensemble model of nonnegative matrix factorization for discovering high-quality local topics. Our method leverages the idea of an ensemble model to successively perform NMF on a residual matrix obtained from previous stages, generating a sequence of topic sets. The novelty of our method lies in its use of a residual matrix, inspired by a state-of-the-art gradient boosting model, and in a sophisticated local weighting scheme applied to the given matrix to enhance the locality of topics, which in turn delivers high-quality, focused topics of interest to users.
  5. Understanding the societal impacts caused by community disruptions (e.g., power outages and road closures), particularly during the response stage, with timeliness and sufficient detail is an underexplored, yet important, consideration. It is critical for effective decision‐making and coordination in disaster response and relief activities as well as post‐disaster virtual reconnaissance activities. This study proposes a semiautomated social media analytics approach for social sensing of Disaster Impacts and Societal Considerations (SocialDISC). This approach addresses two limitations of existing social media analytics approaches: a lack of adaptability to the needs of different analyzers or different disasters, and missing information about people's subjective feelings, emotions, and opinions. SocialDISC labels and clusters social media posts in each disruption category to facilitate scanning by analyzers. Analyzers, in this paper, are persons who acquire social impact information from social media data (e.g., infrastructure management personnel, volunteers, researchers from academia, and some residents impacted by the disaster). Furthermore, SocialDISC enables analyzers to quickly parse the topics and emotion signals of each subevent to assess the societal impacts caused by disruption events. To demonstrate the performance of SocialDISC, the authors conducted a case study based on Hurricane Harvey, one of the costliest disasters in U.S. history, and analyzed the disruptions and corresponding societal impacts in different aspects. The analysis shows that Houstonians suffered greatly from flooded houses, lack of access to food and water, and power outages. SocialDISC can foster an understanding of the relationship between disruptions of infrastructure and societal impacts, expectations of the public when facing disasters, and infrastructure interdependency and cascading failures. SocialDISC's provision of timely information about societal impacts on people may help disaster response decision‐making.