


Title: Multimodal Semi-supervised Learning for Disaster Tweet Classification
During natural disasters, people often use social media platforms, such as Twitter, to post information about casualties and damage caused by disasters. This information can help relief authorities gain situational awareness in near real time and enable them to quickly distribute resources where they are most needed. However, annotating data for this purpose can be burdensome, subjective, and expensive. In this paper, we investigate how to leverage the copious amounts of unlabeled data generated on social media by disaster eyewitnesses and affected individuals during disaster events. To this end, we propose a semi-supervised learning approach to improve the performance of neural models on several multimodal disaster tweet classification tasks. Our approach shows significant improvements, with gains of up to 7.7% in F1 in low-data regimes and 1.9% when using the entire training data. We make our code and data publicly available.
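Semi-supervised approaches of this kind often follow a self-training pattern: a model trained on the small labeled set assigns pseudo-labels to confident predictions on unlabeled tweets, which are then added to the training set. A minimal sketch of that loop, in which the data, the keyword scorer, and the confidence values are all hypothetical stand-ins for a real neural model:

```python
# Self-training sketch: pseudo-label unlabeled tweets whose predicted
# confidence exceeds a threshold, then add them to the training set.
# All data and the scoring heuristic below are toy examples.

CONF_THRESHOLD = 0.8  # assumed confidence cutoff

labeled = [("bridge collapsed many injured", "damage"),
           ("beautiful sunset tonight", "not_relevant")]
unlabeled = ["building collapsed downtown", "lovely weather today"]

DAMAGE_WORDS = {"collapsed", "injured", "flooded", "destroyed"}

def predict(text):
    """Return (label, confidence) from a naive keyword score."""
    tokens = text.split()
    hits = sum(t in DAMAGE_WORDS for t in tokens)
    if hits:
        return "damage", min(1.0, 0.7 + hits / len(tokens))
    return "not_relevant", 0.6  # low confidence: no strong evidence

def self_train(labeled, unlabeled):
    """One round of pseudo-labeling: keep only confident predictions."""
    augmented = list(labeled)
    for text in unlabeled:
        label, conf = predict(text)
        if conf >= CONF_THRESHOLD:
            augmented.append((text, label))
    return augmented

augmented = self_train(labeled, unlabeled)
print(len(augmented))  # 2 original pairs + 1 confident pseudo-label = 3
```

In practice this round would be repeated, retraining the model on the augmented set each time.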
Journal Name: The 29th International Conference on Computational Linguistics (COLING 2022)
Sponsoring Org: National Science Foundation
More Like this
  1. Radianti, Jaziar ; Dokas, Ioannis ; Lalone, Nicolas ; Khazanchi, Deepak (Ed.)
    The real-time information shared about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations. However, supervised learning models for monitoring disaster events require large amounts of annotated data, making them impractical for real-time use during disasters. To address this challenge, we present a fine-grained disaster tweet classification model under a semi-supervised, few-shot learning setting that requires only a small amount of annotated data. Our model, CrisisMatch, effectively classifies tweets into fine-grained classes of interest using few labeled examples and large amounts of unlabeled data, mimicking the early stage of a disaster. By integrating effective semi-supervised learning ideas and incorporating TextMixUp, CrisisMatch achieves an average performance improvement of 11.2% on two disaster datasets. Further analyses of the influence of the amount of labeled data and of out-of-domain results are also provided.
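MixUp-style text augmentation, as used in methods like the TextMixUp component above, interpolates example representations and their label distributions. A minimal sketch with hypothetical toy embeddings (plain Python lists standing in for real sentence embeddings; not CrisisMatch's actual implementation):

```python
# MixUp-on-text sketch: linearly interpolate two (embedding, one-hot
# label) pairs. The 3-dimensional "embeddings" below are toy values.

def mixup(x1, y1, x2, y2, lam=0.5):
    """Blend two examples with weight lam (usually drawn from Beta(a, a))."""
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

x_mix, y_mix = mixup([1.0, 0.0, 0.0], [1, 0],   # tweet A, class 0
                     [0.0, 1.0, 0.0], [0, 1],   # tweet B, class 1
                     lam=0.5)
print(x_mix, y_mix)  # [0.5, 0.5, 0.0] [0.5, 0.5]
```

The mixed pair gives the model smoothed training targets, which helps when only a few labeled examples per class are available.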
  2. This article seeks to go beyond traditional GIS methods used in creating maps for disaster response, which commonly focus on the disaster extent. Instead, a slightly different approach is taken using social media data collected from Twitter to explore how people communicate during disaster events, how online communities form and evolve, and how communication methods can improve. This study collected Twitter data during the 2015 Nepal earthquake and applied a spatiotemporal analysis to find patterns that reveal shadows or gaps in local communities' communication channels. Linkages in social media can be used to understand how people communicate, how quickly they diffuse information, and how social networks form online during disasters, all of which can improve communication throughout disaster phases. This study offers a deeper understanding of the kinds of spatiotemporal patterns and spatial social networks that can be observed during disaster events. Better communication during disaster events is imperative for improving disaster management, increasing community resilience, and saving lives.
  3. Global social media use during natural disasters has been well documented (Murthy et al., 2017). In the U.S., public social media platforms are often a primary venue for those affected by disasters. Some disaster victims believe first responders will see their public posts, and that the 9-1-1 telephone system becomes overloaded during crises. Moreover, some feel that the accuracy and utility of information on social media is likely higher than that of traditional media sources. However, sifting through content during a disaster is often difficult due to the high volume of 'non-relevant' content. In addition, text posted on Twitter is studied more than images, leaving a potential gap in understanding disaster experiences. Images posted on social media during disasters have a high level of complexity (Murthy et al., 2016). Our study responds to O'Neal et al.'s (2017) call to action that social media images posted during disasters should be studied using machine learning.
  4.
    During disasters, it is critical to deliver emergency information to the appropriate first responders. Name-based information delivery provides efficient, timely dissemination of relevant content to first responder teams assigned to different incident response roles. People increasingly depend on social media to communicate vital information, using free-form text. Thus, a method that delivers these social media posts to the right first responders can significantly improve outcomes. In this paper, we propose FLARE, a framework using 'Social Media Engines' (SMEs) to map social media posts (SMPs), such as tweets, to the right names. SMEs perform natural language processing-based classification and exploit several machine learning capabilities in an online, real-time manner. To reduce the manual labeling effort required for learning during a disaster, we leverage active learning, complemented by dispatchers with specific domain knowledge who perform limited labeling. We also leverage federated learning across various public-safety departments with specialized knowledge to handle notifications related to their roles in a cooperative manner. We implement three different classifiers: for incident relevance, organization, and fine-grained role prediction. Each class is associated with a specific subset of the namespace graph. The novelty of our system is the integration of the namespace with federated active learning and inference procedures to identify and deliver vital SMPs to the right first responders in a distributed multi-organization environment, in real time. Our experiments using real-world data, including tweets generated by citizens during the 2018 California wildfires, show our approach outperforming both a simple keyword-based classification and several existing NLP-based classification techniques.
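Active learning setups like the one described above typically route only the posts the model is least certain about to human labelers. A minimal uncertainty-sampling sketch, with hypothetical posts and class probabilities (not FLARE's actual selection strategy):

```python
# Least-confident sampling sketch: rank a pool of (text, class
# probabilities) pairs by the model's top predicted probability and
# send the lowest-confidence posts to dispatchers for labeling.

def least_confident(pool, budget):
    """Pick `budget` posts whose highest class probability is lowest."""
    ranked = sorted(pool, key=lambda item: max(item[1]))
    return [text for text, _ in ranked[:budget]]

# Hypothetical classifier outputs: (post text, probability distribution).
pool = [("smoke near ridge road", [0.55, 0.45]),
        ("fire dept on scene",    [0.95, 0.05]),
        ("is the highway closed", [0.60, 0.40])]

to_label = least_confident(pool, budget=2)
print(to_label)  # the two posts the classifier is least sure about
```

Labeling the most ambiguous posts first tends to improve the classifier fastest per unit of dispatcher effort, which matters when labeling capacity during a disaster is scarce.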
    The increasing popularity of multimedia messages shared through public or private social media spills into diverse information dissemination contexts. To date, public social media has been explored as a potential alert system during natural disasters, but high levels of noise (i.e., non-relevant content) present challenges both in understanding social experiences of a disaster and in facilitating disaster recovery. This study builds on current research by uniquely using social media data, collected in the field through qualitative interviews, to create a supervised machine learning model. The collected data represent rescuers and rescuees during Hurricane Harvey in 2017. Preliminary findings indicate 99% accuracy in classifying data between signal and noise for signal-to-noise ratios (SNR) of 1:1, 1:2, 1:4, and 1:8. We also find 99% accuracy in classification among respondent types (volunteer rescuer, official rescuer, and rescuee). Furthermore, we compare human- and machine-coded attributes, finding that the Google Vision API is a more reliable source for detecting attributes for the training set.
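Evaluating at fixed signal-to-noise ratios, as in the study above, amounts to assembling test splits with a controlled class imbalance. A small sketch with toy data and a hypothetical helper:

```python
# Build a 1:ratio signal-to-noise evaluation split: pair each signal
# item (label 1) with `ratio` noise items (label 0). Toy data only.

def make_split(signal, noise, ratio):
    """Return (text, label) pairs at a 1:ratio signal-to-noise ratio."""
    n_noise = min(len(noise), len(signal) * ratio)
    return ([(s, 1) for s in signal] +
            [(n, 0) for n in noise[:n_noise]])

signal = ["rescue needed on elm st", "family stranded on roof"]
noise = ["great bbq today", "new phone who dis", "traffic is bad",
         "watch this cat", "sale ends friday", "go team", "nice pic",
         "morning run done"]

split = make_split(signal, noise, ratio=4)
print(len(split))  # 2 signal + 8 noise = 10 examples at 1:4 SNR
```

Reporting accuracy across several such ratios shows whether a classifier's performance holds up as relevant content becomes rarer.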