Title: VictimFinder: Harvesting rescue requests in disaster response from social media with BERT
Social media platforms are playing increasingly critical roles in disaster response and rescue operations. During emergencies, users can post rescue requests along with their addresses on social media, while volunteers can search for those messages and send help. However, efficiently leveraging social media in rescue operations remains challenging because of the lack of tools to identify rescue request messages on social media automatically and rapidly. Analyzing social media data, such as Twitter data, relies heavily on Natural Language Processing (NLP) algorithms to extract information from texts. Bidirectional transformer models, such as the Bidirectional Encoder Representations from Transformers (BERT) model, have significantly outperformed previous NLP models in numerous text analysis tasks, providing new opportunities to precisely understand and classify social media data for diverse applications. This study developed and compared ten VictimFinder models for identifying rescue request tweets, three based on milestone NLP algorithms and seven BERT-based. A total of 3191 manually labeled disaster-related tweets posted during 2017 Hurricane Harvey were used as the training and testing datasets. We evaluated the performance of each model by classification accuracy, computation cost, and model stability. Experimental results show that all BERT-based models significantly increased the accuracy of categorizing rescue-related tweets. The best model for identifying rescue request tweets is a customized BERT-based model with a Convolutional Neural Network (CNN) classifier. Its F1-score is 0.919, which outperforms the baseline model by 10.6%. The developed models can promote social media use for rescue operations in future disaster events.
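The abstract reports model quality as an F1-score. As a reminder of what that metric measures, the following is a minimal sketch of computing F1 for a binary rescue-request classifier. The function name `f1_score` and the toy labels are hypothetical illustrations and are not taken from the study; the study's own evaluation code and data are not shown in this record.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return the F1-score (harmonic mean of precision and recall)
    for the class labeled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy labels: 1 = rescue request, 0 = other disaster-related tweet.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

if __name__ == "__main__":
    print(f"F1 = {f1_score(y_true, y_pred):.3f}")
```

Because F1 balances precision against recall, it penalizes both missed rescue requests and false alarms, which plain accuracy can hide when rescue requests are a small share of the tweets.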
Award ID(s):
1931301
PAR ID:
10338275
Journal Name:
Computers, Environment and Urban Systems
Volume:
95
ISSN:
0198-9715
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Timely and reliable sensing of infrastructure conditions is critical in disaster management for planning effective infrastructure restorations. Social media, a near real-time information source, has been widely used in disasters for forming timely situational awareness. Yet, using social media to sense electricity infrastructure conditions has not been explored. This study aims to address the research gap through mining public topics from social media. To achieve this purpose, we proposed a systematic and customized approach wherein (1) electricity-related social media data is extracted by the classifier developed based on Bidirectional Encoder Representations from Transformers (BERT); and (2) public topics are modeled with unigrams, bigrams, and trigrams to incorporate the formulaic expressions of infrastructure conditions in social media. Electricity infrastructures in Florida impacted by Hurricane Irma are studied for illustration and demonstration. Results show that the proposed approach is capable of sensing the temporal evolutions and geographic differences of electricity infrastructure conditions. 
  2. Effectively filtering and categorizing the large volume of user-generated content on social media during disaster events can help emergency management and disaster response prioritize their resources. Deep learning approaches, including recurrent neural networks and transformer-based models, have been previously used for this purpose. Capsule Neural Networks (CapsNets), initially proposed for image classification, have been proven to be useful for text analysis as well. However, to the best of our knowledge, CapsNets have not been used for classifying crisis-related messages, and have not been extensively compared with state-of-the-art transformer-based models, such as BERT. Therefore, in this study, we performed a thorough comparison between CapsNet models, state-of-the-art BERT models, and two popular recurrent neural network models that have been successfully used for tweet classification, specifically LSTM and Bi-LSTM models, on the task of classifying crisis tweets in terms of both their informativeness (binary classification) and their humanitarian content (multi-class classification). For this purpose, we used several benchmark datasets for crisis tweet classification, namely CrisisBench, CrisisNLP, and CrisisLex. Experimental results show that the performance of the CapsNet models is on a par with that of LSTM and Bi-LSTM models for all metrics considered, while the performance obtained with BERT models has surpassed that of the other three models across different datasets and classes for both classification tasks, and thus BERT could be considered the best overall model for classifying crisis tweets.
  3. Social media cyberbullying has a detrimental effect on human life. As online social networking grows daily, the amount of hate speech also increases. Such terrible content can cause depression and actions related to suicide. This paper proposes a trustable LSTM Autoencoder Network for cyberbullying detection on social media using synthetic data. We have demonstrated a cutting-edge method to address data availability difficulties by producing machine-translated data. However, several languages such as Hindi and Bangla still lack adequate investigations due to a lack of datasets. We carried out experimental identification of aggressive comments on Hindi, Bangla, and English datasets using the proposed model and traditional models, including Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), LSTM-Autoencoder, Word2vec, Bidirectional Encoder Representations from Transformers (BERT), and Generative Pre-trained Transformer 2 (GPT-2) models. We employed evaluation metrics such as F1-score, accuracy, precision, and recall to assess the models’ performance. Our proposed model outperformed all the other models on all datasets, achieving the highest accuracy of 95%. Our model achieves state-of-the-art results among all the previous works on the dataset we used in this paper.
  4. Many migrants are vulnerable due to noncitizenship, linguistic or cultural barriers, and inadequate safety-net infrastructures. Immigrant-oriented nonprofits can play an important role in improving immigrant well-being. However, progress on systematically evaluating the impact of nonprofits has been hampered by the difficulty in efficiently and accurately identifying immigrant-oriented nonprofits in large administrative data sets. We tackle this challenge by employing natural language processing (NLP) and machine learning (ML) techniques. Seven NLP algorithms are applied and trained in supervised ML models. The bidirectional encoder representations from transformers (BERT) technique offers the best performance, with an impressive accuracy of .89. Indeed, the model outperformed two nonmachine methods used in existing research, namely, identification of organizations via National Taxonomy of Exempt Entities codes or keyword searches of nonprofit names. We thus demonstrate the viability of computer-based identification of hard-to-identify nonprofits using organizational name data, a technique that may be applicable to other research requiring categorization based on short labels. We also highlight limitations and areas for improvement. 
  5. Hurricane Harvey in 2017 marked an important transition where many disaster victims used social media rather than the overloaded 911 system to seek rescue. This article presents a machine-learning-based detector of rescue requests from Harvey-related Twitter messages, which differentiates itself from existing ones by accounting for the potential impacts of ZIP codes on both the preparation of training samples and the performance of different machine learning models. We investigate how the outcomes of our ZIP code filtering differ from those of a recent, comparable study in terms of generating training data for machine learning models. Following this, experiments are conducted to test how the existence of ZIP codes would affect the performance of machine learning models by simulating different percentages of ZIP-code-tagged positive samples. The findings show that (1) all machine learning classifiers except K-nearest neighbors and Naïve Bayes achieve state-of-the-art performance in detecting rescue requests from social media; (2) using ZIP code filtering could increase the effectiveness of gathering rescue requests for training machine learning models; (3) machine learning models are better able to identify rescue requests that are associated with ZIP codes. We thereby encourage every rescue-seeking victim to include ZIP codes when posting messages on social media. This study is a useful addition to the literature and can be helpful for first responders to rescue disaster victims more efficiently. 