Title: Social media and volunteer rescue requests prediction with random forest and algorithm bias detection: a case of Hurricane Harvey
Abstract AI fairness is tasked with evaluating and mitigating bias in algorithms that may discriminate against protected groups. This paper examines whether bias exists in AI algorithms used in disaster management, and in what manner. We consider the 2017 Hurricane Harvey, when flood victims in Houston resorted to social media to request rescue. We evaluate a Random Forest regression model trained to predict Twitter rescue request rates from social-environmental data using three fairness criteria (independence, separation, and sufficiency). The Social Vulnerability Index (SVI), its four sub-indices, and four variables representing the digital divide were considered sensitive attributes. The Random Forest regression model extracted seven significant predictors of rescue request rates; from high to low importance, they were percent of renter-occupied housing units, percent of roads in flood zone, percent of flood zone area, percent of wetland cover, percent of herbaceous, forested, and shrub cover, mean elevation, and percent of households with no computer or device. Partial Dependence plots of rescue request rates against each of the seven predictors show the non-linear nature of their relationships. Results of the fairness evaluation of the Random Forest model using the three criteria show no obvious biases for the nine sensitive attributes, except that a minor imperfect sufficiency was found with the SVI Housing and Transportation sub-index. Future AI modeling in disaster research could apply the same methodology used in this paper to evaluate fairness and help reduce unfair resource allocation and other social and geographical disparities.
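The three fairness criteria named in the abstract can be probed empirically by comparing group-conditional statistics of a model's predictions. Below is a minimal, illustrative sketch for a regression setting; the data, the binary high/low-SVI grouping, and the bin threshold are invented for demonstration and are not the paper's data or method:

```python
from statistics import mean

def independence_gap(preds, groups):
    """Independence: predictions should not depend on the sensitive group.
    Approximated here by the gap in group-mean predictions."""
    a = [p for p, g in zip(preds, groups) if g == 0]
    b = [p for p, g in zip(preds, groups) if g == 1]
    return abs(mean(a) - mean(b))

def separation_gap(preds, targets, groups, hi):
    """Separation: among units with similar true outcomes, predictions
    should not depend on the group. Checked within a high-outcome bin
    (assumes both groups are represented in the bin)."""
    a = [p for p, t, g in zip(preds, targets, groups) if g == 0 and t >= hi]
    b = [p for p, t, g in zip(preds, targets, groups) if g == 1 and t >= hi]
    return abs(mean(a) - mean(b))

def sufficiency_gap(preds, targets, groups, hi):
    """Sufficiency: among units with similar predictions, true outcomes
    should not depend on the group. Checked within a high-prediction bin."""
    a = [t for p, t, g in zip(preds, targets, groups) if g == 0 and p >= hi]
    b = [t for p, t, g in zip(preds, targets, groups) if g == 1 and p >= hi]
    return abs(mean(a) - mean(b))

# Tiny illustrative data: rescue-request rates per tract, a model's
# predictions, and a binary sensitive attribute (e.g., high vs. low SVI).
targets = [0.2, 0.8, 0.7, 0.3, 0.1, 0.9]
preds   = [0.25, 0.75, 0.65, 0.35, 0.15, 0.85]
groups  = [0, 0, 1, 1, 0, 1]

print(independence_gap(preds, groups))
print(separation_gap(preds, targets, groups, hi=0.6))
print(sufficiency_gap(preds, targets, groups, hi=0.6))
```

Gaps near zero within comparable outcome or prediction bins suggest the criterion approximately holds for that attribute; a persistent nonzero gap (like the minor imperfect sufficiency the paper reports) flags a potential bias.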
Award ID(s):
1927513
PAR ID:
10425344
Publisher / Repository:
IOP Publishing
Date Published:
Journal Name:
Environmental Research Communications
Volume:
5
Issue:
6
ISSN:
2515-7620
Page Range / eLocation ID:
Article No. 065013
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Social media platforms are playing increasingly critical roles in disaster response and rescue operations. During emergencies, users can post rescue requests along with their addresses on social media, while volunteers can search for those messages and send help. However, efficiently leveraging social media in rescue operations remains challenging because of the lack of tools to identify rescue request messages on social media automatically and rapidly. Analyzing social media data, such as Twitter data, relies heavily on Natural Language Processing (NLP) algorithms to extract information from texts. The introduction of bidirectional transformer models, such as the Bidirectional Encoder Representations from Transformers (BERT) model, has significantly outperformed previous NLP models in numerous text analysis tasks, providing new opportunities to precisely understand and classify social media data for diverse applications. This study developed and compared ten VictimFinder models for identifying rescue request tweets, three based on milestone NLP algorithms and seven BERT-based. A total of 3191 manually labeled disaster-related tweets posted during the 2017 Hurricane Harvey were used as the training and testing datasets. We evaluated the performance of each model by classification accuracy, computation cost, and model stability. Experimental results show that all BERT-based models significantly increased the accuracy of categorizing rescue-related tweets. The best model for identifying rescue request tweets is a customized BERT-based model with a Convolutional Neural Network (CNN) classifier. Its F1-score is 0.919, which outperforms the baseline model by 10.6%. The developed models can promote social media use for rescue operations in future disaster events.
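The F1-score used above to rank the VictimFinder models is the harmonic mean of precision and recall on the rescue-request class. A minimal sketch of the metric; the labels below are illustrative, not the study's data:

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # how many flagged tweets were real requests
    recall = tp / (tp + fn)     # how many real requests were found
    return 2 * precision * recall / (precision + recall)

# Illustrative labels: 1 = rescue-request tweet, 0 = other.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(f1_score(y_true, y_pred))  # → 0.75
```

Because missing a genuine rescue request (a false negative) is costly, F1 is a more informative comparison metric here than raw accuracy, which can look high even when the rare positive class is handled poorly.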
  2. We analyze a public dataset of rescue requests for the Houston Metropolitan Area during Hurricane Harvey (2017) from the Red Cross. This dataset contains information including the location, gender, and emergency description in each requester's report. We reveal the spatial distribution of the rescue requests and its relationship with indicators of the social, physical, and built environment. We show that rescue request rates are significantly higher in regions with higher percentages of children, male population, population in poverty, or people with limited English, in addition to regions with higher inundation rates or worse traffic conditions during Hurricane Harvey. The rescue request rate is found to be statistically uncorrelated with the percentage of flood hazard zone designated by the Federal Emergency Management Agency (FEMA).
  3. Social vulnerability models are becoming increasingly important for hazard mitigation and recovery planning, but it remains unclear how well they explain disaster outcomes. Most studies using indicators and indexes employ them to either describe vulnerability patterns or compare newly devised measures to existing ones. The focus of this article is construct validation, in which we investigate the empirical validity of a range of models of social vulnerability using outcomes from Hurricane Sandy. Using spatial regression, relative measures of assistance applicants, affected renters, housing damage, and property loss were regressed on four social vulnerability models and their constituent pillars while controlling for flood exposure. The indexes best explained housing assistance applicants, whereas they poorly explained property loss. At the pillar level, themes related to access and functional needs, age, transportation, and housing were the most explanatory. Overall, social vulnerability models with weighted and profile configurations demonstrated higher construct validity than the prevailing social vulnerability indexes. The findings highlight the need to expand the number and breadth of empirical validation studies to better understand relationships among social vulnerability models and disaster outcomes.
  4. Fairness Artificial Intelligence (AI) aims to identify and mitigate bias throughout the AI development process, spanning data collection, modeling, assessment, and deployment—a critical facet of establishing trustworthy AI systems. Tackling data bias through techniques like reweighting samples proves effective for promoting fairness. This paper undertakes a systematic exploration of reweighting samples for conventional Machine Learning (ML) models, utilizing five models for binary classification on datasets such as Adult Income and COMPAS, incorporating various protected attributes. In particular, AI Fairness 360 (AIF360) from IBM, a versatile open-source library aimed at identifying and mitigating bias in machine learning models throughout the entire AI application lifecycle, is employed as the foundation for conducting this systematic exploration. The evaluation of prediction outcomes employs five fairness metrics from AIF360, elucidating the nuanced and model-specific efficacy of reweighting samples in fostering fairness within traditional ML frameworks. Experimental results illustrate that reweighting samples effectively reduces bias in traditional ML methods for classification tasks. For instance, after reweighting samples, the balanced accuracy of Decision Tree (DT) improves to 100%, and its bias, as measured by fairness metrics such as Average Odds Difference (AOD), Equal Opportunity Difference (EOD), and Theil Index (TI), is mitigated to 0. However, reweighting samples does not effectively enhance the fairness performance of K-Nearest Neighbor (KNN). This sheds light on the intricate dynamics of bias, underscoring the complexity involved in achieving fairness across different models and scenarios.
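The reweighting evaluated above (AIF360's Reweighing preprocessor, which follows Kamiran and Calders' scheme) assigns each (group, label) cell a weight that makes group membership and label statistically independent in the weighted training data. A minimal sketch of that weight formula, using synthetic counts rather than the Adult Income or COMPAS data:

```python
from collections import Counter

def reweighing_weights(groups, labels):
    """Weight w(g, y) = P(g) * P(y) / P(g, y): the joint probability
    expected under independence divided by the observed joint probability.
    Underrepresented (group, label) cells get weights above 1."""
    n = len(groups)
    n_g = Counter(groups)                  # counts per group
    n_y = Counter(labels)                  # counts per label
    n_gy = Counter(zip(groups, labels))    # counts per (group, label) cell
    return {
        (g, y): (n_g[g] / n) * (n_y[y] / n) / (n_gy[(g, y)] / n)
        for (g, y) in n_gy
    }

# Synthetic, deliberately biased sample: group 1 rarely gets label 1.
groups = [0, 0, 0, 0, 1, 1, 1, 1]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
print(reweighing_weights(groups, labels))
```

Feeding these per-sample weights into any classifier that honors sample weights upweights the rare (group 1, label 1) cell, which is why reweighting can pull metrics such as Average Odds Difference toward 0, though, as the paper notes, the effect is model-specific.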
  5. ABSTRACT Urban flooding is an increasing threat to cities and resident well‐being. The Federal Emergency Management Agency (FEMA) typically reports losses attributed to flooding which result from a stream overtopping its banks, discounting impacts of higher frequency, lower impact flooding that occurs when precipitation intensity exceeds the capacity of a drainage system. Despite its importance, the drivers of street flooding can often be difficult to identify, given street flooding data scarcity and the multitude of storm, built environment, and social factors involved. To address this knowledge gap, this study uses 922 street flooding reports made to the city of Denver, Colorado, USA from 2000 to 2019 in coordination with rain gauge network data and Census tract information to improve understanding of spatiotemporal drivers of urban flooding. An initial threshold analysis using rainfall intensity to predict street flooding had performance close to random chance, which led us to investigate other drivers. A logistic regression describing the probability of a storm leading to a flood report showed the strongest predictors of urban flooding were, in descending order, maximum 5‐min rainfall intensity, population density, storm depth, storm duration, median tract income, and stormwater pipe density. The logistic regression also showed that rainfall intensity and population density are nearly equally important in determining the likelihood of a flood report incidence. In addition, topographic wetness index values at locations of flooding reports were higher than randomly selected points. A linear regression predicting the number of reports per area identified percent impervious as the single most important predictor. Our methodologies can be used to better inform urban flood awareness, response, and mitigation and are applicable to any city with flood reports and spatial precipitation data.
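A logistic regression like the one described above maps its predictors to a flood-report probability through the sigmoid function. The sketch below shows the functional form only; the coefficients, their signs, and the feature values are hypothetical placeholders, not the study's fitted estimates:

```python
import math

def flood_report_probability(features, coefs, intercept):
    """Logistic model: p = sigmoid(b0 + sum_i b_i * x_i)."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    return 1.0 / (1.0 + math.exp(-z))

# Predictors in the order reported above (standardized values assumed):
# max 5-min rainfall intensity, population density, storm depth,
# storm duration, median tract income, stormwater pipe density.
# All coefficients below are hypothetical, chosen only to illustrate
# that the first two predictors can carry comparable weight.
coefs = [0.9, 0.8, 0.5, 0.4, -0.3, -0.2]
intercept = -1.5
x = [1.2, 0.8, 0.5, 0.3, -0.4, 0.1]

p = flood_report_probability(x, coefs, intercept)
print(p)
```

On standardized predictors, the coefficient magnitudes are directly comparable, which is how a fitted model of this form can support the study's ranking of rainfall intensity and population density as the strongest drivers.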