

This content will become publicly available on September 1, 2024

Title: Examining data imbalance in crowdsourced reports for improving flash flood situational awareness
Crowdsourced data are finding growing practical use for enhancing situational awareness during disasters. While recent studies have shown promising results regarding the potential of crowdsourced data (such as user-generated flood reports) for flash flood mapping and situational awareness, little attention has been paid to data imbalance issues that could introduce biases in data and assessment. To address this gap, we examine biases present in crowdsourced reports to identify data imbalance, with the goal of improving disaster situational awareness. Three biases are examined: sample bias, spatial bias, and demographic bias. To examine these biases, we analyzed reported flooding from 3-1-1 reports (a citizen hotline that allows the community to report problems such as flooding) and Waze reports (a GPS navigation app that allows drivers to report flooded roads) with respect to FEMA damage data collected in the aftermath of Tropical Storm Imelda in Harris County, Texas, in 2019 and Hurricane Ida in New York City in 2021. First, sample bias is assessed by expanding the flood-related categories in 3-1-1 reports. Integrating other flooding-related topics into the Global Moran's I and Local Indicator of Spatial Association (LISA) analyses revealed more communities that were impacted by floods. To examine spatial bias, we perform the LISA and BI-LISA tests on the data sets (FEMA damage, 3-1-1 reports, and Waze reports) at the census tract level and census block group level. By comparing the two geographical aggregations, we found that the larger spatial aggregation, census tracts, shows less data imbalance in the results. Finally, through a regression analysis, we found that 3-1-1 reports and Waze reports have data imbalance limitations in areas where minority populations and single-parent households reside. The findings of this study advance understanding of data imbalance and biases in crowdsourced datasets that are increasingly used for disaster situational awareness. By addressing data imbalance issues, researchers and practitioners can proactively mitigate biases in crowdsourced data and prevent biased and inequitable decisions and actions.
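For readers unfamiliar with the Global Moran's I statistic named in the abstract, the following is a minimal sketch of its standard textbook form, I = (n / S0) * (sum over i,j of w_ij * z_i * z_j) / (sum over i of z_i^2), where z_i is the deviation of area i from the mean and S0 is the sum of all spatial weights. It is not the authors' implementation. The function name `global_morans_i`, the four-area chain adjacency matrix, and the report counts are all hypothetical and chosen only for illustration.

```python
def global_morans_i(values, weights):
    """Global Moran's I for one variable and a square spatial weight matrix.

    values  -- one number per spatial unit (e.g. report counts per area)
    weights -- n x n nested list, weights[i][j] > 0 when units i and j are neighbors
    """
    n = len(values)
    if n == 0 or len(weights) != n or any(len(row) != n for row in weights):
        raise ValueError("weights must be an n x n matrix matching values")

    mean = sum(values) / n
    z = [v - mean for v in values]           # deviations from the mean
    s0 = sum(sum(row) for row in weights)    # total of all spatial weights
    denom = sum(d * d for d in z)            # sum of squared deviations
    if s0 == 0 or denom == 0:
        raise ValueError("need at least one neighbor pair and non-constant values")

    # Spatially weighted cross-product of deviations over all pairs (i, j).
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * (cross / denom)


# Hypothetical example: four areas in a row, each adjacent to the next one.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = global_morans_i([1, 1, 5, 5], chain)    # similar counts sit together
alternating = global_morans_i([1, 5, 1, 5], chain)  # neighbors always differ
print(clustered, alternating)
```

In this toy setup the clustered counts give a positive value and the alternating counts give a negative one, which is the sense in which the statistic summarizes whether nearby areas report similar levels. LISA decomposes the same idea into one local value per area; that step is not shown here.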
Award ID(s):
1832662
NSF-PAR ID:
10481377
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
Elsevier
Date Published:
Journal Name:
International Journal of Disaster Risk Reduction
Volume:
95
Issue:
C
ISSN:
2212-4209
Page Range / eLocation ID:
103825
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Urban flooding is a major natural disaster that poses a serious threat to the urban environment. There is strong demand for mapping flood extent in near real time to support disaster rescue and relief missions, reconstruction efforts, and financial loss evaluation. Many efforts have been made to identify flooding zones with remote sensing data and image processing techniques. Unfortunately, the near real-time production of accurate flood maps over impacted urban areas has not been well investigated due to three major issues. (1) Satellite imagery with high spatial resolution over urban areas usually has a nonhomogeneous background due to different types of objects such as buildings, moving vehicles, and road networks. As such, classical machine learning approaches can hardly model the spatial relationship between sample pixels in the flooding area. (2) Handcrafted features associated with the data are usually required as input for conventional flood mapping models, which may not be able to fully utilize the underlying patterns in the large amount of available data. (3) High-resolution optical imagery often has varied pixel digital numbers (DNs) for the same ground objects as a result of highly inconsistent illumination conditions during a flood. Accordingly, traditional methods of flood mapping have major limitations in generalization on testing data. To address these issues in urban flood mapping, we developed a patch similarity convolutional neural network (PSNet) using satellite multispectral surface reflectance imagery before and after flooding with a spatial resolution of 3 meters. We used spectral reflectance instead of raw pixel DNs so that the influence of inconsistent illumination caused by varied weather conditions at the time of data collection can be greatly reduced. Such consistent spectral reflectance data also enhance the generalization capability of the proposed model. Experiments on high-resolution imagery before and after two urban flooding events (the 2017 Hurricane Harvey and the 2018 Hurricane Florence) showed that the developed PSNet can produce urban flood maps with consistently high precision, recall, F1 score, and overall accuracy compared with baseline classification models including support vector machine, decision tree, random forest, and AdaBoost, which were often poor in either precision or recall. The study paves the way to fuse bi-temporal remote sensing images for near real-time precision damage mapping associated with other types of natural hazards (e.g., wildfires and earthquakes).
  2. Abstract

    Flood nowcasting refers to near-future prediction of flood status as an extreme weather event unfolds to enhance situational awareness. The objective of this study was to adopt and test a novel structured deep-learning model for urban flood nowcasting by integrating physics-based and human-sensed features. We present a new computational modeling framework including an attention-based spatial–temporal graph convolution network (ASTGCN) model and different streams of data that are collected in real time, preprocessed, and fed into the model to consider spatial and temporal information and dependencies that improve flood nowcasting. The novelty of the computational modeling framework is threefold. First, the model is capable of considering spatial and temporal dependencies in inundation propagation thanks to the spatial and temporal graph convolutional modules. Second, it enables capturing the influence on flood nowcasting of heterogeneous temporal data streams that can signal flooding status, including physics-based features (e.g., rainfall intensity and water elevation) and human-sensed data (e.g., residents’ flood reports and fluctuations of human activity). Third, its attention mechanism enables the model to direct its focus to the most influential features, which vary dynamically. We show the application of the modeling framework in the context of Harris County, Texas, as the study area and 2017 Hurricane Harvey as the flood event. Three categories of features are used for nowcasting the extent of flood inundation in different census tracts: (i) static features that capture spatial characteristics of various locations and influence their flood status similarity, (ii) physics-based dynamic features that capture changes in hydrodynamic variables, and (iii) heterogeneous human-sensed dynamic features that capture various aspects of residents’ activities that can provide information regarding flood status. Results indicate that the ASTGCN model provides superior performance for nowcasting of urban flood inundation at the census-tract level, with precision 0.808 and recall 0.891, outperforming the other state-of-the-art models it was compared against. Moreover, ASTGCN model performance improves when heterogeneous dynamic features are added to a model that relies solely on physics-based features, which demonstrates the promise of using heterogeneous human-sensed data for flood nowcasting. Given the results of the model comparisons, the proposed modeling framework merits further investigation as more data from historical events become available, in order to develop a predictive tool that provides community responders with an enhanced prediction of flood inundation during urban floods.

     
  3. Abstract

    Smart resilience is the beneficial result of the convergence of the fields of data science and urban resilience to flooding. The objective of this study is to propose and demonstrate a smart flood resilience framework that leverages heterogeneous community-scale big data and infrastructure sensor data to enhance predictive risk monitoring and situational awareness. The smart flood resilience framework focuses on four core capabilities that could be augmented by the use of heterogeneous community-scale big data and analytics techniques: (1) predictive flood risk mapping; (2) automated rapid impact assessment; (3) infrastructure failure prediction and monitoring; and (4) smart situational awareness capabilities. We demonstrate the components of these core capabilities of the smart flood resilience framework in the context of the 2017 Hurricane Harvey in Harris County, Texas. First, we present the use of flood sensors for the prediction of floodwater overflow in channel networks and inundation of co-located road networks. Second, we discuss the use of social media and machine learning techniques for assessing the impacts of floods on communities and sensing emotion signals to examine societal impacts. Third, we describe the use of high-resolution traffic data in network-theoretic models for nowcasting of flood propagation on road networks and the disrupted access to critical facilities, such as hospitals. Fourth, we introduce how location-based and credit card transaction data were used in spatial analyses to proactively evaluate the recovery of communities and the impacts of floods on businesses. These analyses show the significance of the core capabilities of the smart flood resilience framework in helping emergency managers, city planners, public officials, responders, and volunteers to better cope with the impacts of catastrophic flooding events.

     
  4. Abstract

    Lifestyle recovery captures the collective effects of population activities as well as the restoration of infrastructure and business services. This study uses a novel approach to leverage privacy-enhanced location intelligence data, which is anonymized and aggregated, to characterize distinctive lifestyle patterns and to unveil recovery trajectories after 2017 Hurricane Harvey in Harris County, Texas (USA). The analysis integrates multiple data sources to record the number of visits from home census block groups (CBGs) to different points of interest (POIs) in the county during the baseline and disaster periods. For the methodology, the research utilizes unsupervised machine learning and ANOVA statistical testing to characterize the recovery of lifestyles using privacy-enhanced location intelligence data. First, primary clustering using k-means characterized four distinct essential and non-essential lifestyle patterns. For each primary lifestyle cluster, the secondary clustering characterized the impact of the hurricane into four possible recovery trajectories based on the severity of maximum disruption and duration of recovery. The findings further reveal multiple recovery trajectories and durations within each lifestyle cluster, which imply differential recovery rates among similar lifestyles and different demographic groups. The impact of flooding on lifestyle recovery extends beyond the flooded regions, as 59% of CBGs with extreme recovery durations did not have at least 1% of direct flooding impacts. The findings offer a twofold theoretical significance: (1) lifestyle recovery is a critical milestone that needs to be examined, quantified, and monitored in the aftermath of disasters; (2) spatial structures of cities formed by human mobility and distribution of facilities extend the spatial reach of flood impacts on population lifestyles. 
These findings provide novel data-driven insights for public officials and emergency managers to examine, measure, and monitor a critical milestone in the community recovery trajectory based on the return of lifestyles to normalcy.

     
  5. Research on the societal costs of disaster-related US school closures has focused, and due to COVID-19 will likely continue to focus, on pandemics, with very limited research on closures from natural hazards. This is surprising given that school closures occur frequently to protect children, teachers, and staff pre-event as well as post-disaster to convert facilities into emergency shelters, etc. This study investigates the secondary effects from post-flooding, temporary school closures after the catastrophic 2015 flash flood in Richland County, South Carolina. Lost productivity from school closures was quantified using the Human Capital Method. Out of the 208 completed surveys, 75% of households had children that missed school. Post-stratifying survey results on race produced an average of $437 in lost productivity due to school closures and an overall $2175 in lost productivity due to flood-related impacts in general. Expressed in FEMA benefit-cost analysis terms, our study shows that schools have a standard value of $215 per household and per day for the unweighted sample ($180 for the race-weighted sample). Furthermore, households' disutility for a late start is almost as great as their disutility for a school closure. These exploratory findings suggest that unplanned school closures should be minimized, and community characteristics carefully considered to avoid unintended socioeconomic consequences. 