Title: Utilizing big data without domain knowledge impacts public health decision-making
New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries, and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. Findings demonstrate robustness challenges; built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, as average individual-level behavior of physical inactivity significantly mediates the impact of built environment features by census tract, intervention on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework accounting for these mediators, we determined that intervening by improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, compared to the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing the data may not measure what is intended, and ignoring mediators can result in biased intervention effect estimates.
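The mediation argument in the abstract can be illustrated with a minimal sketch. It is not the study's causal framework: it assumes a simple linear model, uses synthetic data, and every variable name and coefficient below (crosswalks, inactivity, obesity, the effect sizes) is hypothetical. It only shows how a built environment feature's total effect splits into a direct part and a part carried through a mediator, so that a model ignoring the mediator attributes both parts to the feature itself.

```python
import numpy as np


def ols(y, *cols):
    """Least-squares fit of y ~ 1 + cols; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def mediation_effects(x, m, y):
    """Product-of-coefficients decomposition for a linear mediation model."""
    a = ols(m, x)[1]                 # effect of feature x on mediator m
    _, direct, b = ols(y, x, m)      # y ~ x + m: direct effect of x, effect of m
    total = ols(y, x)[1]             # y ~ x only: mediator ignored
    return {"direct": direct, "indirect": a * b, "total": total}


# Synthetic, purely illustrative data (all coefficients are assumptions).
rng = np.random.default_rng(0)
n = 5000
crosswalks = rng.normal(size=n)                           # standardized feature
inactivity = -0.5 * crosswalks + rng.normal(size=n)       # mediator
obesity = 0.8 * inactivity - 0.1 * crosswalks + rng.normal(size=n)

effects = mediation_effects(crosswalks, inactivity, obesity)
for name, value in effects.items():
    print(f"{name}: {value:.3f}")
```

In this toy setup most of the association between the feature and the outcome runs through the mediator, so the coefficient from the model without the mediator is several times larger in magnitude than the direct effect.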
Award ID(s):
1845487
PAR ID:
10542522
Publisher / Repository:
Proceedings of the National Academy of Sciences
Journal Name:
Proceedings of the National Academy of Sciences
Volume:
121
Issue:
39
ISSN:
0027-8424
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Objectives. To examine the relationships among environmental characteristics, temperature, and health outcomes during heat advisories at the geographic scale of street segments. Methods. We combined multiple data sets from Boston, Massachusetts, including remotely sensed measures of temperature and associated environmental characteristics (e.g., canopy cover), 911 dispatches for medical emergencies, daily weather conditions, and demographic and physical context from the American Community Survey and City of Boston Property Assessments. We used multilevel models to analyze the distribution of land surface temperature and elevated vulnerability during heat advisories across streets and neighborhoods. Results. A substantial proportion of variation in land surface temperature existed between streets within census tracts (38%), explained by canopy, impervious surface, and albedo. Streets with higher land surface temperature had a greater likelihood of medical emergencies during heat advisories relative to the frequency of medical emergencies during non–heat advisory periods. There was no independent effect of the average land surface temperature of the census tract. Conclusions. The relationships among environmental characteristics, temperature, and health outcomes operate at the spatial scale of the street segment, calling for more geographically precise analysis and intervention.
  2. Traffic forecasting plays an important role in urban planning. Deep learning methods outperform traditional traffic flow forecasting models because of their ability to capture spatiotemporal characteristics of traffic conditions. However, these methods require high-quality historical traffic data, which can be both difficult to acquire and non-comprehensive, making it hard to predict traffic flows at the city scale. To resolve this problem, we implemented a deep learning method, SceneGCN, to forecast traffic speed at the city scale. The model involves two steps: firstly, scene features are extracted from Google Street View (GSV) images for each road segment using pretrained Resnet18 models. Then, the extracted features are entered into a graph convolutional neural network to predict traffic speed at different hours of the day. Our results show that the accuracy of the model can reach up to 86.5% and the Resnet18 model pretrained by Places365 is the best choice to extract scene features for traffic forecasting tasks. Finally, we conclude that the proposed model can predict traffic speed efficiently at the city scale and GSV images have the potential to capture information about human activities. 
  3. Street view imagery databases such as Google Street View, Mapillary, and Karta View provide great spatial and temporal coverage for many cities globally. Those data, when coupled with appropriate computer vision algorithms, can provide an effective means to analyse aspects of the urban environment at scale. As an effort to enhance current practices in urban flood risk assessment, this project investigates a potential use of street view imagery data to identify building features that indicate buildings’ vulnerability to flooding (e.g., basements and semi-basements). In particular, this paper discusses (1) building features indicating the presence of basement structures, (2) available imagery data sources capturing those features, and (3) computer vision algorithms capable of automatically detecting the features of interest. The paper also reviews existing methods for reconstructing geometry representations of the extracted features from images and potential approaches to account for data quality issues. Preliminary experiments were conducted, which confirmed the usability of the freely available Mapillary images for detecting basement railings as an example type of basement features, as well as geolocating the features.
  4. Urban flooding is an increasing threat to cities and resident well‐being. The Federal Emergency Management Agency (FEMA) typically reports losses attributed to flooding which result from a stream overtopping its banks, discounting impacts of higher frequency, lower impact flooding that occurs when precipitation intensity exceeds the capacity of a drainage system. Despite its importance, the drivers of street flooding can often be difficult to identify, given street flooding data scarcity and the multitude of storm, built environment, and social factors involved. To address this knowledge gap, this study uses 922 street flooding reports to the city in Denver, Colorado, USA from 2000 to 2019 in coordination with rain gauge network data and Census tract information to improve understanding of spatiotemporal drivers of urban flooding. An initial threshold analysis using rainfall intensity to predict street flooding had performance close to random chance, which led us to investigate other drivers. A logistic regression describing the probability of a storm leading to a flood report showed the strongest predictors of urban flooding were, in descending order, maximum 5‐min rainfall intensity, population density, storm depth, storm duration, median tract income, and stormwater pipe density. The logistic regression also showed that rainfall intensity and population density are nearly as important in determining the likelihood of a flood report incidence. In addition, topographic wetness index values at locations of flooding reports were higher than randomly selected points. A linear regression predicting the number of reports per area identified percent impervious as the single most important predictor. Our methodologies can be used to better inform urban flood awareness, response, and mitigation and are applicable to any city with flood reports and spatial precipitation data.
  5. Santiago, J. (Ed.)
    Storefront accessibility can substantially affect the way people who are blind or visually impaired (BVI) travel in urban environments. Entrance localization is one of the biggest challenges for BVI people. In addition, improperly designed staircases and obstructive store decorations can create considerable mobility challenges for BVI people, making it more difficult for them to navigate their community and hence reducing their desire to travel. Unfortunately, there are few approaches to acquiring this information in advance through computational tools or services. In this paper, we propose a solution to collect large-scale accessibility data of New York City (NYC) storefronts using a crowdsourcing approach on Google Street View (GSV) panoramas. We develop a web-based crowdsourcing application, DoorFront, which enables volunteers not only to remotely label storefront accessibility data on GSV images, but also to validate the labeling result to ensure high data quality. In order to study the usability and user experience of our application, an informal beta-test is conducted and a user experience survey is designed for testing volunteers. The user feedback is very positive and indicates the high potential and usability of the proposed application.