Title: Forecasting Traffic Speed during Daytime from Google Street View Images using Deep Learning
Traffic forecasting plays an important role in urban planning. Deep learning methods outperform traditional traffic flow forecasting models because of their ability to capture the spatiotemporal characteristics of traffic conditions. However, these methods require high-quality historical traffic data, which can be both difficult to acquire and non-comprehensive, making it hard to predict traffic flows at the city scale. To resolve this problem, we implemented a deep learning method, SceneGCN, to forecast traffic speed at the city scale. The model involves two steps: first, scene features are extracted from Google Street View (GSV) images for each road segment using pretrained ResNet18 models; then, the extracted features are fed into a graph convolutional neural network to predict traffic speed at different hours of the day. Our results show that the accuracy of the model can reach up to 86.5%, and that ResNet18 pretrained on Places365 is the best choice for extracting scene features for traffic forecasting tasks. Finally, we conclude that the proposed model can predict traffic speed efficiently at the city scale and that GSV images have the potential to capture information about human activities.
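
As a rough sketch of the first step, the snippet below pulls a 512-dimensional scene feature vector from one image with a pretrained ResNet18 in PyTorch. The paper uses Places365-pretrained weights, which are distributed separately from torchvision, so the ImageNet weights here are only a stand-in, and the image path is illustrative.

import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# Pretrained ResNet18 with the final classification layer removed,
# leaving the 512-d global-average-pooled scene feature.
resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def scene_features(image_path):
    # Return the feature vector for one GSV image (path is hypothetical).
    img = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        feat = extractor(img)          # shape (1, 512, 1, 1)
    return feat.flatten(1).squeeze(0)  # shape (512,)

In the second step, these vectors serve as node features of the road-segment graph; with a library such as PyTorch Geometric, a GCNConv(512, hidden_dim) layer applied over the road-network edge index would play that role.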
Award ID(s): 1952193
PAR ID: 10451287
Author(s) / Creator(s): ;
Date Published:
Journal Name: Transportation Research Record: Journal of the Transportation Research Board
ISSN: 0361-1981
Page Range / eLocation ID: 036119812311695
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Underwater imaging enables nondestructive plankton sampling at frequencies, durations, and resolutions unattainable by traditional methods. These systems necessitate automated processes to identify organisms efficiently. Early underwater image processing used a standard approach: binarizing images to segment targets, then integrating deep learning models for classification. While intuitive, this infrastructure has limitations in handling high concentrations of biotic and abiotic particles, rapid changes in dominant taxa, and highly variable target sizes. To address these challenges, we introduce a new framework that starts with a scene classifier to capture large within-image variation, such as disparities in the layout of particles and dominant taxa. After scene classification, scene-specific Mask region-based convolutional neural network (Mask R-CNN) models are trained to separate target objects into different groups. The procedure allows information to be extracted from different image types while minimizing potential bias for commonly occurring features. Using in situ coastal plankton images, we compared the scene-specific models to a Mask R-CNN model encompassing all scene categories as a single full model. Results showed that the scene-specific approach outperformed the full model, achieving a 20% accuracy improvement on complex, noisy images. The full model yielded counts that were up to 78% lower than those enumerated by the scene-specific models for some small-sized plankton groups. We further tested the framework on images from a benthic video camera and an imaging sonar system with good results. The integration of scene classification, which groups similar images together, can improve the accuracy of detection and classification for complex marine biological images.
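
A minimal sketch of the two-stage inference described above, assuming PyTorch and torchvision; the scene classifier, the per-scene Mask R-CNN models, and the scene category names are hypothetical placeholders, not the paper's artifacts.

import torch
import torchvision

# One Mask R-CNN per scene category (untrained placeholders here; the
# paper trains scene-specific models on its own plankton image groups).
scene_models = {
    name: torchvision.models.detection.maskrcnn_resnet50_fpn(num_classes=10)
    for name in ("sparse", "dense", "detritus_dominated")
}

def classify_and_segment(image, scene_classifier):
    # Stage 1: route the image to a scene category.
    idx = scene_classifier(image.unsqueeze(0)).argmax(dim=1).item()
    name = list(scene_models)[idx]
    # Stage 2: run the scene-specific Mask R-CNN on the same image.
    model = scene_models[name].eval()
    with torch.no_grad():
        return name, model([image])[0]  # dict of boxes, labels, masks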
  2. This study proposes a data fusion and deep learning (DL) framework that learns high-level traffic features from network-level images to predict large-scale, multi-route speed and volume of connected vehicles (CVs). We present a scalable, parallel method for processing statewide CV trajectory data that yields real-time insights at the micro scale in time and space (two-dimensional (2D) arrays) on graphics processing units (GPUs), using the NVIDIA RAPIDS framework and a Dask parallel cluster, which provided a 50× speed-up in data extraction, transformation, and loading (ETL). A UNet model is then applied to extract features and predict multi-route speed and volume channels over a multi-step prediction horizon. The accuracy and robustness of the proposed model are evaluated across different road types, times of day, and image snippets, comparing the model to two benchmarks: Convolutional Long Short-Term Memory (ConvLSTM) and a historical average (HA). The results show that the proposed model outperforms the benchmarks, with an average improvement of 15% over ConvLSTM and 65% over the HA. Comparing the image snippets predicted by each model against the actual images shows that UNet reproduced image textures more faithfully than the benchmark models. UNet's dominance in image prediction was also evident in multi-step forecasting, where the increase in errors over longer prediction horizons was relatively small.
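
The GPU ETL stage might look like the following sketch built on dask_cudf from the RAPIDS stack; the parquet path, column names, and one-minute binning are assumptions for illustration, not the study's actual schema.

import dask_cudf  # RAPIDS GPU dataframes; assumes a Dask-CUDA cluster is running

# Aggregate statewide CV trajectory records into per-route, per-time-bin
# speed and volume values, the raw material for the 2D input channels.
df = dask_cudf.read_parquet("cv_trajectories/*.parquet")  # hypothetical path
df["t_bin"] = df["timestamp"].astype("int64") // (60 * 10**9)  # 1-minute bins
grids = (
    df.groupby(["route_id", "t_bin"])
      .agg({"speed": "mean", "vehicle_id": "count"})  # speed and volume
      .compute()
)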
  3. Deep networks are often not scale-invariant, so their performance can vary wildly when recognizable objects appear at an unseen scale occurring only at test time. In this paper, we propose ScaleNet, which recursively predicts object scale in a deep learning framework. With an explicit objective to predict the scale of objects in images, ScaleNet enables pretrained deep learning models to identify objects at scales not present in their training sets. By recursively calling ScaleNet, one can generalize to very large scale changes unseen in the training set. To demonstrate the robustness of the proposed framework, we conduct experiments with pretrained as well as fine-tuned classification and detection frameworks on the MNIST, CIFAR-10, and MS COCO datasets; the results reveal that the framework significantly boosts the performance of deep networks.
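
The recursive idea can be pictured with the hedged sketch below: a hypothetical scale_net regresses the object scale, and the image is resampled to counteract it until the estimate settles near 1. None of these names come from the paper.

import torch
import torch.nn.functional as F

def normalize_scale(image, scale_net, max_iters=5, tol=0.05):
    # image: (C, H, W) tensor; scale_net returns a scalar scale estimate.
    for _ in range(max_iters):
        s = scale_net(image.unsqueeze(0)).item()
        if abs(s - 1.0) < tol:  # close enough to the canonical scale
            break
        # Resample to undo the predicted scale before the next estimate.
        image = F.interpolate(image.unsqueeze(0), scale_factor=1.0 / s,
                              mode="bilinear", align_corners=False).squeeze(0)
    return image  # roughly at the scale the downstream model was trained on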
  4. Traffic intersections are prime locations for deploying infrastructure sensors and edge computing nodes to realize the vision of a smart city. It is expected that the needs of a smart city, with regard to vehicle and pedestrian traffic systems monitored by cameras/video, can be met using state-of-the-art artificial intelligence (AI) based object detectors and trackers. A critical component in designing an effective real-time object detection/tracking pipeline is understanding how object density (the number of objects in a scene), image resolution, and frame rate influence the performance metrics. This study explores accuracy and speed metrics with the goal of supporting pipelines that meet the precision and latency needs of a real-time environment. We examine the impact of varying image resolution, frame rate, and object density on object detection performance metrics. Experiments on the COSMOS testbed dataset show that varying the frame width from 416 to 832 pixels, and cropping the images to a square resolution, result in an increase in average precision for all object classes. Decreasing the frame rate from 15 fps to 5 fps preserves more than 90% of the highest F1 score achieved for all object classes. The results inform the choice of video preprocessing stages and modifications to established AI-based object detection/tracking methods, and suggest optimal hyperparameter values. Index Terms: Object Detection, Smart City, Video Resolution, Deep Learning Models.
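
The preprocessing knobs the experiments sweep can be sketched with OpenCV; the defaults below (5 fps, 416-pixel square crops) mirror values quoted in the abstract, while the function itself is an illustration rather than the study's code.

import cv2

def preprocess_stream(path, target_fps=5, side=416):
    # Yield square, resized frames at a reduced frame rate for the detector.
    cap = cv2.VideoCapture(path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or 15.0
    step = max(1, round(src_fps / target_fps))  # keep every step-th frame
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            h, w = frame.shape[:2]
            m = min(h, w)  # center square crop
            y0, x0 = (h - m) // 2, (w - m) // 2
            yield cv2.resize(frame[y0:y0 + m, x0:x0 + m], (side, side))
        idx += 1
    cap.release()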
  5. New data sources and AI methods for extracting information are increasingly abundant and relevant to decision-making across societal applications. A notable example is street view imagery, available in over 100 countries and purported to inform built environment interventions (e.g., adding sidewalks) for community health outcomes. However, biases can arise when decision-making does not account for data robustness or relies on spurious correlations. To investigate this risk, we analyzed 2.02 million Google Street View (GSV) images alongside health, demographic, and socioeconomic data from New York City. The findings demonstrate robustness challenges: built environment characteristics inferred from GSV labels at the intracity level often do not align with ground truth. Moreover, because average individual-level physical inactivity significantly mediates the impact of built environment features by census tract, the effect of intervening on features measured by GSV would be misestimated without proper model specification and consideration of this mediation mechanism. Using a causal framework that accounts for these mediators, we determined that improving 10% of samples in the two lowest tertiles of physical inactivity would lead to a 4.17 (95% CI 3.84–4.55) or 17.2 (95% CI 14.4–21.3) times greater decrease in the prevalence of obesity or diabetes, respectively, than the same proportional intervention on the number of crosswalks by census tract. This study highlights critical issues of robustness and model specification in using emergent data sources, showing that the data may not measure what is intended and that ignoring mediators can result in biased estimates of intervention effects.
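
The mediation logic can be made concrete with a simple product-of-coefficients sketch in statsmodels; the CSV and column names are hypothetical, and the paper's actual analysis uses a fuller causal framework rather than this linear approximation.

import pandas as pd
import statsmodels.formula.api as smf

# Baron-Kenny-style check: does tract-level physical inactivity mediate
# the association between crosswalk counts and obesity prevalence?
df = pd.read_csv("tract_data.csv")  # hypothetical census-tract table
total = smf.ols("obesity ~ crosswalks", data=df).fit()
mediator = smf.ols("inactivity ~ crosswalks", data=df).fit()
direct = smf.ols("obesity ~ crosswalks + inactivity", data=df).fit()
indirect = mediator.params["crosswalks"] * direct.params["inactivity"]
print(f"total={total.params['crosswalks']:.3f}  "
      f"direct={direct.params['crosswalks']:.3f}  indirect={indirect:.3f}")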