Title: Understanding pedestrian movement using urban sensing technologies: the promise of audio-based sensors
Abstract

While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding pedestrian volumes and flows is essential for designing safer and more attractive pedestrian infrastructure and for controlling periodic overcrowding. This study discusses a new approach to scaling up urban sensing of people with the help of novel audio-based technology. It assesses the benefits and limitations of microphone-based sensors compared with other forms of pedestrian sensing. A large-scale dataset called ASPED is presented, which includes high-quality audio recordings along with video recordings used for labeling the pedestrian count data. The baseline analyses highlight the promise of using audio sensors for pedestrian tracking, although work continues on the algorithmic and technological improvements needed to make the sensors practically usable. This study also demonstrates how the data can be leveraged to predict pedestrian trajectories. Finally, it discusses the use cases and scenarios where audio-based pedestrian sensing can support better urban and transportation planning.
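As a rough illustration of how audio-based pedestrian sensing can be framed, the sketch below treats 1-second windows of a recording as a binary presence-detection problem using simple spectral band energies and a logistic-regression classifier. The feature design, window length, and toy data are illustrative assumptions and do not represent the ASPED baseline models.

```python
# A minimal, hypothetical sketch of audio-based pedestrian detection as binary
# classification. Features, window length, and classifier are illustrative
# assumptions, not the ASPED baseline.
import numpy as np
from sklearn.linear_model import LogisticRegression

SR = 16_000          # assumed sample rate (Hz)
WIN = SR             # 1-second analysis windows

def band_energy_features(audio: np.ndarray) -> np.ndarray:
    """Split audio into 1 s windows and summarize spectral energy in a few bands."""
    n = len(audio) // WIN
    frames = audio[: n * WIN].reshape(n, WIN)
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    edges = [0, 200, 1000, 4000, spec.shape[1]]        # coarse frequency bands (bins)
    feats = [spec[:, lo:hi].mean(axis=1) for lo, hi in zip(edges[:-1], edges[1:])]
    return np.log1p(np.stack(feats, axis=1))           # log energy per band

# Toy example: background noise vs. noise plus periodic "footstep-like" transients.
rng = np.random.default_rng(0)
quiet = rng.normal(0, 0.01, 10 * WIN)
steps = quiet.copy()
steps[::SR // 2] += 0.5                                # impulses every 0.5 s

X = np.vstack([band_energy_features(quiet), band_energy_features(steps)])
y = np.array([0] * 10 + [1] * 10)                      # 0 = no pedestrian, 1 = pedestrian

clf = LogisticRegression().fit(X, y)
print("per-window predictions:", clf.predict(X))
```

In practice, per-window presence predictions like these would be aggregated over time and across sensors to estimate pedestrian counts and flows.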

 
PAR ID:
10522482
Author(s) / Creator(s):
Publisher / Repository:
Springer Science + Business Media
Date Published:
Journal Name:
Urban Informatics
Volume:
3
Issue:
1
ISSN:
2731-6963
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Access to high-quality data is an important barrier in the digital analysis of urban settings, including applications within computer vision and urban design. Diverse forms of data collected from sensors in areas of high activity in the urban environment, particularly at street intersections, are valuable resources for researchers interpreting the dynamics between vehicles, pedestrians, and the built environment. In this paper, we present a high-resolution audio, video, and LiDAR dataset of three urban intersections in Brooklyn, New York, totaling almost 8 unique hours. The data were collected with custom Reconfigurable Environmental Intelligence Platform (REIP) sensors that were designed with the ability to accurately synchronize multiple video and audio inputs. The resulting data are novel in that they are inclusively multimodal, multi-angular, high-resolution, and synchronized. We demonstrate four ways the data could be utilized — (1) to discover and locate occluded objects using multiple sensors and modalities, (2) to associate audio events with their respective visual representations using both video and audio modes, (3) to track the amount of each type of object in a scene over time, and (4) to measure pedestrian speed using multiple synchronized camera views. In addition to these use cases, our data are available for other researchers to carry out analyses related to applying machine learning to understanding the urban environment (in which existing datasets may be inadequate), such as pedestrian-vehicle interaction modeling and pedestrian attribute recognition. Such analyses can help inform decisions made in the context of urban sensing and smart cities, including accessibility-aware urban design and Vision Zero initiatives.
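Use case (4), measuring pedestrian speed from multiple synchronized camera views, can be illustrated with a small sketch: pixel tracks from two cameras are mapped to a shared ground plane via per-camera homographies and fused by averaging. The homographies, pixel tracks, and sampling interval below are hypothetical placeholders, not values from the dataset.

```python
# A toy sketch of estimating pedestrian walking speed from two time-synchronized
# camera views. The homographies and pixel tracks are made-up placeholders; in
# practice each camera would be calibrated to a shared ground plane.
import numpy as np

def to_ground(H: np.ndarray, px: np.ndarray) -> np.ndarray:
    """Map Nx2 pixel coordinates to ground-plane metres via a 3x3 homography."""
    pts = np.hstack([px, np.ones((len(px), 1))])
    w = pts @ H.T
    return w[:, :2] / w[:, 2:3]

# Hypothetical calibration: diagonal homographies with different pixel scales.
H_cam1 = np.diag([0.01, 0.01, 1.0])        # 1 px ~ 1 cm in this toy setup
H_cam2 = np.diag([0.02, 0.02, 1.0])

# Synchronized pixel tracks of one pedestrian, sampled once per second.
track_cam1 = np.array([[100, 200], [240, 200], [380, 200]], dtype=float)
track_cam2 = np.array([[50, 100], [120, 100], [190, 100]], dtype=float)
timestamps = np.array([0.0, 1.0, 2.0])     # seconds

# Fuse the two views by averaging their ground-plane estimates.
ground = (to_ground(H_cam1, track_cam1) + to_ground(H_cam2, track_cam2)) / 2
dists = np.linalg.norm(np.diff(ground, axis=0), axis=1)
speed = dists.sum() / (timestamps[-1] - timestamps[0])
print(f"estimated walking speed: {speed:.2f} m/s")
```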

     
  2. ABSTRACT

    Distributed acoustic sensing (DAS) technology is an emerging field of seismic sensing that enables recording ambient noise seismic data along the entire length of a fiber-optic cable at meter-scale resolution. Such a dense spatial resolution of recordings over long distances has not been possible using traditional methods because of limited hardware resources and logistical concerns in an urban environment. The low spatial resolution of traditional passive seismic acquisition techniques has limited the accuracy of the previously generated velocity profiles in many important urban regions, including the Reno-area basin, to the top 100 m of the underlying subsurface. Applying the method of seismic interferometry to ambient noise strain-rate data obtained from a dark-fiber cable allows for generating noise cross-correlations, which can be used to infer shallow and deep subsurface properties and basin geometry. We gathered DAS ambient noise seismic data for this study using a 12 km portion of a dark-fiber line in Reno, Nevada. We used the gathered data to generate and invert dispersion curves to estimate the near-surface shear-wave velocity structure. Comparing the generated velocity profiles with previous regional studies shows good agreement in determining the average depth to bedrock and velocity variations in the analyzed domain. A synthetic experiment is also performed to further verify the proposed framework and to better understand the effect of the infrastructural cover along the cable. The results obtained from this research provide insight into the application of DAS using dark-fiber lines for subsurface characterization in urban environments. The study also discusses the potential effects of the conduit that covers such permanent fiber installations on the produced inversion results.
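The core operation behind the noise cross-correlations described above can be sketched in a few lines: whiten and cross-correlate windowed segments of two channels, then stack. The sample rate, window length, and synthetic delayed-noise example below are illustrative assumptions rather than the study's actual processing parameters.

```python
# A minimal numpy sketch of ambient-noise cross-correlation between two sensing
# channels, the core operation behind seismic interferometry. The whitening
# scheme and the synthetic "noise" are illustrative assumptions only.
import numpy as np

FS = 100                 # assumed samples per second
WIN = 60 * FS            # 60 s correlation windows

def cross_correlate(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stack frequency-domain cross-correlations of two equal-length channels."""
    n_win = len(a) // WIN
    stack = np.zeros(WIN)
    for k in range(n_win):
        sa = np.fft.rfft(a[k * WIN:(k + 1) * WIN])
        sb = np.fft.rfft(b[k * WIN:(k + 1) * WIN])
        # Spectral whitening keeps persistent noise sources from dominating.
        cc = sa * np.conj(sb) / (np.abs(sa) * np.abs(sb) + 1e-12)
        stack += np.fft.irfft(cc, n=WIN)
    return np.fft.fftshift(stack / max(n_win, 1))

# Toy data: a common random wavefield arriving at channel B 0.2 s after channel A.
rng = np.random.default_rng(1)
noise = rng.normal(size=10 * WIN)
chan_a = noise
chan_b = np.roll(noise, int(0.2 * FS))

ccf = cross_correlate(chan_a, chan_b)
lags = (np.arange(WIN) - WIN // 2) / FS
print(f"peak correlation at lag {lags[np.argmax(ccf)]:.2f} s")   # ~ -0.20 s here
```

In the study's setting, the lag of the correlation peak between channel pairs encodes travel times along the cable, from which dispersion curves and shear-wave velocity profiles are derived.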

     
  3. Video cameras in smart cities can be used to provide data to improve pedestrian safety and traffic management. Video recordings inherently violate privacy, and technological solutions need to be found to preserve it. Smart city applications deployed on top of the COSMOS research testbed in New York City are envisioned to be privacy friendly. This contribution presents one approach to privacy preservation: a video anonymization pipeline implemented in the form of blurring of pedestrian faces and vehicle license plates. The pipeline utilizes customized deep-learning models based on YOLOv4 for detection of privacy-sensitive objects in street-level video recordings. To achieve real-time inference, the pipeline includes speed improvements via NVIDIA TensorRT optimization. When applied to the video dataset acquired at an intersection within the COSMOS testbed in New York City, the proposed method anonymizes visible faces and license plates with recall of up to 99% and inference speed faster than 100 frames per second. The results of a comprehensive evaluation study are presented. A selection of anonymized videos can be accessed via the COSMOS testbed portal.
    Index Terms: Smart City, Sensors, Video Surveillance, Privacy Protection, Object Detection, Deep Learning, TensorRT.
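The anonymization step itself reduces to blurring the detected regions in each frame. The sketch below stubs out the detector with hard-coded boxes and applies an OpenCV Gaussian blur; the actual pipeline obtains the boxes from customized YOLOv4 models accelerated with TensorRT, and the box coordinates and blur kernel here are assumptions.

```python
# A simplified sketch of the anonymization step: Gaussian-blurring detected regions
# in a frame. The detector is stubbed out with hard-coded boxes; the described
# pipeline uses YOLOv4 models to find faces and license plates.
import cv2
import numpy as np

def blur_regions(frame: np.ndarray, boxes: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Return a copy of the frame with each (x, y, w, h) box Gaussian-blurred."""
    out = frame.copy()
    for x, y, w, h in boxes:
        roi = out[y:y + h, x:x + w]
        out[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (31, 31), 0)
    return out

# Placeholder for detector output: (x, y, width, height) boxes in pixel coordinates.
detections = [(40, 40, 60, 60), (150, 100, 80, 40)]

# Stand-in for a video frame (random pixels instead of a real street scene).
frame = np.random.default_rng(2).integers(0, 256, (240, 320, 3), dtype=np.uint8)
anonymized = blur_regions(frame, detections)
print("blurred pixels differ:", not np.array_equal(frame, anonymized))
```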
  4. Abstract

    This study evaluates whether early vocalizations develop in similar ways in children across diverse cultural contexts. We analyze data from daylong audio recordings of 49 children (1–36 months) from five different language/cultural backgrounds. Citizen scientists annotated these recordings to determine if child vocalizations contained canonical transitions or not (e.g., “ba” vs. “ee”). Results revealed that the proportion of clips reported to contain canonical transitions increased with age. Furthermore, this proportion exceeded 0.15 by around 7 months, replicating and extending previous findings on canonical vocalization development but using data from the natural environments of a culturally and linguistically diverse sample. This work explores how crowdsourcing can be used to annotate corpora, helping establish developmental milestones relevant to multiple languages and cultures. Lower inter‐annotator reliability on the crowdsourcing platform, relative to more traditional in‐lab expert annotators, means that a larger number of unique annotators and/or annotations are required, and that crowdsourcing may not be a suitable method for more fine‐grained annotation decisions. Audio clips used for this project are compiled into a large‐scale infant vocalization corpus that is available for other researchers to use in future work.
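The headline measure, the proportion of clips containing canonical transitions as a function of age, amounts to a grouped mean over annotations. The toy sketch below shows that aggregation with pandas; the column names and inline data are hypothetical, not the study's corpus.

```python
# A small sketch of the aggregation described above: the proportion of annotated
# clips containing a canonical transition, grouped by child age. The columns and
# the tiny inline dataset are hypothetical.
import pandas as pd

annotations = pd.DataFrame({
    "age_months":   [3, 3, 7, 7, 7, 12, 12, 24, 24, 36],
    "is_canonical": [0, 0, 1, 0, 1, 1, 0, 1, 1, 1],   # e.g., majority vote over annotators
})

by_age = annotations.groupby("age_months")["is_canonical"].mean()
print(by_age)
print("ages at/above the 0.15 canonical-proportion threshold:",
      list(by_age[by_age >= 0.15].index))
```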

     