Title: CSafe: An Intelligent Audio Wearable Platform for Improving Construction Worker Safety in Urban Environments
Abstract: Vehicle accidents are one of the greatest causes of death and injury in urban areas for pedestrians, workers, and police alike. In this work, we present CSafe, a low-power audio wearable platform that detects, localizes, and provides alerts about oncoming vehicles to improve construction worker safety. Construction worker safety is a much more challenging problem than general urban or pedestrian safety in that the sound of construction tools can be orders of magnitude louder than that of vehicles, making vehicle detection and localization exceptionally difficult. To overcome these challenges, we develop a novel sound source separation algorithm, called Probabilistic Template Matching (PTM), as well as a novel noise filtering architecture to remove loud construction noises from our observed signals. We show that our architecture can improve vehicle detection by up to 12% over other state-of-the-art source separation algorithms. We integrate PTM and our noise filtering architecture into CSafe and show through a series of real-world experiments that CSafe achieves up to an 82% vehicle detection rate and a 6.90° mean localization error in acoustically noisy construction site scenarios, which is 16% higher and almost 30° lower, respectively, than state-of-the-art audio wearable safety systems.
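The abstract describes detection by matching observed audio against templates of vehicle sounds. As a rough illustration of that general idea only (not the paper's PTM algorithm or its noise filtering architecture), the sketch below scores spectrogram frames against a single vehicle template using normalized correlation; the sampling rate, frame length, and threshold are assumptions.

```python
# Minimal sketch of spectral template matching for vehicle-sound detection.
# Generic illustration only; NOT the paper's Probabilistic Template Matching.
import numpy as np
from scipy.signal import stft

FS = 16_000   # assumed sampling rate (Hz)
FRAME = 512   # assumed STFT window length

def log_spectrogram(audio: np.ndarray) -> np.ndarray:
    """Log-magnitude spectrogram with frames along axis 1."""
    _, _, Z = stft(audio, fs=FS, nperseg=FRAME)
    return np.log1p(np.abs(Z))

def match_score(frame_spec: np.ndarray, template: np.ndarray) -> float:
    """Normalized correlation between one spectral frame and a template."""
    f = frame_spec - frame_spec.mean()
    t = template - template.mean()
    denom = np.linalg.norm(f) * np.linalg.norm(t) + 1e-12
    return float(np.dot(f, t) / denom)

def detect_vehicle(audio: np.ndarray, template: np.ndarray,
                   threshold: float = 0.6) -> bool:
    """Flag a vehicle if any frame correlates strongly with the template."""
    spec = log_spectrogram(audio)
    scores = [match_score(spec[:, i], template) for i in range(spec.shape[1])]
    return max(scores) >= threshold
```

In the system described above, loud construction noise would have to be filtered out before a matching step like this could work reliably, which is the problem the paper's noise filtering architecture and PTM are designed to address.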
Award ID(s): 1704899, 1815274
PAR ID: 10230890
Author(s) / Creator(s):
Date Published:
Journal Name: Proceedings of the 20th International Conference on Information Processing in Sensor Networks
Page Range / eLocation ID: 207 to 221
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. With the prevalence of smartphones, pedestrians and joggers today often walk or run while listening to music. Deprived of the auditory cues that would normally warn them of danger, they are at a much greater risk of being hit by cars or other vehicles. In this paper, we build a wearable system that uses multi-channel audio sensors embedded in a headset to detect and locate cars from their honks, engine, and tire noises, and to warn pedestrians of imminently approaching vehicles. We demonstrate that a segmented architecture consisting of headset-mounted audio sensors, a front-end hardware platform that performs signal processing and feature extraction, and machine-learning-based classification on a smartphone can provide early danger detection in real time, from up to 60 m away, and alert the user with low latency and high accuracy. To further reduce the power consumption of the battery-powered wearable headset, we implement a custom-designed integrated circuit that computes delays between multiple audio channels with nW power consumption. A regression-based method for sound source localization, AvPR, is proposed and used in combination with the IC to improve the granularity and robustness of localization.
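This abstract relies on inter-channel delays for localization. As a generic point of reference (not the paper's custom IC logic or the AvPR regression method), the sketch below estimates the delay between two microphone channels with GCC-PHAT, a standard time-difference-of-arrival technique; the sampling rate and microphone spacing in the usage comment are assumptions.

```python
# Sketch of GCC-PHAT inter-channel delay estimation, a standard TDOA method.
import numpy as np

def gcc_phat(sig: np.ndarray, ref: np.ndarray, fs: int, max_tau: float) -> float:
    """Estimated delay (seconds) of `sig` relative to `ref`."""
    n = sig.size + ref.size
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12            # PHAT weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = int(fs * max_tau)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Example (assumed geometry): ~20 cm microphone spacing bounds the delay
# to about 0.6 ms at a 343 m/s speed of sound.
# tau = gcc_phat(left_ch, right_ch, fs=48_000, max_tau=0.2 / 343.0)
```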
  2. With the prevalence of smartphones, pedestrians and joggers today often walk or run while listening to music. Deprived of the auditory cues that would normally warn them of danger, they are at a much greater risk of being hit by cars or other vehicles. In this paper, we build a wearable system that uses multi-channel audio sensors embedded in a headset to detect and locate cars from their honks, engine, and tire noises, and to warn pedestrians of imminently approaching vehicles. We demonstrate that a segmented architecture and implementation consisting of headset-mounted audio sensors, front-end hardware that performs signal processing and feature extraction, and machine-learning-based classification on a smartphone can provide early danger detection in real time, from up to a 60 m distance, with near-100% precision in vehicle detection, and alert the user with low latency.
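The last stage of this segmented pipeline is machine-learning classification on the smartphone. The abstract does not name the features or model, so the sketch below is a generic stand-in: coarse band-energy features computed from each audio frame feeding an off-the-shelf classifier.

```python
# Generic stand-in for the classification stage; features and model choice
# here are assumptions, not the system's actual pipeline.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def band_energies(audio: np.ndarray, n_bands: int = 16) -> np.ndarray:
    """Split the magnitude spectrum into equal-width bands and sum each one."""
    spectrum = np.abs(np.fft.rfft(audio))
    return np.array([band.sum() for band in np.array_split(spectrum, n_bands)])

clf = RandomForestClassifier(n_estimators=100)  # illustrative model choice
# Hypothetical training/inference with labeled clips (1 = vehicle present):
# clf.fit(np.vstack([band_energies(c) for c in clips]), labels)
# vehicle = clf.predict(band_energies(new_clip).reshape(1, -1))[0]
```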
  3. The joint analysis of audio and video is a powerful tool that can be applied to various contexts, including action, speech, and sound recognition, audio-visual video parsing, emotion recognition in affective computing, and self-supervised training of deep learning models. Solving these problems often involves tackling core audio-visual tasks, such as audio-visual source localization, audio-visual correspondence, and audio-visual source separation, which can be combined in various ways to achieve the desired results. This paper provides a review of the literature in this area, discussing the advancements, history, and datasets of audio-visual learning methods for various application domains. It also presents an overview of the reported performances on standard datasets and suggests promising directions for future research. 
  4. As augmented and virtual reality (AR/VR) technology matures, a method is desired to represent real-world persons visually and aurally in a virtual scene with high fidelity to craft an immersive and realistic user experience. Current technologies leverage camera and depth sensors to render visual representations of subjects through avatars, and microphone arrays are employed to localize and separate high-quality subject audio through beamforming. However, challenges remain in both realms. In the visual domain, avatars can only map key features (e.g., pose, expression) to a predetermined model, rendering them incapable of capturing the subjects’ full details. Alternatively, high-resolution point clouds can be utilized to represent human subjects. However, such three-dimensional data is computationally expensive to process. In the realm of audio, sound source separation requires prior knowledge of the subjects’ locations. However, it may take unacceptably long for sound source localization algorithms to provide this knowledge, which can still be error-prone, especially with moving objects. These challenges make it difficult for AR systems to produce real-time, high-fidelity representations of human subjects for applications such as AR/VR conferencing that mandate negligible system latency. We present Acuity, a real-time system capable of creating high-fidelity representations of human subjects in a virtual scene both visually and aurally. Acuity isolates subjects from high-resolution input point clouds. It reduces the processing overhead by performing background subtraction at a coarse resolution, then applying the detected bounding boxes to fine-grained point clouds. Meanwhile, Acuity leverages an audiovisual sensor fusion approach to expedite sound source separation. The estimated object location in the visual domain guides the acoustic pipeline to isolate the subjects’ voices without running sound source localization. Our results demonstrate that Acuity can isolate multiple subjects’ high-quality point clouds with a maximum latency of 70 ms and average throughput of over 25 fps, while separating audio in less than 30 ms. We provide the source code of Acuity at: https://github.com/nesl/Acuity. 
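To make the coarse-to-fine point-cloud step concrete, here is a minimal sketch of that idea under assumed parameters (voxel size, axis-aligned boxes): background subtraction on a coarse voxel grid, followed by cropping the full-resolution cloud to the surviving region. It is not the released Acuity code, which is available at the repository linked above.

```python
# Assumed coarse-to-fine sketch: voxelize coarsely, keep voxels absent from
# the background, then crop the fine cloud to the resulting bounding box.
import numpy as np

VOXEL = 0.10  # coarse voxel edge length in meters (assumed)

def occupied_voxels(points: np.ndarray) -> set:
    """Coarse voxel indices occupied by an (N, 3) point cloud."""
    return set(map(tuple, np.floor(points / VOXEL).astype(int)))

def foreground_bbox(frame: np.ndarray, background: np.ndarray):
    """Axis-aligned bbox of points whose coarse voxel is not background."""
    bg = occupied_voxels(background)
    keys = np.floor(frame / VOXEL).astype(int)
    fg_mask = np.array([tuple(k) not in bg for k in keys])
    if not fg_mask.any():
        return None
    fg = frame[fg_mask]
    return fg.min(axis=0), fg.max(axis=0)

def crop_fine(fine_cloud: np.ndarray, bbox) -> np.ndarray:
    """Apply the coarse-resolution bbox to the full-resolution cloud."""
    lo, hi = bbox
    inside = np.all((fine_cloud >= lo) & (fine_cloud <= hi), axis=1)
    return fine_cloud[inside]
```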
  5. The operational safety of Automated Driving System (ADS)-operated vehicles (AVs) is a rising concern as AVs are deployed both as test prototypes and in commercial service. The robustness of safety evaluation systems is essential in determining the operational safety of AVs as they interact with human-driven vehicles. Extending earlier work of the Institute of Automated Mobility (IAM) on Operational Safety Assessment (OSA) metrics and infrastructure-based safety monitoring systems, we compare the performance of an infrastructure-based Light Detection and Ranging (LIDAR) system to an onboard, vehicle-based LIDAR system in testing at the Maricopa County Department of Transportation SMARTDrive testbed in Anthem, Arizona. The sensor modalities, located both in the infrastructure and onboard the test vehicles, include LIDAR, cameras, a real-time differential GPS, and a drone with a camera. Bespoke localization and tracking algorithms are created for the LIDAR and cameras. In total, there are 26 different scenarios of the test vehicles navigating the testbed intersection; for this work, we consider only the car-following scenarios. The LIDAR data collected from the infrastructure-based and onboard vehicle-based sensor systems are used to perform object detection and multi-target tracking, estimating the velocity and position of the test vehicles, and these estimates are used to compute OSA metrics. The comparison of the two systems covers the localization and tracking errors in the estimated position and velocity of the subject vehicle, with the real-time differential GPS data serving as ground truth for the velocity comparison and the tracking results from the drone serving as ground truth for the OSA metrics comparison.
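The abstract does not list which OSA metrics are computed, so purely as an illustration the sketch below evaluates two common car-following surrogate safety measures, time headway and time-to-collision, from the gap and speeds that a tracking pipeline like this would estimate.

```python
# Illustration only: time headway (THW) and time-to-collision (TTC) are
# common car-following surrogate safety measures; the specific OSA metrics
# compared in the paper are not listed in the abstract above.
import math

def time_headway(gap_m: float, follower_speed_mps: float) -> float:
    """Seconds for the follower to cover the current bumper-to-bumper gap."""
    return gap_m / follower_speed_mps if follower_speed_mps > 0 else math.inf

def time_to_collision(gap_m: float, follower_speed_mps: float,
                      leader_speed_mps: float) -> float:
    """Seconds until contact if both vehicles hold their current speeds."""
    closing = follower_speed_mps - leader_speed_mps
    return gap_m / closing if closing > 0 else math.inf

# Example: a 12 m gap with the follower at 10 m/s and the leader at 8 m/s
# gives THW = 1.2 s and TTC = 6.0 s (lower values indicate higher risk).
```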