Title: CSafe: An Intelligent Audio Wearable Platform for Improving Construction Worker Safety in Urban Environments
Vehicle accidents are among the greatest causes of death and injury in urban areas for pedestrians, workers, and police alike. In this work, we present CSafe, a low-power audio wearable platform that detects, localizes, and provides alerts about oncoming vehicles to improve construction worker safety. Construction worker safety is a much more challenging problem than general urban or pedestrian safety in that the sound of construction tools can be orders of magnitude louder than that of vehicles, making vehicle detection and localization exceptionally difficult. To overcome these challenges, we develop a novel sound source separation algorithm, called Probabilistic Template Matching (PTM), as well as a novel noise filtering architecture to remove loud construction noises from our observed signals. We show that our architecture can improve vehicle detection by up to 12% over other state-of-the-art source separation algorithms. We integrate PTM and our noise filtering architecture into CSafe and show through a series of real-world experiments that CSafe can achieve up to an 82% vehicle detection rate and a 6.90° mean localization error in acoustically noisy construction site scenarios, which is 16% higher and almost 30° lower than state-of-the-art audio wearable safety works.
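The abstract does not reproduce PTM's internals, but the general idea of matching an observed sound against stored vehicle templates can be sketched as follows. This is a minimal, illustrative sketch only: the function names, the Gaussian log-likelihood scoring, and the threshold are assumptions for exposition, not the authors' implementation.

```python
import numpy as np

def template_match_score(observed_spec, template_spec, noise_var=1.0):
    """Gaussian log-likelihood of an observed magnitude spectrum under a
    stored vehicle template (illustrative scoring, not the paper's PTM)."""
    diff = observed_spec - template_spec
    return -0.5 * np.sum(diff ** 2) / noise_var

def detect_vehicle(observed_spec, templates, threshold=-50.0):
    """Return (detected, best_template_index): a detection fires when some
    template explains the observation better than the threshold."""
    scores = [template_match_score(observed_spec, t) for t in templates]
    return max(scores) > threshold, int(np.argmax(scores))
```

In a real system the templates, noise variance, and threshold would be learned from labeled vehicle and construction-noise recordings rather than fixed by hand.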
Award ID(s):
1704899, 1815274
PAR ID:
10230890
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 20th International Conference on Information Processing in Sensor Networks
Page Range / eLocation ID:
207 to 221
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. With the prevalence of smartphones, pedestrians and joggers today often walk or run while listening to music. Deprived of the auditory cues that would otherwise warn them of danger, they are at a much greater risk of being hit by cars or other vehicles. In this paper, we build a wearable system that uses multi-channel audio sensors embedded in a headset to help detect and locate cars from their honks, engine and tire noises, and warn pedestrians of the imminent danger of approaching cars. We demonstrate that using a segmented architecture consisting of headset-mounted audio sensors, a front-end hardware platform that performs signal processing and feature extraction, and machine-learning-based classification on a smartphone, we are able to provide early danger detection in real time, from up to 60 m away, and alert the user with low latency and high accuracy. To further reduce the power consumption of the battery-powered wearable headset, we implement a custom-designed integrated circuit that computes delays between multiple channels of audio with nW power consumption. A regression-based method for sound source localization, AvPR, is proposed and used in combination with the IC to improve the granularity and robustness of localization.
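The delay computation that the custom IC accelerates is, at its core, a time-difference-of-arrival estimate between microphone channels. A minimal software sketch is below; it is illustrative only (the AvPR regression method itself is not shown, and the two-microphone far-field geometry is an assumption):

```python
import numpy as np

def estimate_delay(sig_a, sig_b, fs):
    """Estimate the inter-channel delay in seconds by locating the peak
    of the full cross-correlation between two microphone channels."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = np.argmax(corr) - (len(sig_b) - 1)  # lag in samples
    return lag / fs

def bearing_from_delay(delay, mic_spacing, c=343.0):
    """Convert a delay into an arrival angle (degrees) for a two-mic
    array under a far-field assumption; clip to a valid arcsine input."""
    x = np.clip(c * delay / mic_spacing, -1.0, 1.0)
    return np.degrees(np.arcsin(x))
```

Practical systems typically use a phase-weighted correlation (e.g., GCC-PHAT) for robustness to reverberation, which the plain correlation above omits.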
  2. With the prevalence of smartphones, pedestrians and joggers today often walk or run while listening to music. Deprived of the auditory cues that would otherwise warn them of danger, they are at a much greater risk of being hit by cars or other vehicles. In this paper, we build a wearable system that uses multi-channel audio sensors embedded in a headset to help detect and locate cars from their honks, engine and tire noises, and warn pedestrians of the imminent danger of approaching cars. We demonstrate that using a segmented architecture and implementation consisting of headset-mounted audio sensors, front-end hardware that performs signal processing and feature extraction, and machine-learning-based classification on a smartphone, we are able to provide early danger detection in real time from up to 60 m away, with near-100% precision in vehicle detection, and alert the user with low latency.
  3. The joint analysis of audio and video is a powerful tool that can be applied to various contexts, including action, speech, and sound recognition, audio-visual video parsing, emotion recognition in affective computing, and self-supervised training of deep learning models. Solving these problems often involves tackling core audio-visual tasks, such as audio-visual source localization, audio-visual correspondence, and audio-visual source separation, which can be combined in various ways to achieve the desired results. This paper provides a review of the literature in this area, discussing the advancements, history, and datasets of audio-visual learning methods for various application domains. It also presents an overview of the reported performances on standard datasets and suggests promising directions for future research. 
  4. Combustion vehicle emissions contribute to poor air quality and release greenhouse gases into the atmosphere, and vehicle pollution has been associated with numerous adverse health effects. Roadways with extensive waiting and/or passenger drop-off, such as school and hospital drop-off zones, can have a high incidence and density of idling vehicles, producing micro-climates of increased vehicle pollution. Thus, detecting idling vehicles can help monitor and respond to unnecessary idling, and can be integrated into real-time or offline systems to address the resulting pollution. In this paper, we present a real-time, dynamic vehicle idling detection algorithm. The method relies on a multisensor, audio-visual, machine-learning workflow to detect idling vehicles visually under three conditions: moving, static with the engine on, and static with the engine off. The visual vehicle motion detector is built in the first stage, and then a contrastive-learning-based latent space is trained for classifying static vehicle engine sound. We test our system in real time at a hospital drop-off point in Salt Lake City. This in situ dataset was collected and annotated, and it includes vehicles of varying models and types. The experiments show that the method can detect engine switching on or off instantly and achieves 71.02 average precision (AP) for idle detection and 91.06 AP for engine-off detection.
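The three-condition decision described above (moving, static with engine on, static with engine off) reduces to a small fusion rule once the upstream detectors have run. A sketch of that final step, with the visual motion detector and audio engine classifier assumed rather than shown:

```python
def classify_vehicle_state(is_moving: bool, engine_on: bool) -> str:
    """Fuse a visual motion cue and an audio engine cue into one of the
    three vehicle states; an idling vehicle is static with its engine on.
    (Illustrative decision logic only, not the paper's trained pipeline.)"""
    if is_moving:
        return "moving"
    return "idling" if engine_on else "engine_off"
```

The interesting modeling work lives in producing the two boolean cues (motion detection and contrastive engine-sound classification); the fusion itself is deliberately simple.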
  5. As augmented and virtual reality (AR/VR) technology matures, a method is desired to represent real-world persons visually and aurally in a virtual scene with high fidelity to craft an immersive and realistic user experience. Current technologies leverage camera and depth sensors to render visual representations of subjects through avatars, and microphone arrays are employed to localize and separate high-quality subject audio through beamforming. However, challenges remain in both realms. In the visual domain, avatars can only map key features (e.g., pose, expression) to a predetermined model, rendering them incapable of capturing the subjects’ full details. Alternatively, high-resolution point clouds can be utilized to represent human subjects. However, such three-dimensional data is computationally expensive to process. In the realm of audio, sound source separation requires prior knowledge of the subjects’ locations. However, it may take unacceptably long for sound source localization algorithms to provide this knowledge, which can still be error-prone, especially with moving objects. These challenges make it difficult for AR systems to produce real-time, high-fidelity representations of human subjects for applications such as AR/VR conferencing that mandate negligible system latency. We present Acuity, a real-time system capable of creating high-fidelity representations of human subjects in a virtual scene both visually and aurally. Acuity isolates subjects from high-resolution input point clouds. It reduces the processing overhead by performing background subtraction at a coarse resolution, then applying the detected bounding boxes to fine-grained point clouds. Meanwhile, Acuity leverages an audiovisual sensor fusion approach to expedite sound source separation. The estimated object location in the visual domain guides the acoustic pipeline to isolate the subjects’ voices without running sound source localization. 
Our results demonstrate that Acuity can isolate multiple subjects' high-quality point clouds with a maximum latency of 70 ms and an average throughput of over 25 fps, while separating audio in less than 30 ms. We provide the source code of Acuity at: https://github.com/nesl/Acuity. 
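The coarse-then-fine isolation step described above (background subtraction at low resolution, then cropping the full-resolution cloud with the detected bounds) can be sketched with plain voxel hashing. The function names and the default voxel size are illustrative assumptions, not Acuity's implementation:

```python
import numpy as np

def coarse_foreground_bbox(points, background, voxel=0.5):
    """Voxelize the live frame and a background cloud at coarse resolution,
    then return the axis-aligned bounding box of voxels occupied only by
    the live frame (the foreground subject)."""
    occ = set(map(tuple, np.floor(points / voxel).astype(int)))
    bg = set(map(tuple, np.floor(background / voxel).astype(int)))
    fg = np.array(sorted(occ - bg))
    if fg.size == 0:
        return None
    return fg.min(axis=0) * voxel, (fg.max(axis=0) + 1) * voxel

def crop_fine_cloud(points, bbox):
    """Keep only the full-resolution points inside the coarse bbox."""
    lo, hi = bbox
    mask = np.all((points >= lo) & (points < hi), axis=1)
    return points[mask]
```

The saving comes from running set subtraction over a few thousand coarse voxels instead of millions of raw points, with the expensive per-point work confined to the small cropped region.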