Title: Toward audio-based sensing for pedestrian detection
The detection and counting of pedestrians play a central role in the design of smart cities. Although cameras have been shown to achieve high accuracy on this task, they come at a high cost and are susceptible to challenges such as poor lighting, fog, and obstructed views. Our study investigates audio-based pedestrian detection, combining potentially low-cost sensors with advanced machine-learning-based audio analysis algorithms. With an audio sensor installed along the walkway, machine learning algorithms can tell from the audio whether a pedestrian is present and how far the pedestrian is from the sensor. Results show the general feasibility of audio-based pedestrian detection but fall short of the accuracy levels of video-based detection.
Award ID(s): 2203408
PAR ID: 10580032
Publisher / Repository: JASA
Journal Name: The Journal of the Acoustical Society of America
Volume: 155
Issue: 3_Supplement
ISSN: 0001-4966
Page Range / eLocation ID: A282 to A282
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The global COVID-19 pandemic has strained healthcare systems and highlighted the need for accessible and efficient diagnostic methods. Traditional diagnostic tools, such as nasal swabs and biosensors, while accurate, pose significant logistical challenges and high costs, limiting their scalability. This paper explores an alternative, non-invasive approach to COVID-19 detection using machine learning algorithms to analyze vocal patterns, particularly cough and breathing sounds. Leveraging a publicly available dataset, we developed machine learning models capable of classifying audio samples as COVID-19 positive or negative. Our models achieve an AUC of up to 85% and an F1-score of 81%, demonstrating the potential of machine learning in enabling rapid, cost-effective COVID-19 diagnosis. These findings suggest that audio-based diagnostics could be a practical and scalable solution, particularly in resource-limited settings where traditional methods are less feasible.
  2. Impostors are attackers who take over a smartphone and gain access to the legitimate user’s confidential and private information. This paper proposes a defense-in-depth mechanism to detect impostors quickly with simple Deep Learning algorithms, which can achieve better detection accuracy than the best prior work, which used Machine Learning algorithms requiring computation of multiple features. Unlike previous work, we then consider protecting the privacy of a user’s behavioral (sensor) data by not exposing it outside the smartphone. For this scenario, we propose a Recurrent Neural Network (RNN) based Deep Learning algorithm that uses only the legitimate user’s sensor data to learn his/her normal behavior. We propose to use Prediction Error Distribution (PED) to enhance the detection accuracy. We also show how a minimalist hardware module, dubbed SID for Smartphone Impostor Detector, can be designed and integrated into smartphones for self-contained impostor detection. Experimental results show that SID can support real-time impostor detection, at a very low hardware cost and energy consumption, compared to other RNN accelerators.
  3. Vehicle-to-pedestrian communication could significantly improve pedestrian safety at signalized intersections. However, it is unlikely that pedestrians will always be carrying a low-latency, communication-enabled hand-held device with an activated pedestrian safety application. Because of this, multiple traffic cameras at a signalized intersection could be used to accurately detect and locate pedestrians using deep learning, and broadcast safety alerts related to pedestrians to warn connected and automated vehicles around signalized intersections. However, the unavailability of high-performance roadside computing infrastructure and the limited network bandwidth between traffic cameras and the computing infrastructure limit the ability of real-time data streaming and processing for pedestrian detection. In this paper, we describe an edge computing-based real-time pedestrian detection strategy that combines a pedestrian detection algorithm using deep learning and an efficient data communication approach to reduce bandwidth requirements while maintaining high pedestrian detection accuracy. We utilize a lossy compression technique on traffic camera data to determine the tradeoff between the reduction of the communication bandwidth requirements and a defined pedestrian detection accuracy. The performance of the pedestrian detection strategy is measured in relation to pedestrian classification accuracy with varying peak signal-to-noise ratios. The analyses reveal that we detect pedestrians by maintaining a defined detection accuracy with a peak signal-to-noise ratio of 43 dB while reducing the communication bandwidth from 9.82 Mbits/sec to 0.31 Mbits/sec, a 31× reduction.
  4. Collaboration is a 21st Century skill as well as an effective method for learning, so detection of collaboration is important for both assessment and instruction. Speech-based collaboration detection can be quite accurate, but collecting the speech of students in classrooms can raise privacy issues. An alternative is to send only whether or not the student is speaking. That is, the speech signal is processed at the microphone by a voice activity detector before being transmitted to the collaboration detector. Because the transmitted signal is binary (1 = speaking, 0 = silence), this method mitigates privacy issues. However, it may harm the accuracy of collaboration detection. To find out how much harm is done, this study compared the relative effectiveness of collaboration detectors based either on the binary signal or high-quality audio. Pairs of students were asked to work together on solving complex math problems. Three qualitative levels of interactivity were distinguished: Interaction, Cooperation and Other. Human coders used richer data (several audio and video streams) to choose the code for each episode. Machine learning was used to induce a detector to assign a code for every episode based on the features. The binary-based collaboration detectors delivered only slightly less accuracy than collaboration detectors based on the high-quality audio signal.
  5. As real-world applications (image segmentation, speech recognition, machine translation, etc.) increasingly adopt Deep Neural Networks (DNNs), DNNs' vulnerabilities in a malicious environment have become an increasingly important research topic in adversarial machine learning. Adversarial machine learning (AML) focuses on exploring vulnerabilities and defensive techniques for machine learning models. Recent work has shown that most adversarial audio generation methods fail to consider audios' temporal dependency (TD) (i.e., adversarial audios exhibit weaker TD than benign audios). As a result, the adversarial audios are easily detectable by examining their TD. Therefore, one area of interest in the audio AML community is to develop a novel attack that evades a TD-based detection model. In this contribution, we revisit the LSTM model for audio transcription and propose a new audio attack algorithm that evades the TD-based detection by explicitly controlling the TD in generated adversarial audios. The experimental results show that the detectability of our adversarial audio is significantly reduced compared to the state-of-the-art audio attack algorithms. Furthermore, experiments also show that our adversarial audios remain nearly indistinguishable from benign audios with only negligible perturbation magnitude.
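
Item 4 above describes reducing speech to a binary speaking/silence signal at the microphone before anything is transmitted. As a rough illustration of that idea only, the sketch below thresholds the mean energy of fixed-length frames. The function name `frame_energy_vad`, the frame length, and the threshold are hypothetical choices made for this example; the voice activity detector actually used in that study is not described in the abstract and may work quite differently.

```python
def frame_energy_vad(samples, frame_len=4, threshold=0.01):
    """Return one bit per full frame: 1 = speaking, 0 = silence.

    Hypothetical energy-threshold detector for illustration; frame_len
    and threshold are assumed values, not taken from the study.
    """
    bits = []
    # Step through the signal in non-overlapping frames, ignoring any
    # trailing partial frame.
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        # Mean squared amplitude serves as a simple energy estimate.
        energy = sum(x * x for x in frame) / frame_len
        bits.append(1 if energy > threshold else 0)
    return bits


# Toy signal: a silent frame, a loud frame, then a near-silent frame.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.001] * 4
print(frame_energy_vad(signal))
```

Only the resulting bit sequence would leave the microphone in such a scheme, which is why the raw speech content is not exposed to the downstream detector.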