

This content will become publicly available on May 13, 2024

Title: Audio Capture Using Piezoelectric Sensors on Vibrating Panel Surfaces
The microphone systems employed by smart devices such as cellphones and tablets require case penetrations that leave the devices vulnerable to environmental damage. A structural sensor mounted on the back of the display screen can instead record audio by capturing the bending vibrations induced in the display panel by an incident acoustic wave, enabling a functional microphone on a fully sealed device. Distributed piezoelectric sensing elements and low-noise accelerometers were bonded to the surfaces of several different panels and used to record acoustic speech signals. The quality of the recorded signals was assessed using the speech transmission index, and the recordings were transcribed to text using an automatic speech recognition system. Although the quality of the speech recorded by the piezoelectric sensors was lower than that recorded by the accelerometers, the word error rate of each transcription increased by only approximately 2% on average, suggesting that distributed piezoelectric sensors can serve as a low-cost surface microphone for smart devices that employ automatic speech recognition. A method of crosstalk cancellation was also implemented to enable simultaneous recording and playback of audio by an array of piezoelectric elements, and was evaluated by the measured improvement in the recording's signal-to-interference ratio.
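The abstract does not specify the crosstalk cancellation method, but the idea of removing a known playback signal's leakage from a simultaneous recording is commonly realized with an adaptive filter. Below is a minimal, hypothetical sketch using a normalized LMS (NLMS) FIR filter; the function names, filter length, and step size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def nlms_crosstalk_cancel(recorded, playback, taps=64, mu=0.5, eps=1e-8):
    """Hypothetical sketch: estimate the playback-to-sensor leakage path
    with an NLMS adaptive FIR filter and subtract it from the recording."""
    w = np.zeros(taps)            # adaptive filter weights
    x_buf = np.zeros(taps)        # most recent playback samples
    out = np.zeros_like(recorded)
    for n in range(len(recorded)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = playback[n]
        y_hat = w @ x_buf                  # predicted leakage sample
        e = recorded[n] - y_hat            # residual = desired near-end audio
        out[n] = e
        # normalized LMS weight update
        w += (mu / (eps + x_buf @ x_buf)) * e * x_buf
    return out

def sir_db(signal, interference):
    """Signal-to-interference ratio in dB, used to quantify the improvement."""
    return 10 * np.log10(np.sum(signal ** 2) / np.sum(interference ** 2))
```

After convergence, the residual interference in the output should be much weaker than the raw leakage, raising the measured SIR.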
Award ID(s):
2104758
NSF-PAR ID:
10413999
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
154th Convention of the Audio Engineering Society
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The direction of arrival (DOA) of an acoustic source is a signal characteristic used by smart audio devices to enable signal-enhancement algorithms. Though DOA estimates are traditionally made using a multi-microphone array, we propose that the resonant modes of a surface excited by acoustic waves contain sufficient spatial information that DOA may be estimated using a single structural vibration sensor. In this work, sensors are affixed to an acrylic panel and used to record acoustic noise signals at various angles of incidence. From these recordings, feature vectors containing the sums of the energies in the panel's isolated modal regions are extracted and used to train deep neural networks to estimate DOA. Experimental results show that when all 13 of the acrylic panel's isolated modal bands are utilized, the DOA of incident acoustic waves for a broadband noise signal may be estimated by a single structural sensor to within ±5° with a reliability of 98.4%. The size of the feature set may be reduced by eliminating the resonant modes that do not couple strongly to the incident acoustic wave. Reducing the feature set to the 7 modal bands that provide the most spatial information yields a reliability of 89.7% for DOA estimates within ±5° using a single sensor.
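The feature extraction in item 1 (summing spectral energy within each isolated modal band) can be sketched as follows. The band edges below are placeholders, not the acrylic panel's actual modal frequencies, and the normalization is an illustrative assumption.

```python
import numpy as np

def modal_band_energies(x, fs, bands):
    """Sum spectral energy in each isolated modal band.

    `bands` is a list of (f_lo, f_hi) edges in Hz; the panel's true modal
    bands would be identified from its measured resonances."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    feats = np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                      for lo, hi in bands])
    # normalize so the feature vector is invariant to overall signal level
    return feats / (feats.sum() + 1e-12)
```

The resulting feature vector (13 entries for all modal bands, or 7 for the reduced set) would then be fed to the deep neural network that outputs the DOA estimate.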
  2. Despite the advent of numerous Internet-of-Things (IoT) applications, recent research demonstrates potential side-channel vulnerabilities exploiting sensors used for event and environment monitoring. In this paper, we propose a new side-channel attack in which a network of distributed non-acoustic sensors can be exploited by an attacker to launch an eavesdropping attack by reconstructing intelligible speech signals. Specifically, we present PitchIn to demonstrate the feasibility of speech reconstruction from non-acoustic sensor data collected offline across networked devices. Unlike speech reconstruction, which requires a high sampling frequency (e.g., > 5 kHz), typical applications using non-acoustic sensors do not rely on richly sampled data, presenting a challenge to the speech reconstruction attack. Hence, PitchIn leverages a distributed form of time-interleaved analog-to-digital conversion (TI-ADC) to approximate a high sampling frequency while maintaining a low per-node sampling frequency. We demonstrate how distributed TI-ADC can be used to achieve intelligibility by processing an interleaved signal composed of different sensors across networked devices. We implement PitchIn and evaluate the intelligibility of the reconstructed speech signals via user studies. PitchIn achieves word recognition accuracy as high as 79%. Though additional work is required to improve accuracy, our results suggest that eavesdropping using a fusion of non-acoustic sensors is a real and practical threat.
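The core TI-ADC idea in item 2 — M sensors each sampling at fs/M with staggered time offsets, recombined into one fs-rate stream — can be sketched in a few lines. This is a simplified model that assumes perfect clock synchronization and identical sensor gains, which the real distributed attack would have to approximate.

```python
import numpy as np

def ti_adc_sample(signal, num_sensors):
    """Split a high-rate signal into `num_sensors` staggered low-rate
    streams, as if sensor k sampled at fs/num_sensors with offset k."""
    return [signal[k::num_sensors] for k in range(num_sensors)]

def ti_adc_interleave(streams):
    """Recombine the staggered low-rate streams into one high-rate signal."""
    n = sum(len(s) for s in streams)
    out = np.empty(n, dtype=streams[0].dtype)
    for k, s in enumerate(streams):
        out[k::len(streams)] = s
    return out
```

With four sensors at 1 kHz each, for example, the interleaved stream approximates a single 4 kHz recording — closer to the > 5 kHz regime speech reconstruction needs than any single node could reach.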
  3. Microphone identification addresses the challenge of identifying a microphone's signature from a recorded signal. An audio recording system (consisting of microphone, A/D converter, codec, etc.) leaves unique traces in the recorded signal. The microphone system can be modeled as a linear time-invariant system whose impulse response is convolved with the audio signal it records. This paper attempts to identify a specific microphone from its frequency response. To estimate the frequency response of a microphone, we employ a sine sweep method that is independent of speech characteristics: sinusoidal signals of increasing frequency are generated, and the audio of each frequency is recorded. Detailed evaluation of the sine sweep method shows that the frequency response of each microphone is stable. A neural-network-based classifier is trained to identify the microphone from a recorded signal. Results show that the proposed method achieves microphone identification with 100% accuracy.
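The stepped sine sweep measurement in item 3 can be sketched as: play one sinusoid per test frequency, record it through the microphone chain, and estimate the magnitude response as the recorded amplitude at that frequency. The sketch below is a generic assumption of how such a measurement works; `record_fn` is a hypothetical stand-in for the physical play-and-record loop.

```python
import numpy as np

def stepped_sine_response(record_fn, freqs, fs, duration=0.5):
    """Estimate a magnitude response by exciting one frequency at a time.

    `record_fn(tone)` maps an excitation array to a recorded array; in a
    real measurement it would drive a loudspeaker and capture the
    microphone output."""
    t = np.arange(int(fs * duration)) / fs
    response = []
    for f in freqs:
        tone = np.sin(2 * np.pi * f * t)
        rec = record_fn(tone)
        # recover the amplitude at f by correlating with a quadrature pair,
        # which rejects noise and components at other frequencies
        i = 2 * np.mean(rec * np.sin(2 * np.pi * f * t))
        q = 2 * np.mean(rec * np.cos(2 * np.pi * f * t))
        response.append(np.hypot(i, q))
    return np.array(response)
```

Repeating the measurement and observing that each microphone's response vector is stable is what makes it usable as a classifier feature.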
  4. Smart speaker voice assistants (VAs) such as Amazon Echo and Google Home have been widely adopted due to their seamless integration with smart home devices and Internet of Things (IoT) technologies. These VA services raise privacy concerns, especially due to their access to our speech. This work considers one such use case: the unaccountable and unauthorized surveillance of a user's emotion via speech emotion recognition (SER). This paper presents DARE-GP, a solution that creates additive noise to mask users' emotional information while preserving the transcription-relevant portions of their speech. DARE-GP does this by using a constrained genetic programming approach to learn the spectral frequency traits that convey target users' emotional content, and then generating a universal adversarial audio perturbation that provides this privacy protection. Unlike existing works, DARE-GP provides: a) real-time protection of previously unheard utterances, b) protection against previously unseen black-box SER classifiers, c) preservation of speech transcription, and d) operation in a realistic acoustic environment. Further, this evasion is robust against defenses employed by a knowledgeable adversary. The evaluations in this work culminate with acoustic evaluations against two off-the-shelf commercial smart speakers, using a small-form-factor device (Raspberry Pi) integrated with a wake-word system to evaluate the efficacy of real-world, real-time deployment.
  5. This work presents a prototype of a wireless, flexible, self-powered sensor used to analyze head impact kinematics relevant to concussions, which are frequent in high-contact sports. Two untethered, paper-thin, flexible sensing devices with piezoelectric-like behavior are placed around the neck of a human head substitute and used to monitor stress/strain in this region during an impact. The mechanical energy exerted by an impact force (varied in location and magnitude) is converted to pulses of electric energy, which are transmitted wirelessly to a smart device for storage and analysis. The wireless prototype system is built around a microcontroller with an integrated Bluetooth Low Energy module. The static and dynamic characteristics of the transmitted signal are then compared to signals from accelerometers embedded in the head substitute, to map the sensor's output to the angular velocity and acceleration during impacts. It is demonstrated that two sensors are enough to detect impacts coming from any direction, and that placing multiple external sensors around the neck region could provide accurate information on the dynamics of the head during a collision that other sensors fail to capture.
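The calibration step in item 5 — mapping the two piezo sensors' outputs to the reference angular velocity and acceleration measured by the embedded accelerometers — can be sketched as a least-squares fit. The choice of peak voltage as the feature and a linear map are illustrative assumptions; the actual sensor-to-kinematics relationship may well be nonlinear.

```python
import numpy as np

def fit_kinematics_map(peak_voltages, ref_kinematics):
    """Least-squares linear map (plus bias) from the two sensors' peak
    voltages per impact to a reference kinematic quantity, e.g. peak
    angular acceleration from the embedded accelerometers."""
    X = np.column_stack([peak_voltages, np.ones(len(peak_voltages))])
    coef, *_ = np.linalg.lstsq(X, ref_kinematics, rcond=None)
    return coef

def predict_kinematics(coef, peak_voltages):
    """Apply the fitted map to new impact measurements."""
    X = np.column_stack([peak_voltages, np.ones(len(peak_voltages))])
    return X @ coef
```

Once calibrated against the instrumented head substitute, the map lets the standalone neck-mounted sensors report head kinematics without embedded accelerometers.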