

Title: Exploiting Frequency Response for the Identification of Microphone using Artificial Neural Networks
Microphone identification addresses the challenge of recovering a microphone's signature from a recorded signal. An audio recording system (microphone, A/D converter, codec, etc.) leaves unique traces in the signals it captures. The microphone system can be modeled as a linear time-invariant system whose impulse response is convolved with every audio signal recorded through it. This paper attempts to identify a microphone from its frequency response. To estimate the frequency response, we employ the sine sweep method, which is independent of speech characteristics: sinusoidal signals of increasing frequency are generated, and the recording of each tone is captured. A detailed evaluation of the sine sweep method shows that the frequency response of each microphone is stable. A neural-network-based classifier is then trained to identify the microphone from the recorded signal. Results show that the proposed method achieves 100% identification accuracy.
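The stepped-sine estimation described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `record` is a hypothetical play-and-capture callback, and the sample rate, duration, and amplitude are assumed values.

```python
import numpy as np

def sine_sweep_response(record, freqs, fs=44100, dur=1.0, amp=0.5):
    """Estimate a microphone's magnitude response with stepped sine tones.

    `record` is a hypothetical callback that plays a test tone through a
    loudspeaker and returns the signal captured by the microphone under test.
    """
    response = []
    t = np.arange(int(fs * dur)) / fs
    for f in freqs:
        tone = amp * np.sin(2 * np.pi * f * t)   # probe tone at one frequency
        captured = record(tone, fs)              # play and record (assumed I/O)
        spectrum = np.abs(np.fft.rfft(captured))
        bin_idx = int(round(f * len(captured) / fs))
        response.append(spectrum[bin_idx])       # magnitude at the probe frequency
    return np.asarray(response)
```

The vector of magnitudes over the swept frequencies is the per-microphone feature a classifier could then be trained on.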
Award ID(s):
1815724
PAR ID:
10097313
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
AES Int. Conf. Audio Forensics 2019
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The microphone systems employed by smart devices such as cellphones and tablets require case penetrations that leave them vulnerable to environmental damage. A structural sensor mounted on the back of the display screen can be employed to record audio by capturing the bending vibration signals induced in the display panel by an incident acoustic wave - enabling a functional microphone on a fully sealed device. Distributed piezoelectric sensing elements and low-noise accelerometers were bonded to the surfaces of several different panels and used to record acoustic speech signals. The quality of the recorded signals was assessed using the speech transmission index, and the recordings were transcribed to text using an automatic speech recognition system. Although the quality of the speech signals recorded by the piezoelectric sensors was reduced compared to the quality of speech recorded by the accelerometers, the word-error-rate of each transcription increased only by approximately 2% on average, suggesting that distributed piezoelectric sensors can be used as a low-cost surface microphone for smart devices that employ automatic speech recognition. A method of crosstalk cancellation was also implemented to enable the simultaneous recording and playback of audio signals by an array of piezoelectric elements and evaluated by the measured improvement in the recording’s signal-to-interference ratio. 
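The signal-to-interference ratio used above to evaluate crosstalk cancellation is a standard energy ratio in dB; a minimal sketch, with array names chosen for illustration:

```python
import numpy as np

def sir_db(target, interference):
    """Signal-to-interference ratio in dB: energy of the desired recording
    relative to the energy of the residual playback crosstalk."""
    return 10 * np.log10(np.sum(target**2) / np.sum(interference**2))
```

Cancellation quality would be reported as the difference in `sir_db` before and after the canceller is applied.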
  2. Fast-frequency control strategies have been proposed in the literature to maintain inertial response of electric generation and help with the frequency regulation of the system. However, it is challenging to deploy such strategies when the inertia constant of the system is unknown and time-varying. In this paper, we present a data-driven system identification approach for an energy storage system (ESS) operator to identify the inertial response of the system (and consequently the inertia constant). The method is first tested and validated with a simulated genset model using small changes in the system load as the excitation signal and measuring the corresponding change in frequency. The validated method is then used to experimentally identify the inertia constant of a genset. The inertia constant of the simulated genset model was estimated with an error of less than 5% which provides a reasonable estimate for the ESS operator to properly tune the parameters of a fast-frequency controller. 
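The identification rests on the classical swing equation, (2H/f0)·df/dt = −ΔP in per unit, so H can be read off from the initial rate of change of frequency after a load step. A minimal sketch under that model, with variable names that are assumptions rather than the authors' code:

```python
import numpy as np

def estimate_inertia(t, f, dP, f0=60.0):
    """Estimate the inertia constant H (seconds) from the initial rate of
    change of frequency (RoCoF) following a step load change dP (per unit).

    Swing equation: (2H / f0) * df/dt = -dP  =>  H = -dP * f0 / (2 * RoCoF).
    """
    n = min(10, len(t))                      # fit only the onset of the event
    rocof = np.polyfit(t[:n], f[:n], 1)[0]   # linear-fit slope in Hz/s
    return -dP * f0 / (2.0 * rocof)
```

A load increase makes the slope negative, so the estimate comes out positive; noisy measurements would call for a longer fit window or filtering.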
  3. Integrated quadrant analysis is a novel technique to identify and characterize the trajectory and strength of turbulent coherent structures in the atmospheric surface layer. By integrating the three-dimensional velocity field characterized by traditional quadrant analysis with respect to time, the trajectory history of individual coherent structures can be preserved with Eulerian turbulence measurements. We develop a method to identify the ejection phase of coherent structures based on turbulence kinetic energy (TKE). Identifying coherent structures within a time series using TKE performs better than identifying them with the streamwise and vertical velocity components because some coherent structures are dominated by the cross-stream velocity component as they pass the sensor. By combining this identification method with integrated quadrant analysis, one can animate or plot the trajectory of individual coherent structures from high-frequency velocity measurements. This procedure links a coherent ejection with the subsequent sweep and quiescent period in time to visualize and quantify the strength and the duration of a coherent structure. We develop and verify the method of integrated quadrant analysis with data from two field studies: the Eclipse Boundary Layer Experiment (EBLE) in Corvallis, Oregon in August 2017 (grass field) and the Vertical Cherry Array Experiment (VACE) in Linden, California in November 2019 (cherry orchard). The combined TKE identification method and integrated quadrant analysis are promising additions to conditional sampling techniques and coherent structure characterization because they identify coherent structures and couple the sweep and ejection components in space. In an orchard (VACE), integrated quadrant analysis verifies that each coherent structure is dominated by a sweep. Conversely, above the roughness sublayer (EBLE), each coherent structure is dominated by an ejection.
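A minimal sketch of the per-sample quadrant classification and instantaneous TKE that conditional sampling of this kind builds on, using the standard u′w′ quadrant convention (this is an illustration, not the authors' implementation):

```python
import numpy as np

def quadrant_series(u, v, w):
    """Classify each sample of a velocity record into a quadrant and
    compute its instantaneous TKE contribution.

    Quadrants follow the usual u'w' convention:
      Q1 outward interaction (u'>0, w'>0), Q2 ejection (u'<0, w'>0),
      Q3 inward interaction (u'<0, w'<0), Q4 sweep   (u'>0, w'<0).
    """
    up, vp, wp = u - u.mean(), v - v.mean(), w - w.mean()  # fluctuations
    tke = 0.5 * (up**2 + vp**2 + wp**2)                    # instantaneous TKE
    quad = np.where(wp > 0, np.where(up > 0, 1, 2),
                    np.where(up < 0, 3, 4))
    return quad, tke
```

Integrated quadrant analysis would then accumulate these per-sample values in time across each detected structure.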
  4. Collaboration is a 21st-century skill as well as an effective method for learning, so detection of collaboration is important for both assessment and instruction. Speech-based collaboration detection can be quite accurate, but collecting the speech of students in classrooms can raise privacy issues. An alternative is to transmit only whether or not the student is speaking: the speech signal is processed at the microphone by a voice activity detector before being sent to the collaboration detector. Because the transmitted signal is binary (1 = speaking, 0 = silence), this method mitigates privacy issues. However, it may harm the accuracy of collaboration detection. To quantify how much, this study compared the relative effectiveness of collaboration detectors based either on the binary signal or on high-quality audio. Pairs of students were asked to work together on solving complex math problems. Three qualitative levels of interactivity were distinguished: Interaction, Cooperation, and Other. Human coders used richer data (several audio and video streams) to choose the code for each episode. Machine learning was used to induce a detector that assigns a code to every episode based on features extracted from the transmitted signal. The binary-based collaboration detectors delivered only slightly lower accuracy than detectors based on the high-quality audio signal.
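A minimal energy-based voice activity detector illustrates the binary speaking/silence stream described above; the frame length and threshold are assumed values, not the study's parameters:

```python
import numpy as np

def binary_vad(x, fs, frame_ms=20, thresh_db=-40.0):
    """Reduce audio to a privacy-preserving binary speaking/silence stream.

    Frames the signal, computes per-frame RMS energy, and emits 1 when the
    energy exceeds a fixed dB threshold (illustrative choices throughout).
    """
    n = int(fs * frame_ms / 1000)            # samples per frame
    nframes = len(x) // n
    frames = x[:nframes * n].reshape(nframes, n)
    rms = np.sqrt((frames**2).mean(axis=1)) + 1e-12   # avoid log of zero
    return (20 * np.log10(rms) > thresh_db).astype(int)
```

Only this 0/1 sequence would leave the microphone, so no speech content is transmitted.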
  5. We introduce a deep learning model for speech denoising, a long-standing challenge in audio analysis arising in numerous applications. Our approach is based on a key observation about human speech: there is often a short pause between each sentence or word. In a recorded speech signal, those pauses introduce a series of time periods during which only noise is present. We leverage these incidental silent intervals to learn a model for automatic speech denoising given only mono-channel audio. Detected silent intervals over time expose not just pure noise but its time-varying features, allowing the model to learn noise dynamics and suppress it from the speech signal. Experiments on multiple datasets confirm the pivotal role of silent interval detection for speech denoising, and our method outperforms several state-of-the-art denoising methods, including those that accept only audio input (like ours) and those that denoise based on audiovisual input (and hence require more information). We also show that our method enjoys excellent generalization properties, such as denoising spoken languages not seen during training. 
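As a classical stand-in for the learned model, spectral subtraction shows why silent intervals help: noise statistics can be estimated from frames flagged as silent and removed from every frame. A sketch under assumed frame size and mask format, not the paper's method:

```python
import numpy as np

def denoise_with_silence(x, silent_mask, frame=256):
    """Spectral-subtraction sketch driven by detected silent intervals.

    `silent_mask` is a per-frame boolean flag from a silence detector.
    Frames flagged silent supply the noise magnitude estimate, which is
    then subtracted from the magnitude spectrum of every frame.
    """
    nf = len(x) // frame
    X = np.fft.rfft(x[:nf * frame].reshape(nf, frame), axis=1)
    noise_mag = np.abs(X[silent_mask[:nf]]).mean(axis=0)   # noise-only estimate
    mag = np.maximum(np.abs(X) - noise_mag, 0.0)           # subtract, floor at 0
    clean = np.fft.irfft(mag * np.exp(1j * np.angle(X)), n=frame, axis=1)
    return clean.ravel()
```

The learned model in the paper replaces this fixed subtraction with a network that tracks time-varying noise exposed by the same silent intervals.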