Title: Cooperative Speech Separation With a Microphone Array and Asynchronous Wearable Devices
We consider the problem of separating speech from several talkers in background noise using a fixed microphone array and a set of wearable devices. Wearable devices can provide reliable information about speech from their wearers, but they typically cannot be used directly for multichannel source separation due to network delay, sample rate offsets, and relative motion. Instead, the wearable microphone signals are used to compute the speech presence probability for each talker at each time-frequency index. Those parameters, which are robust against small sample rate offsets and relative motion, are used to track the second-order statistics of the speech sources and background noise. The fixed array then separates the speech signals using an adaptive linear time-varying multichannel Wiener filter. The proposed method is demonstrated using real-room recordings from three human talkers with binaural earbud microphones and an eight-microphone tabletop array.
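The pipeline described in the abstract (presence probabilities, then covariance tracking, then a time-varying multichannel Wiener filter) can be illustrated with a minimal sketch for a single frequency bin. This is not the authors' implementation: the function name `separate_bin`, the parameters `forget`, `ref`, and `eps`, and the simple exponential-averaging update are all assumptions made for illustration, and the speech presence probabilities `P` are taken as given rather than computed from wearable signals.

```python
import numpy as np

def _psd(A):
    """Project a square matrix onto the Hermitian positive semidefinite cone."""
    A = 0.5 * (A + A.conj().T)
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None)) @ V.conj().T

def separate_bin(X, P, forget=0.95, ref=0, eps=1e-6):
    """Hypothetical sketch of probability-guided Wiener filtering in one bin.

    X: (T, M) complex STFT frames from the M fixed-array microphones.
    P: (T, K) speech presence probabilities in [0, 1], one column per talker
       (assumed to be supplied, e.g. derived from the wearable microphones).
    Returns Y: (T, K), each talker's estimated signal at microphone `ref`.
    """
    T, M = X.shape
    K = P.shape[1]
    R_noise = eps * np.eye(M, dtype=complex)      # noise covariance estimate
    R_src = np.zeros((K, M, M), dtype=complex)    # per-talker covariances
    Y = np.zeros((T, K), dtype=complex)
    for t in range(T):
        x = X[t]
        outer = np.outer(x, x.conj())
        # Update the noise statistics in proportion to the probability
        # that no talker is active in this frame.
        a = (1.0 - forget) * np.prod(1.0 - P[t])
        R_noise = (1.0 - a) * R_noise + a * outer
        # Update each talker's statistics in proportion to its presence
        # probability, after subtracting the current noise estimate.
        for k in range(K):
            a = (1.0 - forget) * P[t, k]
            R_src[k] = _psd((1.0 - a) * R_src[k] + a * (outer - R_noise))
        # Time-varying Wiener filter: scale each source covariance by its
        # current presence probability, then apply
        # w_k^H x = e_ref^T R_k R_mix^{-1} x for every talker k.
        R_act = P[t][:, None, None] * R_src
        R_mix = R_act.sum(axis=0) + R_noise + eps * np.eye(M)
        z = np.linalg.solve(R_mix, x)
        Y[t] = R_act[:, ref, :] @ z
    return Y
```

In this sketch the filter depends on the wearable devices only through `P`, which is why small timing or sample rate mismatches between the wearables and the array would not affect the spatial filtering directly. A full system would run the same recursion independently in every frequency bin.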
Award ID(s):
1919257
PAR ID:
10475918
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ISCA
Date Published:
Journal Name:
Proc. Interspeech 2022
Page Range / eLocation ID:
5398 to 5402
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The microphone systems employed by smart devices such as cellphones and tablets require case penetrations that leave them vulnerable to environmental damage. A structural sensor mounted on the back of the display screen can be employed to record audio by capturing the bending vibration signals induced in the display panel by an incident acoustic wave, enabling a functional microphone on a fully sealed device. Distributed piezoelectric sensing elements and low-noise accelerometers were bonded to the surfaces of several different panels and used to record acoustic speech signals. The quality of the recorded signals was assessed using the speech transmission index, and the recordings were transcribed to text using an automatic speech recognition system. Although the quality of the speech signals recorded by the piezoelectric sensors was reduced compared to the quality of speech recorded by the accelerometers, the word-error-rate of each transcription increased only by approximately 2% on average, suggesting that distributed piezoelectric sensors can be used as a low-cost surface microphone for smart devices that employ automatic speech recognition. A method of crosstalk cancellation was also implemented to enable the simultaneous recording and playback of audio signals by an array of piezoelectric elements and evaluated by the measured improvement in the recording’s signal-to-interference ratio.
  2. We propose a system to improve the intelligibility of group conversations in noisy environments, such as restaurants, by aggregating signals from the mobile and wearable devices of the participants. The proposed system uses a mobile device placed near each talker to capture a low-noise speech signal. Instead of muting inactive microphones, which can be distracting, adaptive crosstalk cancellation filters remove the speech of other users, including delayed auditory feedback of the listener’s own speech. Next, adaptive spatialization filters process the low-noise signals to generate binaural outputs that match the spatial and spectral cues at the ears of each listener. The proposed system is demonstrated using recordings of three human subjects conversing with realistic movement.
  3. In this paper, we present Jawthenticate, an earable system that authenticates a user using audible or inaudible speech without using a microphone. This system can overcome the shortcomings of traditional voice-based authentication systems like unreliability in noisy conditions and spoofing using microphone-based replay attacks. Jawthenticate derives distinctive speech-related features from the jaw motion and associated facial vibrations. This combination of features makes Jawthenticate resilient to vocal imitations as well as camera-based spoofing. We use these features to train a two-class SVM classifier for each user. Our system is invariant to the content and language of speech. In a study conducted with 41 subjects, who speak different native languages, Jawthenticate achieves a Balanced Accuracy (BAC) of 97.07%, True Positive Rate (TPR) of 97.75%, and True Negative Rate (TNR) of 96.4% with just 3 seconds of speech data.
  4. Photoplethysmography (PPG) is an uncomplicated and inexpensive optical technique widely used in the healthcare domain to extract valuable health-related information, e.g., heart rate variability, blood pressure, and respiration rate. PPG signals can easily be collected continuously and remotely using portable wearable devices. However, these measuring devices are vulnerable to motion artifacts caused by daily life activities. The most common ways to eliminate motion artifacts use extra accelerometer sensors, which suffer from two limitations: i) high power consumption and ii) the need to integrate an accelerometer sensor in a wearable device (which is not required in certain wearables). This paper proposes a low-power non-accelerometer-based PPG motion artifact removal method outperforming the accuracy of the existing methods. We use a Cycle Generative Adversarial Network to reconstruct clean PPG signals from noisy PPG signals. Our novel machine-learning-based technique achieves 9.5 times improvement in motion artifact removal compared to the state-of-the-art without using extra sensors such as an accelerometer, which leads to 45% improvement in energy efficiency.
  5. Microphone identification addresses the challenge of identifying the microphone signature from the recorded signal. An audio recording system (consisting of microphone, A/D converter, codec, etc.) leaves its unique traces in the recorded signal. The microphone system can be modeled as a linear time-invariant system. The impulse response of this system is convolved with the audio signal that is recorded using "the" microphone. This paper makes an attempt to identify "the" microphone from the frequency response of the microphone. To estimate the frequency response of a microphone, we employ the sine sweep method, which is independent of speech characteristics. Sinusoidal signals of increasing frequencies are generated, and subsequently we record the audio of each frequency. Detailed evaluation of the sine sweep method shows that the frequency response of each microphone is stable. A neural-network-based classifier is trained to identify the microphone from the recorded signal. Results show that the proposed method achieves microphone identification with 100% accuracy.