Title: Adaptive Crosstalk Cancellation and Spatialization for Dynamic Group Conversation Enhancement Using Mobile and Wearable Devices
We propose a system to improve the intelligibility of group conversations in noisy environments, such as restaurants, by aggregating signals from the mobile and wearable devices of the participants. The proposed system uses a mobile device placed near each talker to capture a low-noise speech signal. Instead of muting inactive microphones, which can be distracting, adaptive crosstalk cancellation filters remove the speech of other users, including delayed auditory feedback of the listener’s own speech. Next, adaptive spatialization filters process the low-noise signals to generate binaural outputs that match the spatial and spectral cues at the ears of each listener. The proposed system is demonstrated using recordings of three human subjects conversing with realistic movement.
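The crosstalk cancellation step mentioned in the abstract can be illustrated with a generic adaptive filter. The sketch below is not the authors' method: it assumes a single interfering talker, uses a standard time-domain normalized LMS (NLMS) update, and substitutes white noise for speech. The function name `nlms_cancel`, the parameters `num_taps` and `mu`, and the simulated leakage path `path` are all hypothetical choices made for illustration.

```python
import numpy as np

def nlms_cancel(primary, reference, num_taps=32, mu=0.05, eps=1e-8):
    """Remove the part of `primary` that is linearly predictable from `reference`.

    Generic NLMS sketch (an assumption, not the paper's filter design).
    Returns the cleaned signal and the final filter weights.
    """
    w = np.zeros(num_taps)       # estimate of the crosstalk path
    buf = np.zeros(num_taps)     # most recent reference samples, newest first
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[n]
        e = primary[n] - w @ buf                 # subtract predicted crosstalk
        w += mu * e * buf / (buf @ buf + eps)    # normalized LMS update
        out[n] = e
    return out, w

# Toy scenario: talker B leaks into talker A's microphone through a short path.
rng = np.random.default_rng(0)
n = 20000
talker_a = rng.standard_normal(n)                # stand-in for A's speech
talker_b = rng.standard_normal(n)                # stand-in for B's close-mic signal
path = np.array([0.0, 0.0, 0.5, -0.3, 0.1])      # hypothetical leakage path
crosstalk = np.convolve(talker_b, path)[:n]
mic_a = talker_a + crosstalk

cleaned, w = nlms_cancel(mic_a, talker_b)

half = n // 2                                    # skip the convergence period
before = np.mean(crosstalk[half:] ** 2)
after = np.mean((cleaned[half:] - talker_a[half:]) ** 2)
print(f"crosstalk power before: {before:.3f}, residual after: {after:.3f}")
```

A small step size is used here because talker A's own signal acts as noise for the adaptation. Real speech is neither white nor stationary, so a practical system would need additional safeguards that this sketch omits.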
Award ID(s):
1919257
PAR ID:
10475917
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
2022 International Workshop on Acoustic Signal Enhancement (IWAENC)
ISBN:
978-1-6654-6867-1
Page Range / eLocation ID:
1 to 5
Format(s):
Medium: X
Location:
Bamberg, Germany
Sponsoring Org:
National Science Foundation
More Like this
  1. We consider the problem of separating speech from several talkers in background noise using a fixed microphone array and a set of wearable devices. Wearable devices can provide reliable information about speech from their wearers, but they typically cannot be used directly for multichannel source separation due to network delay, sample rate offsets, and relative motion. Instead, the wearable microphone signals are used to compute the speech presence probability for each talker at each time-frequency index. Those parameters, which are robust against small sample rate offsets and relative motion, are used to track the second-order statistics of the speech sources and background noise. The fixed array then separates the speech signals using an adaptive linear time-varying multichannel Wiener filter. The proposed method is demonstrated using real-room recordings from three human talkers with binaural earbud microphones and an eight-microphone tabletop array. 
  2. We propose a new adaptive feedback cancellation (AFC) system in hearing aids (HAs) based on a well-posed optimization criterion that jointly considers both decorrelation of the signals and sparsity of the underlying channel. We show that the least squares criterion on subband errors regularized by a p-norm-like diversity measure can be used to simultaneously decorrelate the speech signals and exploit sparsity of the acoustic feedback path impulse response. Compared with traditional subband adaptive filters that are not appropriate for incorporating sparsity due to shorter sub-filters, our proposed framework is suitable for promoting sparse characteristics, as the update rule utilizing subband information actually operates in the fullband. Simulation results show that the normalized misalignment, added stable gain, and other objective metrics of the AFC are significantly improved by choosing a proper sparsity promoting factor and a suitable number of subbands. More importantly, the results indicate that the benefits of subband decomposition and sparsity promoting are complementary and additive for AFC in HAs. 
  3. Smartphones and mobile applications have become an integral part of our daily lives. This is reflected by the increase in mobile devices, applications, and revenue generated each year. However, this growth is being met with an increasing concern for user privacy, and there have been many incidents of privacy and data breaches related to smartphones and mobile applications in recent years. In this work, we focus on improving privacy for audio-based mobile systems. These applications will generally listen to all sounds in the environment and may record privacy-sensitive signals, such as speech, that may not be needed for the application. We present PAMS, a software development package for mobile applications. PAMS integrates a novel sound source filtering algorithm called Probabilistic Template Matching to generate a set of privacy-enhancing filters that remove extraneous sounds using learned statistical "templates" of these sounds. We demonstrate the effectiveness of PAMS by integrating it into a sleep monitoring system, with the intent to remove extraneous speech from breathing, snoring, and other sleep sounds that the system is monitoring. By comparing our PAMS enhanced sleep monitoring system with existing mobile systems, we show that PAMS can reduce speech intelligibility by up to 74.3% while maintaining similar performance in detecting sleeping sounds.
  4. The microphone systems employed by smart devices such as cellphones and tablets require case penetrations that leave them vulnerable to environmental damage. A structural sensor mounted on the back of the display screen can be employed to record audio by capturing the bending vibration signals induced in the display panel by an incident acoustic wave - enabling a functional microphone on a fully sealed device. Distributed piezoelectric sensing elements and low-noise accelerometers were bonded to the surfaces of several different panels and used to record acoustic speech signals. The quality of the recorded signals was assessed using the speech transmission index, and the recordings were transcribed to text using an automatic speech recognition system. Although the quality of the speech signals recorded by the piezoelectric sensors was reduced compared to the quality of speech recorded by the accelerometers, the word-error-rate of each transcription increased only by approximately 2% on average, suggesting that distributed piezoelectric sensors can be used as a low-cost surface microphone for smart devices that employ automatic speech recognition. A method of crosstalk cancellation was also implemented to enable the simultaneous recording and playback of audio signals by an array of piezoelectric elements and evaluated by the measured improvement in the recording’s signal-to-interference ratio. 
  5. Speech quality is one of the main foci of speech-related research, where it is frequently studied alongside speech intelligibility, another essential measure. However, whereas band-level perceptual speech intelligibility has been studied frequently, band-level speech quality has not been thoroughly analyzed. In this paper, a Multiple Stimuli With Hidden Reference and Anchor (MUSHRA) inspired approach is proposed to study the robustness of individual frequency bands to noise, with perceptual speech quality as the measure. Speech signals were filtered into thirty-two frequency bands and corrupted by real-world noise at different signal-to-noise ratios. Robustness-to-noise indices of the individual frequency bands were calculated from the human-rated perceptual quality scores assigned to the reconstructed noisy speech signals. Trends in the results suggest that the mid-frequency region is less robust to noise in terms of perceptual speech quality. These findings suggest that future research aiming to improve speech quality should pay more attention to the mid-frequency region of speech signals.