Title: Multi-Resolution Location-Based Training for Multi-Channel Continuous Speech Separation
The performance of automatic speech recognition (ASR) systems severely degrades when multi-talker speech overlap occurs. In meeting environments, speech separation is typically performed to improve the robustness of ASR systems. Recently, location-based training (LBT) was proposed as a new training criterion for multi-channel talker-independent speaker separation. Assuming fixed array geometry, LBT outperforms widely-used permutation-invariant training in fully overlapped utterances and matched reverberant conditions. This paper extends LBT to conversational multi-channel speaker separation. We introduce multi-resolution LBT to estimate the complex spectrograms from low to high time and frequency resolutions. With multi-resolution LBT, convolutional kernels are assigned consistently based on speaker locations in physical space. Evaluation results show that multi-resolution LBT consistently outperforms other competitive methods on the recorded LibriCSS corpus.
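As background for the two training criteria contrasted above, the difference between permutation-invariant training (PIT) and location-based training (LBT) can be sketched as follows. The mean-squared loss and the left-to-right azimuth ordering are illustrative assumptions for this sketch, not the paper's exact configuration.

```python
import numpy as np
from itertools import permutations

def pit_loss(est, ref):
    """Permutation-invariant training: take the minimum loss over all
    orderings of the reference speakers (cost grows factorially)."""
    losses = [np.mean((est - ref[list(p)]) ** 2)
              for p in permutations(range(ref.shape[0]))]
    return float(min(losses))

def lbt_loss(est, ref, azimuths):
    """Location-based training: with a fixed array geometry, order the
    references by speaker azimuth so each output channel always covers
    the same spatial region -- no permutation search needed."""
    order = np.argsort(azimuths)       # e.g., assign outputs left-to-right
    return float(np.mean((est - ref[order]) ** 2))
```

Because the assignment is fixed by geometry, LBT keeps its cost linear in the number of speakers, whereas PIT's permutation search grows factorially.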
Award ID(s):
2125074
PAR ID:
10439577
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the IEEE International Conference on Acoustics Speech and Signal Processing
ISSN:
2379-190X
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Current deep learning based multi-channel speaker separation methods produce a monaural estimate of speaker signals captured by a reference microphone. This work presents a new multi-channel complex spectral mapping approach that simultaneously estimates the real and imaginary spectrograms of all speakers at all microphones. The proposed multi-input multi-output (MIMO) separation model uses a location-based training (LBT) criterion to resolve the permutation ambiguity in talker-independent speaker separation across microphones. Experimental results show that the proposed MIMO separation model outperforms a multi-input single-output (MISO) speaker separation model with monaural estimates. We also combine the MIMO separation model with a beamformer and a MISO speech enhancement model to further improve separation performance. The proposed approach achieves state-of-the-art speaker separation on the open LibriCSS dataset.
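The input/output layout of such a MIMO complex spectral mapping can be sketched at the tensor-shape level: the real and imaginary spectrograms of all microphones are stacked as input, and the model predicts real/imaginary spectrograms for every speaker at every microphone rather than at one reference channel. The window size, hop, microphone count, and stacking order below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

def stft(x, win=256, hop=128):
    """Naive complex STFT with a Hann window (illustrative, not optimized)."""
    w = np.hanning(win)
    frames = [x[i:i + win] * w for i in range(0, len(x) - win + 1, hop)]
    return np.fft.rfft(np.stack(frames), axis=-1)    # shape (T, F)

M = 7   # microphones (e.g., a hypothetical 7-channel array)
rng = np.random.default_rng(0)
mics = [rng.standard_normal(4000) for _ in range(M)]

specs = np.stack([stft(x) for x in mics])            # (M, T, F), complex
# MISO: the network maps this input to S speakers at ONE reference mic.
# MIMO: the same input maps to S speakers at ALL M mics, preserving
#       inter-channel information in the outputs.
net_in = np.concatenate([specs.real, specs.imag])    # (2M, T, F), real-valued
```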
  2. When dealing with overlapped speech, the performance of automatic speech recognition (ASR) systems substantially degrades as they are designed for single-talker speech. To enhance ASR performance in conversational or meeting environments, continuous speaker separation (CSS) is commonly employed. However, CSS requires a short separation window to avoid many speakers inside the window and sequential grouping of discontinuous speech segments. To address these limitations, we introduce a new multi-channel framework called “speaker separation via neural diarization” (SSND) for meeting environments. Our approach utilizes an end-to-end diarization system to identify the speech activity of each individual speaker. By leveraging estimated speaker boundaries, we generate a sequence of embeddings, which in turn facilitate the assignment of speakers to the outputs of a multi-talker separation model. SSND addresses the permutation ambiguity issue of talker-independent speaker separation during the diarization phase through location-based training, rather than during the separation process. This unique approach allows multiple non-overlapped speakers to be assigned to the same output stream, making it possible to efficiently process long segments—a task impossible with CSS. Additionally, SSND is naturally suitable for speaker-attributed ASR. We evaluate our proposed diarization and separation methods on the open LibriCSS dataset, advancing state-of-the-art diarization and ASR results by a large margin. 
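The idea of letting multiple non-overlapped speakers share one output stream can be illustrated with a toy packing routine over diarized segments. This greedy helper is a hypothetical sketch of the stream-assignment idea, not the SSND algorithm itself.

```python
def assign_streams(segments, n_streams=2):
    """Pack diarized (start, end, speaker) segments into output streams so
    that temporally overlapping segments never share a stream, while
    non-overlapped speakers may reuse the same stream (greedy sketch)."""
    streams = [[] for _ in range(n_streams)]
    for seg in sorted(segments, key=lambda s: s[0]):
        start, _end, _spk = seg
        for st in streams:
            # a stream is free if its last segment ended before this one starts
            if not st or st[-1][1] <= start:
                st.append(seg)
                break
        else:
            raise ValueError("more simultaneous speakers than streams")
    return streams
```

With segments (0, 5, 'A'), (3, 8, 'B'), (9, 12, 'C'), speakers A and C land on the same stream because they never overlap, while B goes to the other; this is what allows long segments to be processed without CSS-style short windows.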
  3. Continuous speaker separation aims to separate overlapping speakers in real-world environments like meetings, but it often falls short in isolating speech segments of a single speaker. This leads to split signals that adversely affect downstream applications such as automatic speech recognition and speaker diarization. Existing solutions like speaker counting have limitations. This paper presents a novel multi-channel approach for continuous speaker separation based on multi-input multi-output (MIMO) complex spectral mapping. This MIMO approach enables robust speaker localization by preserving inter-channel phase relations. Speaker localization as a byproduct of the MIMO separation model is then used to identify single-talker frames and reduce speaker splitting. We demonstrate that this approach achieves superior frame-level sound localization. Systematic experiments on the LibriCSS dataset further show that the proposed approach outperforms other methods, advancing state-of-the-art speaker separation performance. 
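Why preserved inter-channel phase enables localization can be seen from a minimal two-microphone sketch: under a far-field plane-wave assumption, the cross-spectrum phase grows linearly with frequency at a slope set by the inter-microphone delay, which in turn fixes the arrival angle. The microphone spacing, speed of sound, and least-squares slope fit below are illustrative assumptions, not the paper's localization method.

```python
import numpy as np

def phase_doa(spec_ref, spec_other, freqs, mic_dist=0.04, c=343.0):
    """Per-frame azimuth from the inter-channel phase of one mic pair.
    Far-field sketch: real systems use all pairs and handle phase wrapping."""
    cross = spec_ref * np.conj(spec_other)        # (T, F) cross-spectrum
    phase = np.angle(cross)                       # ~ 2*pi*f*tau per bin
    # least-squares slope fit: tau = sum(phase * f) / (2*pi * sum(f^2))
    tau = phase @ freqs / (2 * np.pi * np.sum(freqs ** 2))
    cos_theta = np.clip(tau * c / mic_dist, -1.0, 1.0)
    return np.degrees(np.arccos(cos_theta))       # azimuth per frame
```

A separation model that outputs consistent complex spectrograms at all microphones keeps these phase relations intact, so a frame-level estimate like this can flag single-talker frames as a byproduct.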
  4. Abstract Modern automatic speech recognition (ASR) systems are capable of impressive performance recognizing clean speech but struggle in noisy, multi-talker environments, commonly referred to as the “cocktail party problem.” In contrast, many human listeners can solve this problem, suggesting the existence of a solution in the brain. Here we present a novel approach that uses a brain inspired sound segregation algorithm (BOSSA) as a preprocessing step for a state-of-the-art ASR system (Whisper). We evaluated BOSSA’s impact on ASR accuracy in a spatialized multi-talker scene with one target speaker and two competing maskers, varying the difficulty of the task by changing the target-to-masker ratio. We found that median word error rate improved by up to 54% when the target-to-masker ratio was low. Our results indicate that brain-inspired algorithms have the potential to considerably enhance ASR accuracy in challenging multi-talker scenarios without the need for retraining or fine-tuning existing state-of-the-art ASR systems. 
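The metric behind the reported improvement, word error rate, is word-level edit distance normalized by reference length. A minimal reference implementation:

```python
def wer(ref, hyp):
    """Word error rate: Levenshtein distance between word sequences,
    divided by the number of reference words."""
    r, h = ref.split(), hyp.split()
    # d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i                    # deletions
    for j in range(len(h) + 1):
        d[0][j] = j                    # insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(r)][len(h)] / len(r)
```

A 54% relative improvement in median WER means the median of this quantity over test utterances dropped to roughly half its baseline value at low target-to-masker ratios.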
  5. Recent work on perceptual learning for speech has suggested that while high-variability training typically results in generalization, low-variability exposure can sometimes be sufficient for cross-talker generalization. We tested predictions of a similarity-based account, according to which, generalization depends on training-test talker similarity rather than on exposure to variability. We compared perceptual adaptation to second-language (L2) speech following single- or multiple-talker training with a round-robin design in which four L2 English talkers from four different first-language (L1) backgrounds served as both training and test talkers. After exposure to 60 L2 English sentences in one training session, cross-talker/cross-accent generalization was possible (but not guaranteed) following either multiple- or single-talker training with variation across training-test talker pairings. Contrary to predictions of the similarity-based account, adaptation was not consistently better for identical than for mismatched training-test talker pairings, and generalization patterns were asymmetrical across training-test talker pairs. Acoustic analyses also revealed a dissociation between phonetic similarity and cross-talker/cross-accent generalization. Notably, variation in adaptation and generalization related to variation in training phase intelligibility. Together with prior evidence, these data suggest that perceptual learning for speech may benefit from some combination of exposure to talker variability, training-test similarity, and high training phase intelligibility. 