Multi-input Multi-output Complex Spectral Mapping for Speaker Separation

Taherian, Hassan; Pandey, Ashutosh; Wong, Daniel; Xu, Buye; Wang, DeLiang

doi:10.21437/Interspeech.2023-318

Current deep-learning-based multi-channel speaker separation methods produce a monaural estimate of the speaker signals captured by a reference microphone. This work presents a new multi-channel complex spectral mapping approach that simultaneously estimates the real and imaginary spectrograms of all speakers at all microphones. The proposed multi-input multi-output (MIMO) separation model uses a location-based training (LBT) criterion to resolve the permutation ambiguity in talker-independent speaker separation across microphones. Experimental results show that the proposed MIMO separation model outperforms a multi-input single-output (MISO) speaker separation model that produces monaural estimates. We also combine the MIMO separation model with a beamformer and a MISO speech enhancement model to further improve separation performance. The proposed approach achieves state-of-the-art speaker separation performance on the open LibriCSS dataset.
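The abstract's two central ideas, a MIMO output that holds the real and imaginary spectrograms of every speaker at every microphone and an LBT criterion that fixes the speaker order instead of searching over permutations, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made only for illustration: ordering speakers by ascending azimuth is used as the location criterion, the loss is an L1 penalty on real/imaginary parts plus magnitudes, the array layout `(S, M, 2, T, F)` is chosen arbitrarily, and the function names `lbt_order`, `stack_ri`, and `mimo_lbt_loss` are hypothetical.

```python
import numpy as np


def lbt_order(azimuths_deg):
    """Hypothetical LBT criterion: assign output slots by ascending speaker azimuth."""
    return np.argsort(np.asarray(azimuths_deg), kind="stable")


def stack_ri(stft):
    """Complex (..., T, F) -> real (..., 2, T, F) holding [real, imag]."""
    return np.stack([stft.real, stft.imag], axis=-3)


def mimo_lbt_loss(est_ri, ref_stft, azimuths_deg):
    """Assumed RI + magnitude L1 loss for MIMO complex spectral mapping.

    est_ri:       (S, M, 2, T, F) network output, real/imag of every speaker at every mic
    ref_stft:     (S, M, T, F) complex reference speaker images at every mic
    azimuths_deg: (S,) speaker azimuths that define the target order
    """
    # Reorder targets once by location; unlike PIT, no permutation search is needed,
    # and the same speaker order holds at every microphone.
    tgt = ref_stft[lbt_order(azimuths_deg)]
    tgt_ri = stack_ri(tgt)
    est_mag = np.hypot(est_ri[:, :, 0], est_ri[:, :, 1])
    ri_term = np.abs(est_ri - tgt_ri).mean()
    mag_term = np.abs(est_mag - np.abs(tgt)).mean()
    return ri_term + mag_term


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, M, T, F = 2, 4, 10, 65  # speakers, microphones, frames, frequency bins
    ref = rng.standard_normal((S, M, T, F)) + 1j * rng.standard_normal((S, M, T, F))
    az = np.array([120.0, 30.0])  # speaker 1 has the smaller azimuth, so it fills slot 0
    in_lbt_order = stack_ri(ref[lbt_order(az)])
    in_input_order = stack_ri(ref)
    print(mimo_lbt_loss(in_lbt_order, ref, az))    # close to zero
    print(mimo_lbt_loss(in_input_order, ref, az))  # clearly positive: wrong slot assignment
```

Because the target order depends only on speaker location, an estimate whose output slots follow the input order is penalized even though it contains the correct signals. This is the sense in which a location-based criterion replaces permutation-invariant training in this sketch.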

Award ID(s): 2125074

PAR ID: 10552807

Publisher / Repository: ISCA

Date Published: 2023-08-20

Page Range / eLocation ID: 1070–1074

Subject(s) / Keyword(s): MIMO speaker separation; multi-channel complex spectral mapping; location-based training

Location: Dublin, Ireland

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text (Accepted Manuscript, Conference Paper):
https://doi.org/10.21437/Interspeech.2023-318