Leveraging Sound Localization to Improve Continuous Speaker Separation

Taherian, Hassan; Pandey, Ashutosh; Wong, Daniel; Xu, Buye; Wang, DeLiang

doi:10.1109/ICASSP48485.2024.10446934

Citation Details

Continuous speaker separation aims to separate overlapping speakers in real-world environments like meetings, but it often falls short in isolating speech segments of a single speaker. This leads to split signals that adversely affect downstream applications such as automatic speech recognition and speaker diarization. Existing solutions like speaker counting have limitations. This paper presents a novel multi-channel approach for continuous speaker separation based on multi-input multi-output (MIMO) complex spectral mapping. This MIMO approach enables robust speaker localization by preserving inter-channel phase relations. Speaker localization as a byproduct of the MIMO separation model is then used to identify single-talker frames and reduce speaker splitting. We demonstrate that this approach achieves superior frame-level sound localization. Systematic experiments on the LibriCSS dataset further show that the proposed approach outperforms other methods, advancing state-of-the-art speaker separation performance.

Award ID(s): 2125074

PAR ID: 10552806

Author(s) / Creator(s): Taherian, Hassan; Pandey, Ashutosh; Wong, Daniel; Xu, Buye; Wang, DeLiang

Publisher / Repository: IEEE

Date Published: 2024-04-14

ISBN: 979-8-3503-4485-1

Page Range / eLocation ID: 621 to 625

Subject(s) / Keyword(s): MIMO complex spectral mapping; continuous speaker separation; robust speaker localization

Format(s): Medium: X

Location: Seoul, Korea, Republic of

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/ICASSP48485.2024.10446934
