A brain-inspired algorithm enhances automatic speech recognition performance in multi-talker scenes

Boyd, Alexander D; Sen, Kamal

doi:10.1101/2025.07.15.664627

This content will become publicly available on July 16, 2026

Abstract Modern automatic speech recognition (ASR) systems are capable of impressive performance recognizing clean speech but struggle in noisy, multi-talker environments, commonly referred to as the “cocktail party problem.” In contrast, many human listeners can solve this problem, suggesting the existence of a solution in the brain. Here we present a novel approach that uses a brain-inspired sound segregation algorithm (BOSSA) as a preprocessing step for a state-of-the-art ASR system (Whisper). We evaluated BOSSA’s impact on ASR accuracy in a spatialized multi-talker scene with one target speaker and two competing maskers, varying the difficulty of the task by changing the target-to-masker ratio. We found that median word error rate improved by up to 54% when the target-to-masker ratio was low. Our results indicate that brain-inspired algorithms have the potential to considerably enhance ASR accuracy in challenging multi-talker scenarios without the need for retraining or fine-tuning existing state-of-the-art ASR systems.

Award ID(s): 2319321

PAR ID: 10617065

Author(s) / Creator(s): Boyd, Alexander D; Sen, Kamal

Publisher / Repository: bioRxiv

Date Published: 2025-07-16

Format(s): Medium: X

Institution: bioRxiv

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Posted Content:
https://doi.org/10.1101/2025.07.15.664627
