Title: Identifying Important Time-Frequency Locations in Continuous Speech Utterances
Human listeners use specific cues to recognize speech, and recent experiments have shown that certain time-frequency regions of individual utterances are more important to their correct identification than others. A model that could identify such cues or regions from clean speech would facilitate speech recognition and speech enhancement by focusing on those important regions. Thus, in this paper we present a model that can predict the regions of individual utterances that are important to an automatic speech recognition (ASR) “listener” by learning to add as much noise as possible to these utterances while still permitting the ASR to identify them correctly. This work utilizes a continuous speech recognizer to recognize multi-word utterances and builds upon our previous work that performed the same process for an isolated-word recognizer. Our experimental results indicate that our model can apply noise to obscure 90.5% of the spectrogram while leaving recognition performance nearly unchanged.
Award ID(s):
1750383
PAR ID:
10277023
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of Interspeech
Page Range / eLocation ID:
1639 to 1643
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
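
As a concrete illustration of the training idea in the abstract above, here is a minimal numpy sketch of mixing noise into a spectrogram everywhere a predicted importance map permits, and scoring the trade-off between noise coverage and recognition. All function names, the noise model, and the loss weight are assumptions for illustration; the paper's actual model is a network trained end-to-end against a continuous ASR, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise_outside_mask(spec, importance, noise_scale=1.0):
    """Mix noise into every time-frequency point the predicted
    importance map marks as unimportant (importance near 0)."""
    noise = rng.normal(scale=noise_scale, size=spec.shape)
    return importance * spec + (1.0 - importance) * noise

def masking_objective(recognition_loss, importance, lam=0.1):
    """Composite objective: keep the ASR output correct (low
    recognition_loss) while pushing mean importance down, i.e.
    obscuring as much of the spectrogram as possible."""
    return recognition_loss + lam * float(importance.mean())

# Toy usage with a random "spectrogram" (80 mel bins x 200 frames).
spec = np.abs(rng.normal(size=(80, 200)))
importance = rng.uniform(size=spec.shape)  # stand-in for the model's output
noisy_spec = add_noise_outside_mask(spec, importance)
print(noisy_spec.shape, masking_objective(0.42, importance))
```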
More Like this
  1. Predicting the intelligibility of noisy recordings is difficult and most current algorithms treat all speech energy as equally important to intelligibility. Our previous work on human perception used a listening test paradigm and correlational analysis to show that some energy is more important to intelligibility than other energy. In this paper, we propose a system called the Bubble Cooperative Network (BCN), which aims to predict important areas of individual utterances directly from clean speech. Given such a prediction, noise is added to the utterance in unimportant regions and then presented to a recognizer. The BCN is trained with a loss that encourages it to add as much noise as possible while preserving recognition performance, encouraging it to identify important regions precisely and place the noise everywhere else. Empirical evaluation shows that the BCN can obscure 97.7% of the spectrogram with noise while maintaining recognition accuracy for a simple speech recognizer that compares a noisy test utterance with a clean reference utterance. The masks predicted by a single BCN on several utterances show patterns that are similar to analyses derived from human listening tests that analyze each utterance separately, while exhibiting better generalization and less context-dependence than previous approaches. 
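
The "simple speech recognizer" mentioned in the BCN abstract above compares a noisy test utterance against clean reference utterances; a hypothetical nearest-template version, with invented shapes and no claim to match the authors' implementation, might look like this:

```python
import numpy as np

def template_recognizer(test_spec, reference_specs):
    """Hypothetical stand-in for the simple recognizer described
    above: pick the clean reference utterance whose spectrogram is
    nearest to the noisy test utterance. All spectrograms are
    assumed to be equal-sized arrays."""
    dists = [float(np.linalg.norm(test_spec - ref)) for ref in reference_specs]
    return int(np.argmin(dists))

# Toy usage: three clean references, one noisy version of reference 1.
rng = np.random.default_rng(1)
refs = [rng.normal(size=(80, 120)) for _ in range(3)]
noisy_test = refs[1] + 0.3 * rng.normal(size=(80, 120))
print(template_recognizer(noisy_test, refs))  # prints 1
```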
  2. This paper proposes a metric that we call the structured saliency benchmark (SSBM) to evaluate importance maps computed for automatic speech recognizers on individual utterances. These maps indicate the time-frequency points of an utterance that are most important for correct recognition of a target word. Our evaluation technique is not only suitable for standard classification tasks but is also appropriate for structured prediction tasks like sequence-to-sequence models. Additionally, we use this approach to compare the importance maps created by our previously introduced “bubble noise” technique, which identifies important points through correlation, with a baseline approach based on smoothed speech energy and forced alignment. Our results show that the bubble analysis approach is better at identifying important speech regions than this baseline on 100 sentences from the AMI corpus.
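
One piece of the baseline in item 2, an importance map derived from smoothed speech energy, is easy to sketch. The forced-alignment step and the SSBM scoring itself are omitted, and every name below is illustrative rather than the authors' code:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def energy_baseline_map(spec, sigma=2.0):
    """Hypothetical baseline importance map: smoothed spectrogram
    energy, normalized to [0, 1]. The paper's baseline also uses
    forced alignment to localize the target word; that step is
    left out here."""
    smoothed = gaussian_filter(spec, sigma=sigma)
    lo, hi = smoothed.min(), smoothed.max()
    return (smoothed - lo) / (hi - lo + 1e-8)

# Toy usage on a random "spectrogram".
spec = np.abs(np.random.default_rng(2).normal(size=(80, 100)))
baseline = energy_baseline_map(spec)
print(baseline.min(), baseline.max())
```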
  3. In clinical settings, most automatic recognition systems use visual or sensory data to recognize activities. These systems cannot recognize activities that rely on verbal assessment, lack visual cues, or do not use medical devices. We examined speech-based activity and activity-stage recognition in a clinical domain, making the following contributions. (1) We collected a high-quality dataset representing common activities and activity stages during actual trauma resuscitation events: the initial evaluation and treatment of critically injured patients. (2) We introduced a novel multimodal network based on the audio signal and a set of keywords that does not require a high-performing automatic speech recognition (ASR) engine. (3) We designed novel contextual modules to capture dynamic dependencies in team conversations about activities and stages during a complex workflow. (4) We introduced a data augmentation method, which simulates team communication by combining selected utterances and their audio clips, and showed that this method contributed to performance improvement in our data-limited scenario. In offline experiments, our proposed context-aware multimodal model achieved F1-scores of 73.2±0.8% and 78.1±1.1% for activity and activity-stage recognition, respectively. In online experiments, performance declined by about 10% for both recognition types when using utterance-level segmentation of the ASR output, and by about 15% when we omitted the utterance-level segmentation. Our experiments showed the feasibility of speech-based activity and activity-stage recognition during dynamic clinical events.
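
The augmentation in item 3, simulating team communication by combining selected utterances with their audio clips, could plausibly be sketched as follows; the selection policy, data layout, and function names are assumptions, not the authors' method:

```python
import numpy as np

def augment_team_communication(utterances, audio_clips, rng, k=2):
    """Hypothetical sketch of the augmentation described above:
    build a synthetic team-communication sample by concatenating
    k randomly selected utterance transcripts and their audio."""
    idx = rng.choice(len(utterances), size=k, replace=False)
    text = " ".join(utterances[i] for i in idx)
    audio = np.concatenate([audio_clips[i] for i in idx])
    return text, audio

# Toy usage with fake transcripts and 1-second clips at 16 kHz.
rng = np.random.default_rng(3)
texts = ["pulse check", "start an IV", "airway is clear"]
clips = [rng.normal(size=16000) for _ in texts]
new_text, new_audio = augment_team_communication(texts, clips, rng)
print(new_text, new_audio.shape)
```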
  4. We investigated the feasibility of using automatic speech recognition (ASR) and natural language processing (NLP) to classify collaborative problem solving (CPS) skills from recorded speech in noisy environments. We analyzed data from 44 dyads of middle and high school students who used videoconferencing to collaboratively solve physics and math problems (35 and 9 dyads in school and lab environments, respectively). Trained coders identified seven cognitive and social CPS skills (e.g., sharing information) in 8,660 utterances. We used a state-of-the-art deep transfer learning approach for NLP, Bidirectional Encoder Representations from Transformers (BERT), with a special input representation enabling the model to analyze adjacent utterances for contextual cues. We achieved a micro-average AUROC score (across seven CPS skills) of .80 using ASR transcripts, compared to .91 for human transcripts, indicating a decrease in performance attributable to ASR error. We found that the noisy school setting introduced additional ASR error, which reduced model performance (micro-average AUROC of .78) compared to the lab (AUROC = .83). We discuss implications for real-time CPS assessment and support in schools.
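
The micro-average AUROC reported in item 4 pools every (utterance, skill) decision into a single binary problem before computing one AUROC. This sketch reproduces only that pooling step, with random stand-in labels and scores rather than the study's data:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical shapes: binary labels for seven CPS skills per
# utterance, plus the model's predicted probabilities.
rng = np.random.default_rng(4)
y_true = rng.integers(0, 2, size=(500, 7))
y_score = rng.uniform(size=(500, 7))

# Micro-averaging flattens the label matrix so all skills share
# one ranking before the AUROC is computed.
micro_auroc = roc_auc_score(y_true.ravel(), y_score.ravel())
print(f"micro-average AUROC: {micro_auroc:.3f}")
```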
  5. This paper describes experiments in training HMM-based text-to-speech (TTS) voices on data collected for automatic speech recognition (ASR) training. We compare a number of filtering techniques designed to identify the best utterances from a noisy, multi-speaker corpus for training voices: to exclude speech containing noise and to include speech close in nature to more traditionally collected TTS corpora. We also evaluate the use of automatic speech recognizers for intelligibility assessment in comparison with crowdsourcing methods. While the goal of this work is to develop natural-sounding and intelligible TTS voices in low-resource languages (LRLs) rapidly and easily, without the expense of recording data specifically for this purpose, we focus on English initially to identify the best filtering techniques and evaluation methods. We find that, when a large amount of data is available, selecting from the corpus based on criteria such as the standard deviation of f0, fast speaking rate, and hypo-articulation produces the most intelligible voices.
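
A hedged sketch of the kind of corpus filter item 5 converges on, ranking utterances by f0 standard deviation and speaking rate; the feature extraction, the hypo-articulation measure, and all weights below are invented for illustration:

```python
import numpy as np

def select_tts_candidates(utterances, max_keep=1000):
    """Hypothetical filter in the spirit of the criteria above:
    rank utterances from an ASR corpus by f0 standard deviation
    and speaking rate, keeping the top candidates for TTS
    training. Each utterance dict is assumed to carry
    precomputed features; the weights are made up."""
    def score(u):
        return float(np.std(u["f0"])) + 0.5 * u["speaking_rate"]
    return sorted(utterances, key=score, reverse=True)[:max_keep]

# Toy usage with made-up per-utterance features.
rng = np.random.default_rng(5)
utts = [{"id": i,
         "f0": rng.uniform(80, 300, size=50),
         "speaking_rate": rng.uniform(3.0, 7.0)} for i in range(10)]
print([u["id"] for u in select_tts_candidates(utts, max_keep=3)])
```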