Filterbank Learning for Noise-Robust Small-Footprint Keyword Spotting

López-Espejo, Iván; Shekar, Ram C.; Tan, Zheng-Hua; Jensen, Jesper; Hansen, John H.

doi:10.1109/ICASSP49357.2023.10095436

Citation Details

In the context of keyword spotting (KWS), the replacement of handcrafted speech features by learnable features has not yielded superior KWS performance. In this study, we demonstrate that filterbank learning outperforms handcrafted speech features for KWS whenever the number of filterbank channels is severely decreased. Reducing the number of channels might yield a certain drop in KWS performance, but also a substantial reduction in energy consumption, which is key when deploying common always-on KWS on low-resource devices. Experimental results on a noisy version of the Google Speech Commands Dataset show that filterbank learning adapts to noise characteristics to provide a higher degree of robustness to noise, especially when dropout is integrated. Thus, switching from the typically used 40-channel log-Mel features to 8-channel learned features leads to a relative KWS accuracy loss of only 3.5% while simultaneously achieving a 6.3× reduction in energy consumption.
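As a rough illustration of the front-end sizes the abstract compares, the sketch below computes standard log-Mel features with 40 and with 8 channels using NumPy. It is not the authors' implementation: the function names, the 512-point FFT, the 16 kHz sample rate, and the random placeholder spectrogram are all assumptions made for the example. In a filterbank-learning setup, the `weights` matrix would presumably be a trainable parameter rather than the fixed triangular filters built here; this record does not describe the paper's actual parameterization.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) Hz-to-Mel conversion.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_channels, n_fft=512, sample_rate=16000):
    """Fixed triangular Mel filters, shape (n_channels, n_fft // 2 + 1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # n_channels + 2 edge frequencies, equally spaced on the Mel scale.
    edges = mel_to_hz(
        np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_channels + 2)
    )
    fb = np.zeros((n_channels, n_bins))
    for k in range(n_channels):
        lo, mid, hi = edges[k], edges[k + 1], edges[k + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        fb[k] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_filterbank_features(power_spec, weights, eps=1e-6):
    """power_spec: (n_frames, n_bins); weights: (n_channels, n_bins)."""
    return np.log(power_spec @ weights.T + eps)

# Placeholder power spectrogram (random values, not real speech).
rng = np.random.default_rng(0)
power_spec = rng.random((100, 257))

feats_40 = log_filterbank_features(power_spec, mel_filterbank(40))
feats_8 = log_filterbank_features(power_spec, mel_filterbank(8))
print(feats_40.shape, feats_8.shape)  # (100, 40) (100, 8)
```

Going from 40 to 8 channels shrinks each feature frame by a factor of five. The sketch only shows that change in dimensionality; it says nothing about accuracy or energy use, which the abstract reports from the authors' own experiments.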

Award ID(s): 2016725

PAR ID: 10484456

Author(s) / Creator(s): López-Espejo, Iván; Shekar, Ram C.; Tan, Zheng-Hua; Jensen, Jesper; Hansen, John H.

Publisher / Repository: IEEE

Date Published: 2023-06-04

Journal Name: IEEE ICASSP-2023: Inter. Conf. Audio, Speech, and Signal Processing

Edition / Version: Paper #1986

ISBN: 978-1-7281-6327-7

Page Range / eLocation ID: 1 to 5

Subject(s) / Keyword(s): Keyword spotting; filterbank learning; small footprint; noise robustness; end-to-end

Format(s): Medium: X Size: 1MB

Size(s): 1MB

Location: Rhodes Island, Greece

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/ICASSP49357.2023.10095436
