Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding

Sunder, Vishal; Thomas, Samuel; Kuo, Hong-Kwang J.; Kingsbury, Brian; Fosler-Lussier, Eric

doi:10.1109/ICASSP49357.2023.10094997

RNN Transducer (RNN-T) technology is very popular for building deployable models for end-to-end (E2E) automatic speech recognition (ASR) and spoken language understanding (SLU). Since these are E2E models operating on speech directly, there remains a potential to improve their performance using purely text-based models like BERT, which have strong language understanding capabilities. In this paper, we propose new training criteria for RNN-T based E2E ASR and SLU to transfer BERT’s knowledge into these systems. In the first stage of our proposed mechanism, we improve ASR performance by using a fine-grained, tokenwise knowledge transfer from BERT. In the second stage, we fine-tune the ASR model for SLU such that the above knowledge is explicitly utilized by the RNN-T model for improved performance. Our techniques improve ASR performance on the Switchboard and CallHome test sets of the NIST Hub5 2000 evaluation and on the recently released SLURP dataset, on which we achieve new state-of-the-art performance. For SLU, we show significant improvements on the SLURP slot filling task, outperforming HuBERT-base and reaching a performance close to HuBERT-large. Compared to large transformer-based speech models like HuBERT, our model is significantly more compact and uses only 300 hours of speech pretraining data.

Award ID(s): 2008043

PAR ID: 10439493

Date Published: 2023-06-04

Journal Name: International Conference on Acoustics, Speech and Signal Processing

Page Range / eLocation ID: 1 to 5

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1109/ICASSP49357.2023.10094997
