Domain Expansion for End-to-End Speech Recognition: Applications for Accent/Dialect Speech

Ghorbani, Shahram; Hansen, John H.L.

doi:10.1109/TASLP.2022.3233238

Training Automatic Speech Recognition (ASR) systems with sequentially incoming data from alternate domains is an essential milestone toward reaching human-level intelligibility in speech recognition. The main challenge of sequential learning is that current adaptation techniques cause significant performance degradation on previously seen domains. To mitigate this catastrophic forgetting problem, this study proposes effective domain expansion techniques for two scenarios: 1) where only new domain data is available, and 2) where both prior and new domain data are available. We examine the efficacy of the approaches through experiments on adapting a model trained with native English to different English accents. For the first scenario, we study several existing and proposed regularization-based approaches to mitigate performance loss on the initial data. The experiments demonstrate the superior performance of our proposed Soft KL-Divergence (SKLD)-Model Averaging (MA) approach. In this approach, SKLD first alleviates the forgetting problem during adaptation; next, MA makes the final efficient compromise between the two domains by averaging the parameters of the initial and adapted models. For the second scenario, we explore several rehearsal-based approaches, which leverage initial data to maintain the original model performance. We propose Gradient Averaging (GA), an approach which operates by averaging gradients computed for both the initial and new domains. Experiments demonstrate that GA outperforms retraining and specifically designed continual learning approaches, such as Averaged Gradient Episodic Memory (AGEM). Moreover, GA significantly reduces computational cost compared with complete retraining.
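The abstract describes MA and GA only at a high level. The following Python/NumPy sketch illustrates the two averaging ideas under assumptions that are not taken from the paper: equal (0.5) weights, parameters stored as a dict of named arrays, and gradients as flat vectors. A-GEM's gradient projection rule is included only for contrast. SKLD is not shown because the abstract does not specify its loss. All function and variable names are hypothetical.

```python
import numpy as np

def model_average(theta_init, theta_adapted, alpha=0.5):
    # MA (sketch): element-wise interpolation of every named parameter array.
    # alpha = 0.5 gives a plain average; any other weighting is an assumption.
    return {name: alpha * theta_init[name] + (1.0 - alpha) * theta_adapted[name]
            for name in theta_init}

def gradient_average(grad_initial, grad_new, beta=0.5):
    # GA (sketch): combine a gradient from an initial-domain (rehearsal) batch
    # with one from a new-domain batch before taking a single optimizer step.
    return beta * grad_initial + (1.0 - beta) * grad_new

def agem_project(grad_new, grad_ref):
    # A-GEM (for contrast): only when the new-domain gradient conflicts with the
    # reference (memory) gradient, i.e. has a negative dot product with it,
    # remove the conflicting component; otherwise leave it unchanged.
    dot = float(np.dot(grad_new, grad_ref))
    if dot >= 0.0:
        return grad_new
    return grad_new - (dot / float(np.dot(grad_ref, grad_ref))) * grad_ref

# Toy usage: two quadratic "domain losses" 0.5*||w - a||^2 and 0.5*||w - b||^2.
a = np.array([1.0, 0.0])   # hypothetical optimum of the initial domain
b = np.array([0.0, 1.0])   # hypothetical optimum of the new domain
w = a.copy()               # start from the "initial model"
lr = 0.1
for _ in range(200):
    g = gradient_average(w - a, w - b)  # gradients of the two quadratics
    w = w - lr * g
print(np.round(w, 3))  # approaches the midpoint [0.5 0.5]

merged = model_average({"W": a}, {"W": b})
print(merged["W"])  # [0.5 0.5]

print(agem_project(np.array([1.0, -1.0]), np.array([0.0, 1.0])))  # [1. 0.]
```

In this toy setting, averaging the two domain gradients drives the parameters toward a compromise between the domains rather than toward the new-domain optimum alone. That is the intuition behind rehearsal with GA. A-GEM, by contrast, alters the new-domain update only when it conflicts with the reference gradient.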

Award ID(s):: 1918032

PAR ID:: 10478761

Author(s) / Creator(s):: Ghorbani, Shahram; Hansen, John H.L.

Publisher / Repository:: IEEE

Date Published:: 2023-01-01

Journal Name:: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume:: 31

ISSN:: 2329-9290

Page Range / eLocation ID:: 762 - 774

Subject(s) / Keyword(s):: Accented speech; continual learning; domain expansion; end-to-end systems; model adaptation; speech recognition

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Journal Article:
https://doi.org/10.1109/TASLP.2022.3233238