Improved Speech Enhancement Using a Time-Domain GAN with Mask Learning

Lin, Ju; Niu, Sufeng; Wijngaarden, Adriaan J.; McClendon, Jerome L.; Smith, Melissa C.; Wang, Kuang-Ching

doi:10.21437/Interspeech.2020-1946

Speech enhancement is an essential component of robust automatic speech recognition (ASR) systems. Most current speech enhancement methods are based on neural networks that use either feature mapping or mask learning. This paper proposes a novel speech enhancement method that integrates time-domain feature mapping and mask learning into a unified framework using a Generative Adversarial Network (GAN). The proposed framework processes the received waveform and decouples the speech and noise signals, which are fed into two short-time Fourier transform (STFT) 1-D convolution layers that map the waveforms to spectrograms in the complex domain. These speech and noise spectrograms are then used to compute the speech mask loss. The proposed method is evaluated on the TIMIT data set under seen and unseen signal-to-noise ratio conditions. The results show that it outperforms both Deep Neural Network (DNN) based speech enhancement and the Speech Enhancement Generative Adversarial Network (SEGAN).
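The STFT-as-convolution step and the mask loss mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the frame length, hop size, Hann window, ratio-mask definition, and L1 loss are all assumptions (the abstract does not specify them), and the function names `stft_conv1d`, `ratio_mask`, and `mask_l1_loss` are hypothetical. The sketch only shows the general idea that an STFT can be written as a strided 1-D convolution with fixed Fourier kernels, so that a mask loss can be computed from time-domain speech and noise waveforms.

```python
import cmath
import math


def stft_conv1d(signal, frame_len=8, hop=4):
    """STFT expressed as a strided 1-D convolution with fixed complex kernels.

    Returns frames[t][k]: complex value of frequency bin k in frame t.
    """
    n_bins = frame_len // 2 + 1  # non-negative frequencies only
    # Periodic Hann window, folded into every kernel (assumed window choice).
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    # Kernel k holds window[n] * exp(-j*2*pi*k*n/frame_len); its real and
    # imaginary parts play the role of two real-valued convolution filters.
    kernels = [[window[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)]
               for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):  # stride = hop
        segment = signal[start:start + frame_len]
        frames.append([sum(w * x for w, x in zip(kernel, segment))
                       for kernel in kernels])
    return frames


def ratio_mask(speech_spec, noise_spec, eps=1e-8):
    """Magnitude ratio mask |S| / (|S| + |N|); an assumed mask definition."""
    return [[abs(s) / (abs(s) + abs(n) + eps) for s, n in zip(s_row, n_row)]
            for s_row, n_row in zip(speech_spec, noise_spec)]


def mask_l1_loss(est_mask, ref_mask):
    """Mean absolute difference between two masks; an assumed loss form."""
    diffs = [abs(e - r)
             for e_row, r_row in zip(est_mask, ref_mask)
             for e, r in zip(e_row, r_row)]
    return sum(diffs) / len(diffs)


# Toy signals: a sinusoid stands in for speech, and a second sinusoid at the
# same frequency stands in for noise so that the two overlap spectrally.
clean = [math.sin(2 * math.pi * n / 8) for n in range(32)]
noise = [0.5 * math.cos(2 * math.pi * n / 8) for n in range(32)]

# Reference mask from the true speech and noise waveforms.
ref_mask = ratio_mask(stft_conv1d(clean), stft_conv1d(noise))

# A scaled copy of the speech stands in for a generator's speech estimate.
est_speech = [0.9 * x for x in clean]
est_mask = ratio_mask(stft_conv1d(est_speech), stft_conv1d(noise))

loss = mask_l1_loss(est_mask, ref_mask)
```

Because the kernels are fixed, such a layer has no trainable parameters, yet gradients of the mask loss can still flow back through it to whatever network produced the waveforms. In a deep learning framework the same kernels would typically be loaded as the weights of a 1-D convolution layer with stride equal to the hop size.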

Award ID(s): 1725573

PAR ID: 10203596

Author(s) / Creator(s): Lin, Ju; Niu, Sufeng; Wijngaarden, Adriaan J.; McClendon, Jerome L.; Smith, Melissa C.; Wang, Kuang-Ching

Date Published: 2020-10-25

Journal Name: Proceedings of Interspeech 2020

Page Range / eLocation ID: 3286 to 3290

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.21437/Interspeech.2020-1946
