In many high-impact applications, it is important to ensure the quality of a machine learning algorithm's output, as well as its reliability, relative to the complexity of the algorithm used. In this paper, we have initiated a mathematically rigorous theory for deciding which models (algorithms applied to data sets) are close to each other in terms of certain metrics, such as performance and the complexity level of the algorithm. This involves creating a grid on the hypothetical spaces of data sets and algorithms so as to identify a finite set of probability distributions from which the data sets are sampled and a finite set of algorithms. A given threshold metric acting on this grid will express the nearness (or statistical distance) of each algorithm and data set of interest to any given application. A technically difficult part of this project is to estimate the so-called metric entropy of compact sets of functions of infinitely many variables that arise in the definition of these spaces.
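For reference, the metric entropy invoked above is the standard covering-number notion (the textbook definition, not this paper's specific construction):

```latex
% Metric entropy of a compact set K in a metric space (X, d):
% N(eps, K) is the smallest number of eps-balls needed to cover K,
% and the metric entropy is its logarithm.
\[
  N(\varepsilon, K) = \min\Bigl\{\, n \in \mathbb{N} :
    \exists\, x_1, \dots, x_n \in X \ \text{such that}\
    K \subseteq \bigcup_{i=1}^{n} B(x_i, \varepsilon) \,\Bigr\},
  \qquad
  H(\varepsilon, K) = \log N(\varepsilon, K).
\]
```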
This content will become publicly available on April 6, 2026
CorrGAN: Simultaneous Learning of Speech Enhancement and Perceptual Quality Loss Functions
Deep-learning models have enabled effective end-to-end systems in the Speech Enhancement (SE) field. Most of these methods are trained with a fixed reconstruction loss in a supervised setting. Often, these losses do not perfectly represent the desired perceptual quality metrics, resulting in sub-optimal performance. Recently, there have been efforts to learn the behavior of those metrics directly via neural networks for training SE models. However, accurately estimating the true metric function introduces statistical complexity into training, because it attempts to capture the exact value of the metric. We propose an adversarial training strategy based on statistical correlation that avoids the complexity of estimating the SE metric while learning to mimic its overall behavior. We call this framework CorrGAN and show that it yields significant improvements over the standard losses of state-of-the-art (SOTA) baselines, achieving SOTA performance on the VoiceBank+DEMAND dataset.
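The paper's exact architecture and objectives are not reproduced in this listing; the following is a minimal PyTorch-style sketch of the core idea as stated in the abstract: train a discriminator whose batch-level scores correlate (Pearson) with a perceptual metric, so the enhancer can be trained against a trend-matching surrogate rather than an exact metric estimator. All names here are illustrative assumptions.

```python
# Minimal sketch of a correlation-based adversarial objective in the spirit
# of CorrGAN; every name below is an illustrative assumption.
import torch

def pearson_corr(x: torch.Tensor, y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Differentiable Pearson correlation between two 1-D batches of scores."""
    xc, yc = x - x.mean(), y - y.mean()
    return (xc * yc).sum() / (xc.norm() * yc.norm() + eps)

def discriminator_loss(d_scores: torch.Tensor, metric_scores: torch.Tensor) -> torch.Tensor:
    # Train D so its batch-level scores move *with* the (non-differentiable)
    # perceptual metric: maximize correlation, i.e. minimize its negation.
    return -pearson_corr(d_scores, metric_scores)

def enhancer_loss(d_scores_on_enhanced: torch.Tensor) -> torch.Tensor:
    # Train the enhancer to push up D's metric-correlated score.
    return -d_scores_on_enhanced.mean()
```

Because only the batch-level trend of the metric is matched, the discriminator never has to regress the metric's exact value; this is the statistical simplification the abstract alludes to.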
- PAR ID: 10616198
- Publisher / Repository: IEEE
- Date Published:
- ISBN: 979-8-3503-6874-1
- Page Range / eLocation ID: 1 to 5
- Format(s): Medium: X
- Location: Hyderabad, India
- Sponsoring Org: National Science Foundation
More Like this
Machine learning models—including prominent zero-shot models—are often trained on datasets whose labels are only a small proportion of a larger label space. Such spaces are commonly equipped with a metric that relates the labels via distances between them. We propose a simple approach to exploit this information to adapt the trained model to reliably predict new classes—or, in the case of zero-shot prediction, to improve its performance—without any additional training. Our technique is a drop-in replacement of the standard prediction rule, swapping arg max with the Fréchet mean. We provide a comprehensive theoretical analysis for this approach, studying (i) learning-theoretic results trading off label space diameter, sample complexity, and model dimension, (ii) characterizations of the full range of scenarios in which it is possible to predict any unobserved class, and (iii) an optimal active-learning-like next-class selection procedure to obtain optimal training classes when it is not possible to predict the entire range of unobserved classes. Empirically, using easily available external metrics, our proposed approach, LOKI, gains up to 29.7% relative improvement over SimCLR on ImageNet and scales to hundreds of thousands of classes. When no such metric is available, LOKI can use self-derived metrics from class embeddings and obtains a 10.5% improvement on pretrained zero-shot models such as CLIP.
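As a hedged illustration of the drop-in rule described in this abstract (not the authors' released code), the Fréchet mean over a finite label metric space reduces to a weighted medoid search over the label distance matrix:

```python
# Sketch of a Frechet-mean prediction rule over a finite label metric space,
# in the spirit of LOKI; array shapes and names are assumptions.
import numpy as np

def frechet_mean_predict(probs: np.ndarray, dist: np.ndarray) -> int:
    """Drop-in replacement for arg max prediction.

    probs: (C_obs,)        model scores over the trained (observed) classes.
    dist:  (C_all, C_obs)  label-metric distances from every candidate class,
                           including unseen ones, to each trained class.
    Returns the candidate minimizing the weighted sum of squared distances,
    i.e. the Frechet mean of the predictive distribution in the label metric.
    """
    costs = (dist ** 2) @ probs   # expected squared distance per candidate
    return int(np.argmin(costs))
```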
Previous research demonstrates that users' actions in search interactions are associated with gains and losses relative to reference points, known as the reference dependence effect. However, this widely confirmed effect is not represented in most user models underpinning existing search evaluation metrics. In this study, we propose a new evaluation metric framework, namely the Reference Dependent Metric (ReDeM), for assessing query-level search by incorporating the effect of reference dependence into the modelling of user search behavior. To test the overall effectiveness of the proposed framework, (1) we evaluate the performance, in terms of correlation with user satisfaction, of ReDeMs built upon different reference points against that of widely-used metrics on three search datasets; (2) we examine the performance of ReDeMs under different task states, such as task difficulty and task urgency; and (3) we analyze the statistical reliability of ReDeMs in terms of discriminative power. Experimental results indicate that: (1) ReDeMs integrated with a proper reference point achieve better correlations with user satisfaction than most existing metrics, such as Discounted Cumulative Gain (DCG) and Rank-Biased Precision (RBP), even when their parameters have already been well tuned; (2) ReDeMs perform relatively better than existing metrics when the task triggers a high cognitive load; (3) the discriminative power of ReDeMs is far stronger than Expected Reciprocal Rank (ERR), slightly stronger than Precision, and similar to DCG, RBP, and INST. To our knowledge, this study is the first to explicitly incorporate the reference dependence effect into the user browsing model and offline evaluation metrics. Our work illustrates a promising approach to leveraging insights about user biases from cognitive psychology to better evaluate user search experience and enhance user models.
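ReDeM's concrete user model is not reproduced in this abstract; purely as an illustrative sketch, a DCG-style metric becomes reference-dependent if each gain is first passed through a prospect-theory-style value function relative to a reference point. The constants below are the classic Kahneman-Tversky estimates, assumed for illustration, not ReDeM's fitted parameters.

```python
# Illustrative sketch only: a DCG-style metric made reference-dependent by
# passing each document's gain through an asymmetric value function.
import math

def value_fn(delta: float, alpha: float = 0.88, lam: float = 2.25) -> float:
    # Losses relative to the reference point loom larger than equal gains.
    return delta ** alpha if delta >= 0 else -lam * ((-delta) ** alpha)

def reference_dependent_dcg(gains: list[float], reference: float) -> float:
    # Accumulate reference-relative values with a DCG-style rank discount.
    return sum(
        value_fn(g - reference) / math.log2(rank + 2)
        for rank, g in enumerate(gains)
    )
```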
We study training of Graph Neural Networks (GNNs) for large-scale graphs. We revisit the premise of using distributed training for billion-scale graphs and show that for graphs that fit in main memory or the SSD of a single machine, out-of-core pipelined training with a single GPU can outperform state-of-the-art (SoTA) multi-GPU solutions. We introduce MariusGNN, the first system that utilizes the entire storage hierarchy—including disk—for GNN training. MariusGNN introduces a series of data organization and algorithmic contributions that 1) minimize the end-to-end time required for training and 2) ensure that models learned with disk-based training exhibit accuracy similar to those fully trained in memory. We evaluate MariusGNN against SoTA systems for learning GNN models and find that single-GPU training in MariusGNN achieves the same level of accuracy up to 8× faster than multi-GPU training in these systems, thus introducing an order-of-magnitude reduction in monetary cost. MariusGNN is open-sourced at www.marius-project.org.
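MariusGNN's actual data layout and replacement policies are not shown in the abstract; the sketch below only illustrates the generic out-of-core pipelining idea, with `load_batch` and `train_step` as caller-supplied placeholders.

```python
# Generic out-of-core pipelining sketch: a background thread prefetches
# mini-batches from disk into a bounded queue while the GPU trains on the
# previous batch. This is not MariusGNN's implementation.
import queue
import threading
from typing import Any, Callable, Iterable

def train_out_of_core(
    batch_paths: Iterable[str],
    load_batch: Callable[[str], Any],    # reads one mini-batch from disk/SSD
    train_step: Callable[[Any], None],   # runs one GPU training step
    depth: int = 4,                      # bounded queue caps host-memory use
) -> None:
    q: "queue.Queue[Any]" = queue.Queue(maxsize=depth)

    def producer() -> None:
        for path in batch_paths:
            q.put(load_batch(path))      # disk I/O runs ahead of compute
        q.put(None)                      # sentinel: end of epoch

    threading.Thread(target=producer, daemon=True).start()
    while (batch := q.get()) is not None:
        train_step(batch)                # overlaps with the next disk read
```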
Environmental decisions with substantial social and environmental implications are regularly informed by model predictions, incurring inevitable uncertainty. The selection of a set of model predictions to inform a decision is usually based on model performance, measured by goodness-of-fit metrics. Yet goodness-of-fit metrics have a questionable relationship to a model's value to end users, particularly when validation data are themselves uncertain. For example, decisions based on flow frequency models are not necessarily improved by adopting models with the best overall goodness of fit. We propose an alternative model evaluation approach based on the conditional value of sample information, first defined in 1961, which has found extensive use in sampling design optimization but which has not previously been used for model evaluation. The metric uses observations from a validation set to estimate the expected monetary costs associated with model prediction uncertainties. A model is only considered superior to alternatives if (i) its predictions reduce these costs and (ii) sufficient validation data are available to distinguish its performance from alternative models. By describing prediction uncertainties in monetary terms, the metric facilitates the communication of prediction uncertainty by end users, supporting the inclusion of uncertainty analysis in decision making.
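As a hedged sketch of the evaluation logic this abstract describes (not the paper's estimator), one can score each model by the expected monetary cost of acting on its predictions over a validation set, and prefer model A only when the paired cost reduction is statistically distinguishable. The cost function and the crude z-test below are illustrative assumptions.

```python
# Sketch: compare two models by expected decision cost on a validation set.
import math
import statistics
from typing import Callable, Sequence

def decision_costs(
    preds: Sequence[float],
    obs: Sequence[float],
    cost: Callable[[float, float], float],  # maps (prediction, outcome) to $
) -> list[float]:
    # Monetary cost incurred by acting on each validation-set prediction.
    return [cost(p, o) for p, o in zip(preds, obs)]

def model_a_is_superior(costs_a: Sequence[float], costs_b: Sequence[float]) -> bool:
    # (i) A must reduce expected cost, and (ii) the validation sample must be
    # large enough to distinguish the two models' performance.
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean < 0 and abs(mean) > 1.96 * se
```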