Jean-Jacques Rousseau; Bill Kapralos; Henrik I. Christensen; Michael Jenkin; Cheng-Lin (Ed.)
The exponential growth in the use of smart speakers (SSs) to automate homes, offices, and vehicles has brought a new level of convenience to our lives. However, SSs are susceptible to a variety of spoofing attacks, both known (seen) and unknown (unseen), generated with state-of-the-art generative AI algorithms. These attacks are realistic enough to deceive the automatic speaker verification (ASV) engines of SSs, creating substantial potential for fraud through these devices. This vulnerability highlights the need for effective countermeasures that reliably detect both known and unknown spoofing attacks. This paper presents a novel end-to-end deep learning model, AEXANet, that detects multiple types of physical-access (PA) and logical-access (LA) attacks, both known and unknown. The proposed countermeasure learns low-level cues directly from raw audio and uses a dense convolutional network to strengthen the propagation of diversified raw-waveform features. The system employs a maximum feature map activation function, which improves performance against unseen spoofing attacks and makes the model more efficient, enabling its use in real-time applications. We evaluated the model extensively on the ASVspoof 2019 PA and LA datasets, along with text-to-speech (TTS) and voice conversion (VC) samples evaluated separately, covering both seen and unseen attacks. We also performed cross-corpora evaluation using the ASVspoof 2019 and ASVspoof 2015 datasets. Experimental results show the reliability of our method for voice spoofing detection.
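The maximum feature map (MFM) activation mentioned above can be illustrated with a minimal sketch. The abstract does not state which MFM variant AEXANet uses, so this assumes the common channel-split form, in which the channels are divided into two halves and the element-wise maximum is kept. The function name `max_feature_map` and the sample values are hypothetical and serve only as illustration.

```python
from typing import List, Sequence


def max_feature_map(channels: Sequence[Sequence[float]]) -> List[List[float]]:
    """Channel-split MFM (assumed variant, not confirmed by the abstract).

    Pairs channel i with channel i + C/2 and keeps the element-wise maximum,
    so C input channels become C/2 output channels.
    """
    if len(channels) % 2 != 0:
        raise ValueError("MFM needs an even number of channels")
    half = len(channels) // 2
    return [
        [max(a, b) for a, b in zip(channels[i], channels[i + half])]
        for i in range(half)
    ]


# Four hypothetical feature channels over three time steps.
features = [
    [0.2, -1.0, 0.5],
    [1.5, 0.0, -0.3],
    [-0.4, 0.7, 0.9],
    [1.0, 0.1, -0.8],
]
reduced = max_feature_map(features)
print(reduced)
```

Because each pair of channels collapses into one, this form of MFM halves the number of feature maps passed to the next layer, which is one plausible source of the efficiency gain described above.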