This content will become publicly available on November 15, 2025

Title: Self-supervised speech representations display some human-like cross-linguistic perceptual abilities
State-of-the-art models in automatic speech recognition have shown remarkable improvements due to modern self-supervised (SSL) transformer-based architectures such as wav2vec 2.0 (Baevski et al., 2020). However, how these models encode phonetic information is still not well understood. We explore whether SSL speech models display a linguistic property that characterizes human speech perception: language specificity. We show that while wav2vec 2.0 displays an overall language-specificity effect when tested on Hindi vs. English, it does not resemble human speech perception when tested on finer-grained differences in Hindi speech contrasts.
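As a rough illustration of the kind of representations such probing studies analyze, the sketch below extracts layer-wise wav2vec 2.0 features for a single stimulus with the Hugging Face transformers library. The checkpoint and file name are placeholders, not the paper's actual setup.

```python
# Minimal sketch (not the paper's pipeline): extract frame-level
# wav2vec 2.0 representations of a speech stimulus.
import torch
import torchaudio
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

checkpoint = "facebook/wav2vec2-base"  # assumed checkpoint, not necessarily the one used in the paper
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(checkpoint)
model = Wav2Vec2Model.from_pretrained(checkpoint, output_hidden_states=True)
model.eval()

waveform, sr = torchaudio.load("stimulus.wav")  # hypothetical stimulus file
waveform = torchaudio.functional.resample(waveform, sr, 16_000)

inputs = feature_extractor(waveform.squeeze().numpy(),
                           sampling_rate=16_000,
                           return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states: one (frames x dims) matrix per transformer layer.
# Distances between such representations for minimal-pair stimuli are one
# common way to quantify a model's sensitivity to a phonetic contrast.
hidden_states = outputs.hidden_states
```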
Award ID(s):
2120834
PAR ID:
10568027
Publisher / Repository:
Proceedings of the 28th Conference on Computational Natural Language Learning
Page Range / eLocation ID:
458-463
Sponsoring Org:
National Science Foundation
More Like this
  1. Current leading mispronunciation detection and diagnosis (MDD) systems achieve promising performance via end-to-end phoneme recognition. One challenge of such end-to-end solutions is the scarcity of human-annotated phonemes on natural L2 speech. In this work, we leverage unlabeled L2 speech via a pseudo-labeling (PL) procedure and extend the fine-tuning approach based on pre-trained self-supervised learning (SSL) models. Specifically, we use Wav2vec 2.0 as our SSL model and fine-tune it using the original labeled L2 speech samples plus the created pseudo-labeled L2 speech samples. Our pseudo labels are dynamic and are produced on-the-fly by an ensemble of the online model, which ensures that our model is robust to pseudo-label noise. We show that fine-tuning with pseudo labels achieves a 5.35% phoneme error rate reduction and a 2.48% MDD F1 score improvement over a labeled-samples-only fine-tuning baseline. The proposed PL method is also shown to outperform conventional offline PL methods. Compared to state-of-the-art MDD systems, our MDD solution produces a more accurate and consistent phonetic error diagnosis. In addition, we conduct an open test on a separate UTD-4Accents dataset, where our system's recognition outputs show a strong correlation with human perception, based on accentedness and intelligibility.
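For orientation only, the sketch below shows the basic mechanics of generating pseudo labels with a CTC-fine-tuned wav2vec 2.0 model in Hugging Face transformers. The paper's procedure is dynamic (labels come from an online ensemble during training) and phoneme-level, whereas this static, single-pass version uses an off-the-shelf character-level checkpoint; all names are placeholders.

```python
# Illustrative sketch only: one static pass of pseudo-labeling
# unlabeled L2 speech with a CTC model.
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

checkpoint = "facebook/wav2vec2-base-960h"  # assumed checkpoint (character-level), not the authors' phoneme model
processor = Wav2Vec2Processor.from_pretrained(checkpoint)
model = Wav2Vec2ForCTC.from_pretrained(checkpoint).eval()

def pseudo_label(waveform_16k):
    """Return a greedy CTC transcription to use as a pseudo label."""
    inputs = processor(waveform_16k, sampling_rate=16_000, return_tensors="pt")
    with torch.no_grad():
        logits = model(inputs.input_values).logits
    ids = torch.argmax(logits, dim=-1)
    return processor.batch_decode(ids)[0]

# Fine-tuning would then mix (speech, human label) pairs with
# (speech, pseudo_label(speech)) pairs under the same CTC objective.
```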
  2. Anticipatory coarticulation is a highly informative cue to upcoming linguistic information: listeners can identify that the word is ben and not bed from hearing the vowel alone. The present study compares the performance of human listeners and a self-supervised pre-trained speech model (wav2vec 2.0) in using nasal coarticulation to classify vowels. Stimuli consisted of nasalized (from CVN words) and non-nasalized (from CVC words) American English vowels produced by 60 human talkers and generated in 36 TTS voices. In aggregate, wav2vec 2.0 performance is similar to human listener performance. Broken down by vowel type, both wav2vec 2.0 and listeners classify non-nasalized vowels produced naturally by humans more accurately; however, for TTS voices, wav2vec 2.0 classifies nasalized vowels more accurately than non-nasalized vowels. Speaker-level patterns reveal that listeners' use of coarticulation is highly variable across talkers, and wav2vec 2.0 also shows cross-talker variability in performance. Analyses further reveal differences between listeners and wav2vec 2.0 in the use of multiple acoustic cues when classifying nasalized vowels. The findings have implications for understanding how coarticulatory variation is used in speech perception and can provide insight into how neural systems learn to attend to the unique acoustic features of coarticulation.
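A minimal, hypothetical probe in the same spirit (not the study's analysis) would classify vowel identity from pooled wav2vec 2.0 features. The sketch below uses random stand-in arrays purely to show the shape of such an experiment; in practice the features would come from the extraction pattern shown earlier.

```python
# Hypothetical probing experiment: linear classification of vowel
# identity from mean-pooled wav2vec 2.0 features of the vowel interval.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 768))    # stand-in for pooled wav2vec 2.0 features (one row per token)
y = rng.integers(0, 2, size=120)   # stand-in for binary vowel labels

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5)
print(f"mean cross-validated accuracy: {scores.mean():.2f}")
```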
  3. Recently, speech foundation models have gained popularity due to their superiority when fine-tuned for downstream ASR tasks. However, models fine-tuned on certain domains, such as LibriSpeech (adult read speech), behave poorly on other domains (child or noisy speech). One solution could be to collect as much labeled and diverse data as possible for joint fine-tuning across domains; however, collecting target-domain speech-text paired data and retraining the model is often costly and computationally expensive. In this paper, we introduce a simple yet effective method, speech-only adaptation (SOA), based on speech foundation models (Wav2vec 2.0), which requires only speech input data from the target domain. Specifically, the Wav2vec 2.0 feature encoder is continually pretrained with the Wav2vec 2.0 loss on both the source- and target-domain data for domain adaptation, while the contextual encoder is frozen. Compared to a source-domain fine-tuned model whose feature encoder is frozen during training, we find that replacing the frozen feature encoder with the adapted one yields significant WER improvements on the target domain while preserving performance on the source domain. The effectiveness of SOA is examined on various low-resource or domain-mismatched ASR settings, including adult-child and clean-noisy speech.
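Assuming the Hugging Face Wav2Vec2 implementation (not the authors' code), the freezing pattern described above might look like the sketch below: the transformer (contextual) encoder is frozen while the convolutional feature encoder stays trainable for continued pretraining. The masking and negative-sampling loop of the contrastive objective is omitted, and other modules (quantizer, projections) are left at their defaults.

```python
# Minimal sketch of the parameter-freezing pattern for speech-only
# adaptation, assuming the Hugging Face Wav2Vec2 module layout.
from transformers import Wav2Vec2ForPreTraining

model = Wav2Vec2ForPreTraining.from_pretrained("facebook/wav2vec2-base")  # assumed checkpoint

for p in model.wav2vec2.encoder.parameters():           # contextual (transformer) encoder: frozen
    p.requires_grad = False
for p in model.wav2vec2.feature_extractor.parameters():  # convolutional feature encoder: trainable
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
# Continued pretraining would then run the wav2vec 2.0 contrastive loss
# over source- plus target-domain audio with this configuration.
```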
  4. This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators. 
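The general recipe, sketched below under assumed names and a toy symbol set, is to attach a CTC head whose vocabulary is a set of IPA symbols to a pretrained wav2vec 2.0 encoder. This follows the standard Hugging Face fine-tuning pattern rather than the paper's released code.

```python
# Hedged sketch: set up a wav2vec 2.0 CTC model whose output vocabulary
# is IPA symbols. Vocabulary and checkpoint are illustrative only.
import json
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC

ipa_symbols = ["p", "b", "t", "d", "k", "g", "m", "n", "ŋ",
               "s", "z", "ʃ", "i", "u", "e", "o", "a", "ə"]  # toy subset, not the paper's inventory
vocab = {sym: i for i, sym in enumerate(["<pad>", "<unk>", "|"] + ipa_symbols)}
with open("vocab.json", "w", encoding="utf-8") as f:
    json.dump(vocab, f, ensure_ascii=False)

tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="<unk>",
                                 pad_token="<pad>", word_delimiter_token="|")
model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53",  # assumed multilingual base
                                       vocab_size=len(vocab),
                                       ctc_loss_reduction="mean",
                                       pad_token_id=tokenizer.pad_token_id)
# Fine-tuning then pairs audio with IPA transcriptions under the CTC loss.
```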
  5. Purpose: The "bubble noise" technique has recently been introduced as a method to identify the regions in time-frequency maps (i.e., spectrograms) of speech that are especially important for listeners in speech recognition. This technique identifies regions of "importance" that are specific to the speech stimulus and the listener, thus permitting these regions to be compared across different listener groups. For example, in cross-linguistic and second-language (L2) speech perception, this method identifies differences in regions of importance for decisions of phoneme category membership. This research note describes the application of bubble noise to the study of language learning for three different language pairs: Hindi-English bilinguals' perception of the /v/–/w/ contrast in American English, native English speakers' perception of the tense/lax contrast for Korean fricatives and affricates, and native English speakers' perception of Mandarin lexical tone. Conclusion: We demonstrate that this technique provides insight into what information in the speech signal is important for native/first-language listeners compared to nonnative/L2 listeners. Furthermore, the method can be used to examine whether L2 speech perception training is effective in bringing the listener's attention to the important cues.
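As a purely illustrative construction (not the original implementation), a bubble-noise mask can be built from randomly placed Gaussian "bubbles" over the time-frequency plane: noise outside the bubbles masks the speech, and bubble locations are later correlated with listener responses to estimate importance maps. All parameter values below are arbitrary.

```python
# Illustrative bubble-mask construction over a spectrogram grid.
import numpy as np

def bubble_mask(n_freq=257, n_frames=300, n_bubbles=15,
                freq_width=8.0, time_width=12.0, seed=0):
    """Return an (n_freq, n_frames) gain in [0, 1]: near 1 inside bubbles."""
    rng = np.random.default_rng(seed)
    f = np.arange(n_freq)[:, None]
    t = np.arange(n_frames)[None, :]
    mask = np.zeros((n_freq, n_frames))
    for _ in range(n_bubbles):
        f0 = rng.uniform(0, n_freq)      # random bubble center (frequency bin)
        t0 = rng.uniform(0, n_frames)    # random bubble center (time frame)
        bubble = np.exp(-((f - f0) / freq_width) ** 2
                        - ((t - t0) / time_width) ** 2)
        mask = np.maximum(mask, bubble)
    return mask

# A stimulus would attenuate the masking noise by `mask`, so only the
# speech regions inside the bubbles remain audible on a given trial.
```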