Direct articulatory observation reveals phoneme recognition performance characteristics of a self-supervised speech model

Shi, Xuan (ORCID: 0000-0002-1387-5418); Feng, Tiantian (ORCID: 0000-0002-2053-9068); Huang, Kevin; Kadiri, Sudarsana Reddy; Lee, Jihwan; Lu, Yijing; Zhang, Yubin; Goldstein, Louis; Narayanan, Shrikanth

doi:10.1121/10.0034430

Variability in speech pronunciation is widely observed across different linguistic backgrounds, which impacts modern automatic speech recognition performance. Here, we evaluate the performance of a self-supervised speech model in phoneme recognition using direct articulatory evidence. Findings indicate significant differences in phoneme recognition, especially in front vowels, between American English and Indian English speakers. To gain a deeper understanding of these differences, we conduct real-time MRI-based articulatory analysis, revealing distinct velar region patterns during the production of specific front vowels. This underscores the need to deepen the scientific understanding of self-supervised speech model variances to advance robust and inclusive speech technology.

Award ID(s): 2311676, 2106930

PAR ID: 10589291

Author(s) / Creator(s): Shi, Xuan; Feng, Tiantian; Huang, Kevin; Kadiri, Sudarsana Reddy; Lee, Jihwan; Lu, Yijing; Zhang, Yubin; Goldstein, Louis; Narayanan, Shrikanth

Publisher / Repository: Acoustical Society of America (ASA)

Date Published: 2024-11-18

Journal Name: JASA Express Letters

Volume: 4

Issue: 11

ISSN: 2691-1191

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Journal Article:
https://doi.org/10.1121/10.0034430
