

Title: Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous?
Phones, the segmental units of the International Phonetic Alphabet (IPA), are used for lexical distinctions in most human languages; tones, the suprasegmental units of the IPA, are used in perhaps 70%. Many previous studies have explored cross-lingual adaptation of automatic speech recognition (ASR) phone models, but few have explored the multilingual and cross-lingual transfer of synchronization between phones and tones. In this paper, we test four Connectionist Temporal Classification (CTC)-based acoustic models, differing in the degree of synchrony they impose between phones and tones. Models are trained and tested multilingually in three languages, then adapted and tested cross-lingually in a fourth. Both synchronous and asynchronous models are effective in both multilingual and cross-lingual settings. Synchronous models achieve a lower error rate on the joint phone+tone tier, but asynchronous training results in a lower tone error rate.
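The synchrony contrast can be sketched in code. Below is a minimal illustration in PyTorch, not the paper's implementation: the encoder, inventory sizes, and joint-label encoding are assumptions. A synchronous model runs one CTC alignment over fused phone+tone labels, so each emitted unit carries its tone at the same frame; an asynchronous model runs independent CTC alignments for each tier, so tones may align earlier or later than their phones.

    # Hypothetical sketch of synchronous vs. asynchronous phone/tone CTC.
    import torch
    import torch.nn as nn

    N_PHONES, N_TONES, FEAT, HID = 40, 6, 80, 256  # illustrative sizes

    encoder = nn.LSTM(FEAT, HID, num_layers=3, bidirectional=True, batch_first=True)
    joint_head = nn.Linear(2 * HID, N_PHONES * N_TONES + 1)  # fused labels + CTC blank
    phone_head = nn.Linear(2 * HID, N_PHONES + 1)            # separate tiers, each with
    tone_head = nn.Linear(2 * HID, N_TONES + 1)              # its own blank and alignment
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)

    feats = torch.randn(4, 200, FEAT)                  # (batch, frames, features)
    feat_lens = torch.full((4,), 200)
    phones = torch.randint(1, N_PHONES + 1, (4, 30))   # 1-based targets (0 = blank)
    tones = torch.randint(1, N_TONES + 1, (4, 30))
    tgt_lens = torch.full((4,), 30)

    enc, _ = encoder(feats)

    # Synchronous: one alignment over joint phone+tone symbols.
    joint_tgt = (phones - 1) * N_TONES + tones         # map (phone, tone) -> one label id
    loss_sync = ctc(joint_head(enc).log_softmax(-1).transpose(0, 1),
                    joint_tgt, feat_lens, tgt_lens)

    # Asynchronous: independent alignments per tier over the same encoder states.
    loss_async = (ctc(phone_head(enc).log_softmax(-1).transpose(0, 1),
                      phones, feat_lens, tgt_lens)
                  + ctc(tone_head(enc).log_softmax(-1).transpose(0, 1),
                        tones, feat_lens, tgt_lens))

The joint softmax has N_PHONES * N_TONES outputs, so it grows with the product of the two inventories; the two-headed variant keeps each output layer small but allows the tiers to drift apart in time, which mirrors the synchrony trade-off tested in the paper.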
Award ID(s):
1910319
PAR ID:
10273578
Author(s) / Creator(s):
Date Published:
Journal Name:
Interspeech
Page Range / eLocation ID:
1027 to 1031
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Only a handful of the world’s languages are abundant with the resources that enable practical applications of speech processing technologies. One of the methods to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how individual phones are improved in a multilingual setting. To that end, we select a phonetically diverse set of languages, and perform a series of monolingual, multilingual and crosslingual (zero-shot) experiments. The ASR is trained to recognize International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, considers Javanese a tone language. Notably, as little as 10 hours of target-language training data tremendously reduces ASR error rates. Our analysis uncovered that even the phones that are unique to a single language can benefit greatly from adding training data from other languages, an encouraging result for the low-resource speech community.
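    As a minimal sketch of the shared-inventory idea (the languages and phone sets below are illustrative assumptions, not the paper's data): a multilingual CTC model uses one IPA token inventory pooled over the training languages, so a held-out language can be decoded zero-shot, but only for the phones the pooled inventory happens to cover.

    # Hypothetical shared IPA inventory for multilingual / zero-shot ASR.
    IPA_INVENTORIES = {
        "pl": {"p", "t", "k", "a", "ɛ", "ʂ"},
        "cs": {"p", "t", "k", "a", "ɛ", "r̝"},
        "jv": {"p", "t", "k", "a", "ɔ", "ɖ"},   # held out for zero-shot testing
    }
    train_langs = ["pl", "cs"]
    shared = sorted(set().union(*(IPA_INVENTORIES[l] for l in train_langs)))
    tok2id = {tok: i + 1 for i, tok in enumerate(shared)}  # id 0 reserved for CTC blank

    # Zero-shot coverage check: which phones of the unseen language the
    # shared output layer can and cannot emit.
    unseen = IPA_INVENTORIES["jv"]
    print("covered:", sorted(unseen & set(shared)))
    print("uncovered:", sorted(unseen - set(shared)))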
  2.
    People who grow up speaking a language without lexical tones typically find it difficult to master tonal languages after childhood. Accumulating research suggests that much of the challenge for these second language (L2) speakers has to do not with identification of the tones themselves, but with the bindings between tones and lexical units. The question that remains open is to what extent these lexical binding problems are problems of encoding (incomplete knowledge of the tone-to-word relations) vs. retrieval (failure to access those relations in online processing). While recent work using lexical decision tasks suggests that both may play a role, one issue is that failure on a lexical decision task may reflect a lack of learner confidence about what is not a word, rather than non-native representation or processing of known words. Here we provide complementary evidence using a picture-phonology matching paradigm in Mandarin in which participants decide whether or not a spoken target matches a specific image, with concurrent event-related potential (ERP) recording to provide potential insight into differences in L1 and L2 tone processing strategies. As in the lexical decision case, we find that advanced L2 learners show a clear disadvantage in accurately identifying tone-mismatched targets relative to vowel-mismatched targets. We explore the contribution of incomplete/uncertain lexical knowledge to this performance disadvantage by examining individual data from an explicit tone knowledge post-test. Results suggest that explicit tone word knowledge and confidence explain some but not all of the errors in picture-phonology matching. Analysis of ERPs from correct trials shows some differences in the strength of L1 and L2 responses, but does not provide clear evidence toward differences in processing that could explain the L2 disadvantage for tones. In sum, these results converge with previous evidence from lexical decision tasks in showing that advanced L2 listeners continue to have difficulties with lexical tone recognition, and in suggesting that these difficulties reflect problems both in encoding lexical tone knowledge and in retrieving that knowledge in real time.
  3. Period-doubled voice consists of two alternating periods with multiple frequencies and is often perceived as rough with an indeterminate pitch. Past pitch-matching studies in period-doubled voice found that the perceived pitch was lower as the degree of amplitude and frequency modulation between the two alternating periods increased. The perceptual outcome also differed across f0s and modulation types: a lower f0 prompted earlier identification of a lower pitch, and the matched pitch dropped more quickly in frequency- than amplitude-modulated tokens (Sun & Xu, 2002; Bergan & Titze, 2001). However, it is unclear how listeners perceive period doubling when identifying linguistic tones. In an artificial language learning paradigm, this study used resynthesized stimuli with alternating amplitudes and/or frequencies of varying degrees, based on a production study of period-doubled voice (Huang, 2022). Listeners were native speakers of English and Mandarin. We confirm the positive relationship between the modulation degree and the proportion of low tones heard, and find that frequency modulation biased listeners to choose more low-tone options than amplitude modulation. However, a higher f0 (300 Hz) leads to a low-tone percept in more amplitude-modulated tokens than a lower f0 (200 Hz). Both English and Mandarin listeners behaved similarly, suggesting that pitch perception during period doubling is not language-specific. Furthermore, period doubling is predicted to signal low tones in languages, even when the f0 is high. 
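    A minimal resynthesis sketch in Python, assuming a simple windowed-sine source rather than the study's actual stimuli: alternating periods are attenuated (amplitude modulation) and/or lengthened (frequency modulation), and raising either modulation depth increases the degree of period doubling, which listeners increasingly hear as a lower, rougher pitch.

    # Hypothetical period-doubled stimulus generator (illustrative parameters).
    import numpy as np

    def period_doubled(f0=200.0, dur=0.5, sr=16000, amp_mod=0.3, freq_mod=0.0):
        """Alternate every other period: scale its amplitude by (1 - amp_mod)
        and stretch its length by (1 + freq_mod)."""
        samples, even = [], True
        while len(samples) / sr < dur:
            period = (1.0 / f0) * (1.0 if even else 1.0 + freq_mod)
            n = int(round(sr * period))
            t = np.arange(n) / n
            cycle = np.sin(2 * np.pi * t) * np.hanning(n)  # one smoothed cycle
            if not even:
                cycle *= (1.0 - amp_mod)
            samples.extend(cycle)
            even = not even
        return np.array(samples)

    # amp_mod = freq_mod = 0 gives an ordinary 200 Hz tone; increasing either
    # depth strengthens the period-doubled (rough, lower-pitched) percept.
    stim = period_doubled(f0=200.0, amp_mod=0.4, freq_mod=0.05)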
  4.
    Lexical tones are widely believed to be a formidable learning challenge for adult speakers of nontonal languages. While difficulties—as well as rapid improvements—are well documented for beginning second language (L2) learners, research with more advanced learners is needed to understand how tone perception difficulties impact word recognition once learners have a substantial vocabulary. The present study narrows in on difficulties suggested in previous work, which found a dissociation in advanced L2 learners between highly accurate tone identification and largely inaccurate lexical decision for tone words. We investigate a “best-case scenario” for advanced L2 tone word processing by testing performance in nearly ideal listening conditions—with words spoken clearly and in isolation. Under such conditions, do learners still have difficulty in lexical decision for tone words? If so, is it driven by the quality of lexical representations or by L2 processing routines? Advanced L2 and native Chinese listeners made lexical decisions while an electroencephalogram was recorded. Nonwords had a first syllable with either a vowel or tone that differed from that of a common disyllabic word. As a group, L2 learners performed less accurately when tones were manipulated than when vowels were manipulated. Subsequent analyses showed that this was the case even in the subset of items for which learners showed correct and confident tone identification in an offline written vocabulary test. Event-related potential results indicated N400 effects for both nonword conditions in L1, but only vowel N400 effects in L2, with tone responses intermediate between those of real words and vowel nonwords. These results are evidence of the persistent difficulty most L2 learners have in using tones for online word recognition, and indicate it is driven by a confluence of factors related to both L2 lexical representations and processing routines. We suggest that this tone nonword difficulty has real-world implications for learners: It may result in many toneless word representations in their mental lexicons, and is likely to affect the efficiency with which they can learn new tone words.
  5. Justices on the United States Supreme Court use rhetorical strategies to maintain institutional legitimacy. In the court opinion, a strategy called the monologic voice presents a flattering depiction of the Court. The monologic voice occurs through two tones, the individualistic and collective, which respectively maintain the Justices’ legitimacy through critique and the Court’s legitimacy through unification. We train large language models to identify these rhetorical features in 15,291 modern Supreme Court opinions, issued between 1946 and 2022. While the fraction of collective and individualistic tones has been relatively consistent between 1946 and 2022, the Rehnquist Court used the collective tone at a higher rate than any other Court. In recent terms, 2021 and 2022, we find suggestions of another rhetorical shift, as all Associate Justices of the Roberts Court, excluding Chief Justice Roberts, used the individualistic tone at a historically high rate. 
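    As an illustrative stand-in for the trained models (the zero-shot classifier, labels, and example sentence below are assumptions, not the paper's setup), a sentence from an opinion can be scored against the two tones like this:

    # Hypothetical rhetorical-tone scorer using an off-the-shelf zero-shot model.
    from transformers import pipeline

    clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    labels = ["individualistic tone", "collective tone", "neither"]
    sentence = ("We reaffirm the settled precedent of this Court and speak "
                "today with one voice.")  # invented example, not from the corpus
    result = clf(sentence, candidate_labels=labels)
    print(result["labels"][0], round(result["scores"][0], 3))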