%AŻelasko, Piotr
%AMoro-Velázquez, Laureano
%AHasegawa-Johnson, Mark
%AScharenborg, Odette
%ADehak, Najim
%D2020
%MOSTI ID: 10273581
%PMedium: X
%TThat Sounds Familiar: An Analysis of Phonetic Representations Transfer Across Languages
%XOnly a handful of the world’s languages are abundant with the resources that enable practical applications of speech processing technologies. One way to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how individual phones improve in a multilingual setting. To that end, we select a phonetically diverse set of languages and perform a series of monolingual, multilingual, and crosslingual (zero-shot) experiments. The ASR is trained to recognize International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, treats Javanese as a tone language. Notably, as little as 10 hours of target-language training data tremendously reduces ASR error rates. Our analysis uncovered that even phones unique to a single language can benefit greatly from adding training data from other languages, an encouraging result for the low-resource speech community.
%Uhttps://doi.org/10.21437/Interspeech.2020-2513