NSF PAR Search | NSF Public Access Repository


Universal automatic phonetic transcription into the International Phonetic Alphabet

Chihiro Taguchi, Yusuke Sakai (January 2023, Interspeech)

This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators.
Full Text Available
Introducing Morphology in Universal Dependencies Japanese

Chihiro Taguchi and David Chiang (January 2023, Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023))

This paper discusses the need for including morphological features in Japanese Universal Dependencies (UD). In the current version (v2.11) of the Japanese UD treebanks, sentences are tokenized at the morpheme level, and almost no morphological feature annotation is used. However, Japanese is an agglutinative language, not an isolating language that lacks morphological inflection. Given this situation, we introduce a tentative scheme for retokenization and morphological feature annotation for Japanese UD. Then, we measure the morphological complexity of Japanese and compare it with that of other languages to demonstrate that the proposed tokenizations show similarities to synthetic languages, reflecting the linguistic typology.
Full Text Available
Mermaid Constructions in Lexical Functional Grammar

Chihiro Taguchi (January 2022, Lexical-Functional Grammar Conference)

This paper provides a cross-linguistic analysis of Mermaid Constructions in terms of Lexical Functional Grammar. Mermaid Constructions, a term coined by Tsunoda (2020), are described there as grammaticalized monoclausal constructions in which a verb and a noun, sometimes with a copula, form a compound predicate. However, Tsunoda’s work was chiefly descriptive, and the morphosyntactic nature of Mermaid Constructions has not yet been explained in theoretical terms. In this work, in opposition to Tsunoda’s (2020) hypothesis, we show the following points: (1) Mermaid Constructions are not monoclausal but biclausal; (2) Mermaid Constructions do not comprise a compound predicate but are control and raising constructions with a nominal predicate; (3) these findings hold cross-linguistically.
Full Text Available
