-
Free, publicly-accessible full text available April 1, 2026
-
This study investigates the syntactic structure of Imbabura Kichwa, a Quechuan language spoken in the Imbabura Province of Ecuador, focusing on its seemingly free word order with respect to grammatical functions and discourse-semantic functions (i.e., topic and focus). We first present data and an overview of the non-configurationality and non-discourse-configurationality of Imbabura Kichwa. We then demonstrate that the underlying syntactic structure of Imbabura Kichwa is built up hierarchically, based on the agreement of focus enclitics with clause type and polarity. Finally, we argue that non-configurationality and non-discourse-configurationality are the surface realization of movement from the underlying structure to the daughter positions of a non-projective category S.
-
This paper discusses the need to include morphological features in Japanese Universal Dependencies (UD). In the current version (v2.11) of the Japanese UD treebanks, sentences are tokenized at the morpheme level, and almost no morphological feature annotation is used. However, Japanese is not an isolating language lacking morphological inflection; it is an agglutinative language. Given this situation, we introduce a tentative scheme for retokenization and morphological feature annotation for Japanese UD. We then measure the morphological complexity of Japanese and compare it with that of other languages, demonstrating that under the proposed tokenizations Japanese shows similarities to synthetic languages, reflecting its linguistic typology.
-
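This paper proposes new tokenization criteria and morphological feature annotation for Japanese Universal Dependencies (UD) that reflect Japanese morphology. In the current version, Japanese UD v2.11, sentences are tokenized at the morpheme level, and almost no morphological features are assigned to the tokens. However, Japanese is not an isolating language lacking morphological inflection; it is an agglutinative language with morphological change. In light of this situation, we revisit the tokenization criteria of Japanese UD and propose word-level rather than morpheme-level tokenization. We then propose annotation with the morphological features common across UD to represent the morphological information contained in each token.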
-
This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators.