NSF PAR Search | NSF Public Access Repository

A soft and fast pattern matcher for billion-scale corpus searches

Deguchi, Hiroyuki; Kamoda, Go; Matsushita, Yusuke; Taguchi, Chihiro; Suenaga, Kohey; Waga, Masaki; Yokoi, Sho (April 2025, International Conference on Learning Representations)

Free, publicly-accessible full text available April 1, 2026
Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t

Taguchi, Chihiro; Chiang, David (August 2024, ACL Anthology)

Full Text Available
Japanese Rule-based Grapheme-to-phoneme Conversion System and Multilingual Named Entity Dataset with International Phonetic Alphabet

Matogawa, Yuhi; Sakai, Yusuke; Watanabe, Taro; Taguchi, Chihiro (June 2024, ACL Anthology)

Full Text Available
Non-discourse-configurationality in Imbabura Kichwa

https://doi.org/10.3765/plsa.v9i1.5687

Taguchi, Chihiro; Saransig, Jefferson (May 2024, Proceedings of the Linguistic Society of America)

This study investigates the syntactic structure of Imbabura Kichwa, a Quechuan language spoken in the Imbabura Province of Ecuador, with a focus on the seemingly free word order in grammatical functions and discourse-semantic functions (i.e., topic and focus). We first provide the data and overviews of the non-configurationality and non-discourse-configurationality of Imbabura Kichwa. Then, we demonstrate that the underlying syntactic structure of Imbabura Kichwa is built up hierarchically based on the agreement of focus enclitics with clause types and polarity. Finally, we argue that the non-configurationality and non-discourse-configurationality are the surface realization of the movement from the underlying structure to the daughter positions of a non-projective category S.
Full Text Available
J-SNACS: Adposition and Case Supersenses for Japanese Joshi

Aoyama, Tatsuya; Taguchi, Chihiro; Schneider, Nathan (May 2024, ACL Anthology)

Full Text Available
Killkan: The Automatic Speech Recognition Dataset for Kichwa with Morphosyntactic Information

Taguchi, Chihiro; Saransig, Jefferson; Velásquez, Dayana; Chiang, David (May 2024, ACL Anthology)

Full Text Available
Strategies for the Annotation of Pronominalised Locatives in Turkic Universal Dependency Treebanks

Washington, Jonathan; Çöltekin, Çağrı; Akkurt, Furkan; Chontaeva, Bermet; Eslami, Soudabeh; Jumalieva, Gulnura; Kasieva, Aida; Kuzgun, Aslı; Marşan, Büşra; Taguchi, Chihiro (May 2024, ACL Anthology)

Full Text Available
Introducing Morphology in Universal Dependencies Japanese

Taguchi, Chihiro; Chiang, David (January 2023, Proceedings of the Sixth Workshop on Universal Dependencies (UDW, GURT/SyntaxFest 2023))

This paper discusses the need for including morphological features in Japanese Universal Dependencies (UD). In the current version (v2.11) of the Japanese UD treebanks, sentences are tokenized at the morpheme level, and almost no morphological feature annotation is used. However, Japanese is not an isolating language that lacks morphological inflection but is an agglutinative language. Given this situation, we introduce a tentative scheme for retokenization and morphological feature annotation for Japanese UD. Then, we measure and compare the morphological complexity of Japanese with other languages to demonstrate that the proposed tokenizations show similarities to synthetic languages reflecting the linguistic typology.
Full Text Available
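Japanese Universal Dependencies with Morphological Information (形態論情報付き日本語 Universal Dependencies)

Taguchi, Chihiro (田口智大); Miyagawa, So (宮川創) (January 2023, 29th Annual Meeting of the Association for Natural Language Processing (言語処理学会第29回年次大会))

This paper proposes new tokenization criteria and morphological feature annotation for Japanese Universal Dependencies (UD) that reflect Japanese morphology. In the current version, Japanese UD v2.11, sentences are tokenized at the morpheme level, and almost no morphological features are assigned to the tokens. However, Japanese is not an isolating language lacking morphological inflection but an agglutinative language with morphological variation. Given this situation, we revisit the tokenization criteria of Japanese UD and propose tokenization at the word level rather than the morpheme level. We then propose annotating each token with the morphological features shared across UD to express the morphological information it contains.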
Full Text Available
Universal automatic phonetic transcription into the International Phonetic Alphabet

Taguchi, Chihiro; Sakai, Yusuke (January 2023, Interspeech)

This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators.
Full Text Available
