Title: Supporting small languages together: The history and impact of the International Conference on Language Documentation & Conservation series
The International Conference on Language Documentation & Conservation series, or ICLDC, has, since its inception in 2009, become the flagship conference for the field of language documentation. Every two years, conference attendees gather at the University of Hawai‘i at Mānoa to share their experiences working on diverse topics related to the preservation of underrepresented languages worldwide. Attendees come from a range of backgrounds: Indigenous language communities, language activism organizations, K–12 school systems, as well as students and faculty from colleges and universities. They represent dozens of countries and hundreds of languages, and they have one goal in mind: supporting small languages together. In this paper, we trace the history of the ICLDC series since the first iteration and discuss the scope of its impact on the field of language documentation and conservation according to conference attendees. We also look ahead to the changes that the COVID-19 pandemic will bring to the structure of the conference in 2021 and beyond.
Award ID(s):
1937611
PAR ID:
10335912
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Language documentation and conservation
Volume:
14
ISSN:
1934-5275
Page Range / eLocation ID:
642-666
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Language documentation is increasingly seen as a collaborative process, engaging community members as active participants. Collaborative research produces better documentation that is valuable for both the academic community and the speakers. However, in many communities, speakers and language advocates lack the skills necessary to fully engage in collaborative projects. One way to overcome this barrier is to provide language documentation training to community members. Such training should teach participants how to ethically and comprehensively complete every stage of the documentation process while offering opportunity for theoretical discussion and practical application. In this paper, we offer one possible model for community-based training in language documentation and conservation that focuses on bidirectional learning and capacity building. We describe a training workshop that was held in 2018 in Kupang, the capital of Indonesia’s Nusa Tenggara Timur (NTT) province. A collaboration between the University of Hawai‘i, Leiden University, and Artha Wacana Christian University, this workshop implemented a model based on the practices of the Language Documentation Training Center (LDTC), an organization devoted to training speakers to document their own languages. We detail the NTT workshop itself, summarize post-workshop feedback, and offer suggestions to others looking to provide similar training in speaker communities. 
  2. This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators. 
  3. Once a programmer knows one language, they can leverage concepts and knowledge already learned, and easily pick up another programming language. But is that always the case? To understand if programmers have difficulty learning additional programming languages, we conducted an empirical study of Stack Overflow questions across 18 different programming languages. We hypothesized that previous knowledge could potentially interfere with learning a new programming language. From our inspection of 450 Stack Overflow questions, we found 276 instances of interference that occurred due to faulty assumptions originating from knowledge about a different language. To understand why these difficulties occurred, we conducted semi-structured interviews with 16 professional programmers. The interviews revealed that programmers make failed attempts to relate a new programming language with what they already know. Our findings inform design implications for technical authors, toolsmiths, and language designers, such as designing documentation and automated tools that reduce interference, anticipating uncommon language transitions during language design, and welcoming programmers not just into a language, but its entire ecosystem. 
  4. The American Bryological and Lichenological Society (ABLS) held its annual conference in West Portsmouth, Ohio, USA in July 2024. At the meeting, members of the bryological and lichenological communities socialized and shared their research from the past year. Four morning field trips took place during the conference: two on Friday, July 12th and two on Saturday, July 13th. Attendees searched for, collected, and learned about bryophytes and lichens of southern Ohio during these trips.
  5. Language documentation encompasses translation, typically into the dominant high-resource language in the region where the target language is spoken. To make data accessible to a broader audience, additional translation into other high-resource languages might be needed. Working within a project documenting Kotiria, we explore the extent to which state-of-the-art machine translation (MT) systems can support this second translation – in our case from Portuguese to English. This translation task is challenging for multiple reasons: (1) the data is out-of-domain with respect to the MT system’s training data, (2) much of the data is conversational, (3) existing translations include non-standard and uncommon expressions, often reflecting properties of the documented language, and (4) the data includes borrowings from other regional languages. Despite these challenges, existing MT systems perform at a usable level, though there is still room for improvement. We then conduct a qualitative analysis and suggest ways to improve MT between high-resource languages in a language documentation setting. 