Title: Supporting small languages together: The history and impact of the International Conference on Language Documentation & Conservation series
The International Conference on Language Documentation & Conservation series, or ICLDC, has, since its inception in 2009, become the flagship conference for the field of language documentation. Every two years, conference attendees gather at the University of Hawai‘i at Mānoa to share their experiences working on diverse topics related to the preservation of underrepresented languages worldwide. Attendees come from a range of backgrounds: Indigenous language communities, language activism organizations, K–12 school systems, as well as students and faculty from colleges and universities. They represent dozens of countries and hundreds of languages, and they have one goal in mind: supporting small languages together. In this paper, we trace the history of the ICLDC series since the first iteration and discuss the scope of its impact on the field of language documentation and conservation according to conference attendees. We also look ahead to the changes that the COVID-19 pandemic will bring to the structure of the conference in 2021 and beyond.
Award ID(s):
1937611
NSF-PAR ID:
10335912
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
Language documentation and conservation
Volume:
14
ISSN:
1934-5275
Page Range / eLocation ID:
642-666
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Language documentation is increasingly seen as a collaborative process, engaging community members as active participants. Collaborative research produces better documentation that is valuable for both the academic community and the speakers. However, in many communities, speakers and language advocates lack the skills necessary to fully engage in collaborative projects. One way to overcome this barrier is to provide language documentation training to community members. Such training should teach participants how to ethically and comprehensively complete every stage of the documentation process while offering opportunity for theoretical discussion and practical application. In this paper, we offer one possible model for community-based training in language documentation and conservation that focuses on bidirectional learning and capacity building. We describe a training workshop that was held in 2018 in Kupang, the capital of Indonesia’s Nusa Tenggara Timur (NTT) province. A collaboration between the University of Hawai‘i, Leiden University, and Artha Wacana Christian University, this workshop implemented a model based on the practices of the Language Documentation Training Center (LDTC), an organization devoted to training speakers to document their own languages. We detail the NTT workshop itself, summarize post-workshop feedback, and offer suggestions to others looking to provide similar training in speaker communities. 
  2. This paper presents a state-of-the-art model for transcribing speech in any language into the International Phonetic Alphabet (IPA). Transcription of spoken languages into IPA is an essential yet time-consuming process in language documentation, and even partially automating this process has the potential to drastically speed up the documentation of endangered languages. Like the previous best speech-to-IPA model (Wav2Vec2Phoneme), our model is based on wav2vec 2.0 and is fine-tuned to predict IPA from audio input. We use training data from seven languages from CommonVoice 11.0, transcribed into IPA semi-automatically. Although this training dataset is much smaller than Wav2Vec2Phoneme's, its higher quality lets our model achieve comparable or better results. Furthermore, we show that the quality of our universal speech-to-IPA models is close to that of human annotators. 
  3. Once a programmer knows one language, they can leverage concepts and knowledge already learned, and easily pick up another programming language. But is that always the case? To understand if programmers have difficulty learning additional programming languages, we conducted an empirical study of Stack Overflow questions across 18 different programming languages. We hypothesized that previous knowledge could potentially interfere with learning a new programming language. From our inspection of 450 Stack Overflow questions, we found 276 instances of interference that occurred due to faulty assumptions originating from knowledge about a different language. To understand why these difficulties occurred, we conducted semi-structured interviews with 16 professional programmers. The interviews revealed that programmers make failed attempts to relate a new programming language with what they already know. Our findings inform design implications for technical authors, toolsmiths, and language designers, such as designing documentation and automated tools that reduce interference, anticipating uncommon language transitions during language design, and welcoming programmers not just into a language, but its entire ecosystem. 
  4. Statistical data manipulation is a crucial component of many data science analytic pipelines, particularly as part of data ingestion. This task is generally accomplished by writing transformation scripts in languages such as SPSS, Stata, SAS, R, Python (Pandas), and others. The disparate data models, language representations, and transformation operations supported by these tools make it hard for end users to understand and document the transformations performed, and for developers to port transformation code across languages. Tackling these challenges, we present a formal paradigm for statistical data transformation. It consists of a data model, called Structured Data Transformation Data Model (SDTDM), inspired by the data models of multiple statistical transformation frameworks; an algebra, Structural Data Transformation Algebra (SDTA), with the ability to transform not only data within SDTDM but also metadata at multiple structural levels; and an equivalent descriptive counterpart, called Structured Data Transformation Language (SDTL), recently adopted by the DDI Alliance, which maintains international standards for metadata, as part of its suite of products. Experiments with real statistical transformations on socio-economic data show that SDTL can successfully represent 86.1% and 91.6%, respectively, of 4,185 commands in SAS and 9,087 commands in SPSS obtained from a repository. We illustrate with examples how SDTA/SDTL could assist with the documentation of statistical data transformation, an important aspect often neglected in the metadata of datasets. We propose a system called C2Metadata that automatically captures the transformation and provenance information in SDTL as a part of the metadata. Moreover, given the conversion mechanism from a source statistical language to SDTA/SDTL, we show how transformation programs could be converted to other functionally equivalent programs, in the same or a different language, permitting code reuse and result reproducibility. We also illustrate the possibility of using SDTA to optimize SDTL transformations using rule-based rewrites similar to SQL optimizations.
  5. While data collection early in the Americanist tradition included texts as part of the Boasian triad, later developments in the generative tradition moved away from narratives. With a resurgence of attention to texts in both linguistic theory and language documentation, the literature on methodologies is growing (e.g., Chelliah 2001, Chafe 1980, Burton & Matthewson 2015). We outline our approach to collecting Chickasaw texts in what we call a ‘narrative bootcamp.’ Chickasaw is a severely threatened language and no longer in common daily use. Facilitating narrative collection with elder fluent speakers is an important goal, as is the cultivation of second language speakers and the training of linguists and tribal language professionals. Our bootcamps meet these goals. Moreover, we show many positive outcomes to this approach, including a positive sense of language use and ‘fun’ voiced by the elders, the corpus expansion that occurs by collecting and processing narratives onsite in the workshop, and field methods training for novices. Importantly, we find the sparking of personal recollections facilitates the collection of heretofore unrecorded narrative genres in Chickasaw. This approach offers an especially fruitful way to build and expand a text corpus for small communities of highly endangered languages. 