

This content will become publicly available on May 3, 2026

Title: Digital Documentation for Diasporic Data: challenges, opportunities, and solutions for working with Diaspora Communities
Language documentation involving diaspora communities presents a combination of challenges and opportunities for approaches that leverage large-scale data collection and assisted transcription and annotation. Drawing on our projects on Kichwa and Mapudungun, we will present a suite of computational tools designed for effective language documentation work with diaspora communities.
Award ID(s):
2109578
PAR ID:
10633221
Author(s) / Creator(s):
; ; ; ;
Publisher / Repository:
9th International Conference on Language Documentation & Conservation (ICLDC)
Date Published:
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More like this
  1. Advances in speech and language processing have enabled the creation of applications that could, in principle, accelerate the process of language documentation, as speech communities and linguists work on urgent language documentation and reclamation projects. However, such systems have yet to make a significant impact on language documentation, as resource requirements limit the broad applicability of these new techniques. We aim to exploit the framework of shared tasks to focus the technology research community on tasks that address key pain points in language documentation. Here we present initial steps in the implementation of these new shared tasks through the creation of data sets drawn from endangered language repositories and baseline systems that perform segmentation and speaker labeling of these audio recordings, which are important enabling steps in the documentation process. This paper motivates these tasks with a use case, describes data set curation and baseline systems, and presents results on this data. We then highlight the challenges and ethical considerations in developing these speech processing tools and tasks to support endangered language documentation.
  2. Here we present research resulting from a tribal-academic collaboration between the Chickasaw Language Revitalization Program (CLRP) and the University of Texas at Arlington (UTA). This collaboration began three years ago, with a UTA service-learning trip to Ada, Oklahoma. The Chickasaw Language Revitalization Program is vigorously engaged in many activities to support language use by the remaining 70 or so fluent speakers. Communities facing such stark endangerment must address revitalization and documentation simultaneously, and in a way that maximizes resources. Our partnership addresses this challenge. This paper draws on the principles of Community-Based Language Research, defined in Czaykowska-Higgins (2009: 24) as a model that “not only allows for the production of knowledge on a language, but also assumes that that knowledge can and should be constructed for, with, and by community members, and that it is therefore not merely (or primarily) for or by linguists.” Benefitting from an action-research model, our collaboration supports the Chickasaw community by developing revitalization-driven documentation and training materials for learners that both feed into and are drawn from documentation. Both sides of our collaboration are committed to the transfer of knowledge, especially sharing our findings and knowledge with other endangered language communities. 
  3. There is a need to build knowledge toward addressing issues related to international disaster migrants coming into the United States, a phenomenon that the United Nations perceives as increasingly imminent in the next few decades due to potential refugees fleeing climate change-related events. There is, however, a gap in scholarly work on the role of diaspora groups and host communities in post-disaster recovery and reconstruction. The Haitian diaspora in the United States will be a lifeline as Haiti recovers and rebuilds from the devastating earthquake of January 12, 2010. This article reports on observations and findings from our research into the specific roles of Haitian diaspora associations based in South Florida, as well as the roles of host communities, nongovernmental organizations, and government agencies that assisted earthquake survivors and displacees in the South Florida region. The findings are based on twenty-six interviews conducted between June 2010 and December 2010. Half of these interviewees represented diaspora associations based in South Florida. The findings indicate that these organizations and host communities played a vital role in disaster relief and response processes.
  4. Language documentation is increasingly seen as a collaborative process, engaging community members as active participants. Collaborative research produces better documentation that is valuable for both the academic community and the speakers. However, in many communities, speakers and language advocates lack the skills necessary to fully engage in collaborative projects. One way to overcome this barrier is to provide language documentation training to community members. Such training should teach participants how to ethically and comprehensively complete every stage of the documentation process while offering opportunities for theoretical discussion and practical application. In this paper, we offer one possible model for community-based training in language documentation and conservation that focuses on bidirectional learning and capacity building. We describe a training workshop held in 2018 in Kupang, the capital of Indonesia’s Nusa Tenggara Timur (NTT) province. A collaboration between the University of Hawai‘i, Leiden University, and Artha Wacana Christian University, this workshop implemented a model based on the practices of the Language Documentation Training Center (LDTC), an organization devoted to training speakers to document their own languages. We detail the NTT workshop itself, summarize post-workshop feedback, and offer suggestions to others looking to provide similar training in speaker communities.
  5. The study of language shift, the replacement of one language by another in a community or in a subgroup of a speech community, is a prime topic for sociolinguistic analysis: shift is almost always the result of social factors. This paper argues for focusing research on the study of shift in process and, to that end, studying the different kinds of speakers in shifting communities. The prevalent response by linguists to massive, global language shift is language documentation. Although the need for documentation is clear, there have been inadvertent consequences: valorizing last speakers, promoting linguistic purism, and devaluing L2 language learners who, in many communities, represent the future of the language. The urgency of documenting and describing languages with relatively small numbers of elderly speakers has led the linguistic community to focus almost exclusively on such groups, ignoring larger speech communities in earlier stages of shift and overlooking the wide range of speaker types in shift communities. From a social standpoint, the result is that we are often failing to do language work in precisely those communities where reversing language shift is still relatively easy. From a scientific standpoint, we are missing the opportunity to study language change in process, as well as the chance to study speaker variation in a shift situation. Variation in proficiency and performance across shifting speakers is not random but systematic, and it correlates with a set of social and cognitive factors.