Title: Ticuna (tca) language documentation: A guide to materials in the California Language Archive
Ticuna (ISO: tca) is a language isolate spoken in the northwestern Amazon Basin (Brazil, Colombia, Peru). Ticuna has more speakers than almost all other Indigenous Amazonian languages and – unlike most languages of the area – is still learned by children. Yet academic linguists have given it relatively little research attention. Therefore, to raise the profile of this areally important language, I offer a guide to three collections of Ticuna language materials held in the California Language Archive. These materials are extensive, including over 1,396 hours of recordings – primarily of child language and everyday conversations between adults – and 33 hours of transcriptions. To contextualize the materials, I provide background on the Ticuna language and people; the research projects which produced the materials; the participants who appear in them; and the ethical and permissions issues involved in collecting them. I then discuss the nature and scope of the materials, showing how the content of each collection motivated collection-specific choices about recording, transcription, organization in the archive, and metadata. Last, I outline how other researchers could draw on the collections for comparative analysis.
Sullivant, Ryan
(Language documentation and conservation)
Users of digital language archives face a number of barriers when trying to discover and reuse the materials preserved in the digital collections created by current language documentation projects. These barriers include sparse descriptive metadata throughout many collections and the prevalence of audio-video materials that are impervious to text-based search. Users could more easily evaluate, navigate, and use such a collection if it contained a guide that contextualized it, summarized its contents, and helped users identify and locate items within it. This article will discuss the importance of thorough collection descriptions and finding aids by synthesizing guidelines and best practices for archival description created for traditional archives and adapting these to the structure and makeup of today’s digital language documentation collections. To facilitate the iterative description of growing collections, the checklist of information to include is presented in three groups of descending priority.
Beer, Samuel J.
(Language documentation and description)
As theorized in language documentation, archives serve to make research reproducible and to make primary data accessible for multiple audiences (Himmelmann 2006; Berez-Kroeker et al. 2018). Scholars in the emerging mid-20th-century field of African history emphasized these same priorities. Mid-century Africanist historians assembled large text collections but failed in a clearly stated disciplinary project to preserve them in accessible archives. This paper explores the relationship between institutional and social factors in data preservation through the story of audio recordings and field notes documenting Soo (Uganda: Kuliak/Nilo-Saharan) collected in the mid-20th century by Makerere University history PhD student John M. Weatherby. For decades, Weatherby struggled and failed to find an institutional home for his materials, which were nearly lost amid changing disciplinary trends. I encountered them only through informal social interactions in 2018 and have subsequently been depositing them in a language archive. The slide of Weatherby’s data into obscurity shows how archiving is inherently a disciplinary practice. Institutions intending to preserve data rose and fell with changing disciplinary paradigms, but Weatherby’s data were preserved through personal relationships. Despite a common emphasis on technical and institutional initiatives for archiving, the relational contexts of legacy materials are central to their preservation.
Holton, Gary
(Proceedings of the Foundation for Endangered Languages Conference)
Over the past three decades the field of linguistics has refocused attention on endangered languages, and enormous strides have been made to document these languages and develop archive infrastructure for language data. Although the potential for language archives to support language renewal efforts has often been tacitly assumed, much greater attention has been given to the preservation of data than to access and utilization. Documentation activities are imagined as a race against time to get language data into a lasting form before the last speakers pass away. Here I describe three examples of efforts which are working to engage with language communities and increase the accessibility and usability of language resources. Though not necessarily representative, these efforts suggest ways in which linguists, archivists, and communities can collaborate to support digital return.
Song, Yueqi; Khanuja, Simran; Liu, Pengfei; Faisal, Fahim; Ostapenko, Alissa; Winata, Genta; Aji, Alham; Cahyawijaya, Samuel; Tsvetkov, Yulia; Anastasopoulos, Antonios; et al.
(Association for Computational Linguistics)
Despite the major advances in NLP, significant disparities in NLP system performance across languages still exist. Arguably, these are due to uneven resource allocation and sub-optimal incentives to work on less resourced languages. To track and further incentivize the global development of equitable language technology, we introduce GlobalBench. Prior multilingual benchmarks are static and have focused on a limited number of tasks and languages. In contrast, GlobalBench is an ever-expanding collection that aims to dynamically track progress on all NLP datasets in all languages. Rather than solely measuring accuracy, GlobalBench also tracks the estimated per-speaker utility and equity of technology across all languages, providing a multi-faceted view of how language technology is serving people of the world. Furthermore, GlobalBench is designed to identify the most under-served languages, and rewards research efforts directed towards those languages. At present, the most under-served languages are the ones with a relatively high population, but nonetheless overlooked by composite multilingual benchmarks (like Punjabi, Portuguese, and Wu Chinese). Currently, GlobalBench covers 966 datasets in 190 languages, and has 1,128 system submissions spanning 62 languages.
Poelen, Jorrit H; Seltmann, Katja C; Campbell, Mariel; Orlofske, Sarah A; Light, Jessica E; Tucker, Erika M; Demboski, John R; McElrath, Tommy; Grinter, Christopher C; Diaz-Bastin, Rachel; et al.
(Zenodo)
Please contact the authors if you contributed and would like to be listed as a co-author.
Terrestrial Parasite Tracker indexed biotic interactions and review summary. The Terrestrial Parasite Tracker (TPT) project began in 2019 and is funded by the National Science Foundation to mobilize data from vector and ectoparasite collections to data aggregators (e.g., iDigBio, GBIF). The goal is to help build a comprehensive picture of arthropod host-association evolution, distributions, and the ecological interactions of disease vectors, which will assist scientists, educators, land managers, and policy makers. Arthropod parasites are often important to human and wildlife health and safety as vectors of pathogens, and it is critical to digitize these specimens so that they, and their biotic interaction data, are available to help understand and predict the spread of human and wildlife disease. This data publication contains versioned TPT-associated datasets and related data products that were tracked, reviewed, and indexed by Global Biotic Interactions (GloBI) and associated tools. GloBI provides open access to species interaction data (e.g., predator-prey, pollinator-plant, pathogen-host, parasite-host) by combining existing open datasets using open-source software. If you have questions or comments about this publication, please open an issue at https://github.com/ParasiteTracker/tpt-reporting or contact the authors by email.
Funding: The creation of this archive was made possible by the National Science Foundation award "Collaborative Research: Digitization TCN: Digitizing collections to trace parasite-host associations and predict the spread of vector-borne disease," award numbers DBI:1901932 and DBI:1901926.
References: Jorrit H. Poelen, James D. Simons and Chris J. Mungall. (2014). Global Biotic Interactions: An open infrastructure to share and analyze species-interaction datasets. Ecological Informatics. https://doi.org/10.1016/j.ecoinf.2014.08.005.
GloBI Data Review Report
Datasets under review:
- University of Michigan Museum of Zoology Insect Division. Full Database Export 2020-11-20 provided by Erika Tucker and Barry Oconner. Accessed via https://github.com/EMTuckerLabUMMZ/ummzi/archive/6731357a377e9c2748fc931faa2ff3dc0ce3ea7a.zip on 2022-10-12T18:43:37.491Z
- Academy of Natural Sciences Entomology Collection for the Parasite Tracker Project, accessed via https://github.com/globalbioticinteractions/ansp-para/archive/5e6592ad09ec89ba7958266ad71ec9d5d21d1a44.zip on 2022-10-12T18:45:13.893Z
- Bernice Pauahi Bishop Museum, J. Linsley Gressitt Center for Research in Entomology, accessed via https://github.com/globalbioticinteractions/bpbm-ent/archive/c085398dddd36f8a1169b9cf57de2a572229341b.zip on 2022-10-12T18:47:33.370Z
- Texas A&M University, Biodiversity Teaching and Research Collections, accessed via https://github.com/globalbioticinteractions/brtc-para/archive/f0a718145b05ed484c4d88947ff712d5f6395446.zip on 2022-10-12T18:49:42.688Z
- Brigham Young University Arthropod Museum, accessed via https://github.com/globalbioticinteractions/byu-byuc/archive/4a609ac6a9a03425e2720b6cdebca6438488f029.zip on 2022-10-12T18:50:01.049Z
- California Academy of Sciences Entomology, accessed via https://github.com/globalbioticinteractions/cas-ent/archive/562aea232ec74ab615f771239451e57b057dc7c0.zip on 2022-10-12T18:50:25.480Z
- Clemson University Arthropod Collection, accessed via https://github.com/globalbioticinteractions/cu-cuac/archive/6cdcbbaa4f7cec8e1eac705be3a999bc5259e00f.zip on 2022-10-12T18:50:53.662Z
- Denver Museum of Nature and Science (DMNS) Parasite specimens (DMNS:Para), accessed via https://github.com/globalbioticinteractions/dmns-para/archive/2a15f657d5e2d7a6ee6359ee30e630bde8fea2ee.zip on 2022-10-12T18:52:36.684Z
- Field Museum of Natural History IPT, accessed via https://github.com/globalbioticinteractions/fmnh/archive/6bfc1b7e46140e93f5561c4e837826204adb3c2f.zip on 2022-10-12T19:19:24.919Z
- Illinois Natural History Survey Insect Collection, accessed via https://github.com/globalbioticinteractions/inhs-insects/archive/38692496f590577074c7cecf8ea37f85d0594ae1.zip on 2022-10-12T19:21:30.100Z
- UMSP / University of Minnesota / University of Minnesota Insect Collection, accessed via https://github.com/globalbioticinteractions/min-umsp/archive/3f1b9d32f947dcb80b9aaab50523e097f0e8776e.zip on 2022-10-12T19:22:18.235Z
- Milwaukee Public Museum Biological Collections Data Portal, accessed via https://github.com/globalbioticinteractions/mpm/archive/9f44e99c49ec5aba3f8592cfced07c38d3223dcd.zip on 2022-10-12T19:22:42.835Z
- Museum for Southwestern Biology (MSB) Parasite Collection, accessed via https://github.com/globalbioticinteractions/msb-para/archive/f13bfa0d5493057198639d566f744379c05179f3.zip on 2022-10-12T20:46:06.063Z
- The Albert J. Cook Arthropod Research Collection, accessed via https://github.com/globalbioticinteractions/msu-msuc/archive/38960906380443bd8108c9e44aeff4590d8d0b50.zip on 2022-10-12T21:02:26.320Z
- Ohio State University Acarology Laboratory, accessed via https://github.com/globalbioticinteractions/osal-ar/archive/876269d66a6a94175dbb6b9a604897f8032b93dd.zip on 2022-10-12T21:02:46.553Z
- Frost Entomological Museum, Pennsylvania State University, accessed via https://github.com/globalbioticinteractions/psuc-ento/archive/30b1f96619a6e9f10da18b42fb93ff22cc4f72e2.zip on 2022-10-12T21:02:57.714Z
- Purdue Entomological Research Collection, accessed via https://github.com/globalbioticinteractions/pu-perc/archive/e0909a7ca0a8df5effccb288ba64b28141e388ba.zip on 2022-10-12T21:03:17.696Z
- Texas A&M University Insect Collection, accessed via https://github.com/globalbioticinteractions/tamuic-ent/archive/f261a8c192021408da67c39626a4aac56e3bac41.zip on 2022-10-12T21:03:56.509Z
- University of California Santa Barbara Invertebrate Zoology Collection, accessed via https://github.com/globalbioticinteractions/ucsb-izc/archive/4d997dbe8e86398f9f7f4d7851013e788073ae9c.zip on 2022-10-12T21:05:27.222Z
- University of Hawaii Insect Museum, accessed via https://github.com/globalbioticinteractions/uhim/archive/53fa790309e48f25685e41ded78ce6a51bafde76.zip on 2022-10-12T21:05:40.778Z
- University of New Hampshire Collection of Insects and other Arthropods UNHC-UNHC, accessed via https://github.com/globalbioticinteractions/unhc/archive/f72575a72edda8a4e6126de79b4681b25593d434.zip on 2022-10-12T21:05:59.319Z
- Scott L. Gardner and Gabor R. Racz (2021). University of Nebraska State Museum - Parasitology. Harold W. Manter Laboratory of Parasitology. University of Nebraska State Museum. Accessed via https://github.com/globalbioticinteractions/unl-nsm/archive/6bcd8aec22e4309b7f4e8be1afe8191d391e73c6.zip on 2022-10-12T21:06:07.054Z
- Data were obtained from specimens belonging to the United States National Museum of Natural History (USNM), Smithsonian Institution, Washington DC and digitized by the Walter Reed Biosystematics Unit (WRBU). Accessed via https://github.com/globalbioticinteractions/usnmentflea/archive/ce5cb1ed2bbc13ee10062b6f75a158fd465ce9bb.zip on 2022-10-12T21:06:43.102Z
- US National Museum of Natural History Ixodes Records, accessed via https://github.com/globalbioticinteractions/usnm-ixodes/archive/c5fcd5f34ce412002783544afb628a33db7f47a6.zip on 2022-10-12T21:06:51.935Z
- Price Institute of Parasite Research, School of Biological Sciences, University of Utah, accessed via https://github.com/globalbioticinteractions/utah-piper/archive/43da8db550b5776c1e3d17803831c696fe9b8285.zip on 2022-10-12T21:07:03.317Z
- University of Wisconsin Stevens Point, Stephen J. Taft Parasitological Collection, accessed via https://github.com/globalbioticinteractions/uwsp-para/archive/f9d0d52cd671731c7f002325e84187979bca4a5b.zip on 2022-10-12T21:07:14.513Z
- Giraldo-Calderón, G. I., Emrich, S. J., MacCallum, R. M., Maslen, G., Dialynas, E., Topalis, P., … Lawson, D. (2015). VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases. Nucleic acids research, 43(Database issue), D707–D713. doi:10.1093/nar/gku1117. Accessed via https://github.com/globalbioticinteractions/vectorbase/archive/00d6285cd4e9f4edd18cb2778624ab31b34b23b8.zip on 2022-10-12T21:07:22.543Z
- WIRC / University of Wisconsin Madison WIS-IH / Wisconsin Insect Research Collection, accessed via https://github.com/globalbioticinteractions/wis-ih-wirc/archive/34162b86c0ade4b493471543231ae017cc84816e.zip on 2022-10-12T21:07:52.105Z
- Yale University Peabody Museum Collections Data Portal, accessed via https://github.com/globalbioticinteractions/yale-peabody/archive/43be869f17749d71d26fc820c8bd931d6149fe8e.zip on 2022-10-12T21:16:57.226Z
Generated on 2022-10-12 by GloBI's Elton 0.12.4 (see https://github.com/globalbioticinteractions/elton).
Note that all files ending in .tsv are UTF-8-encoded tab-separated values files (see https://www.iana.org/assignments/media-types/text/tab-separated-values).
Included in this review archive are:
- README: This file.
- review_summary.tsv: Summary across all reviewed collections of the total number of distinct review comments.
- review_summary_by_collection.tsv: Summary by reviewed collection of the total number of distinct review comments.
- indexed_interactions_by_collection.tsv: Summary of the number of indexed interaction records by institutionCode and collectionCode.
- review_comments.tsv.gz: All review comments by collection.
- indexed_interactions_full.tsv.gz: All indexed interactions for all reviewed collections.
- indexed_interactions_simple.tsv.gz: All indexed interactions for all reviewed collections, selecting only sourceInstitutionCode, sourceCollectionCode, sourceCatalogNumber, sourceTaxonName, interactionTypeName, and targetTaxonName.
- datasets_under_review.tsv: Details on the datasets under review.
- elton.jar: Program used to update datasets and generate the review reports and associated indexed interactions.
- datasets.zip: Source datasets used by elton.jar in the process of executing the generate_report.sh script.
- generate_report.sh: Program used to generate the report.
- generate_report.log: Log file generated as part of running the generate_report.sh script.
Skilton, Amalia. Ticuna (tca) language documentation: A guide to materials in the California Language Archive. Language documentation and conservation 15. Retrieved from https://par.nsf.gov/biblio/10275773.
Skilton, Amalia. Ticuna (tca) language documentation: A guide to materials in the California Language Archive. Language documentation and conservation, 15. Retrieved from https://par.nsf.gov/biblio/10275773.
Skilton, Amalia. "Ticuna (tca) language documentation: A guide to materials in the California Language Archive". Language documentation and conservation 15. https://par.nsf.gov/biblio/10275773.
@article{osti_10275773,
title = {Ticuna (tca) language documentation: A guide to materials in the California Language Archive},
url = {https://par.nsf.gov/biblio/10275773},
abstractNote = {Ticuna (ISO: tca) is a language isolate spoken in the northwestern Amazon Basin (Brazil, Colombia, Peru). Ticuna has more speakers than almost all other Indigenous Amazonian languages and – unlike most languages of the area – is still learned by children. Yet academic linguists have given it relatively little research attention. Therefore, to raise the profile of this areally important language, I offer a guide to three collections of Ticuna language materials held in the California Language Archive. These materials are extensive, including over 1,396 hours of recordings – primarily of child language and everyday conversations between adults – and 33 hours of transcriptions. To contextualize the materials, I provide background on the Ticuna language and people; the research projects which produced the materials; the participants who appear in them; and the ethical and permissions issues involved in collecting them. I then discuss the nature and scope of the materials, showing how the content of each collection motivated collection-specific choices about recording, transcription, organization in the archive, and metadata. Last, I outline how other researchers could draw on the collections for comparative analysis.},
journal = {Language documentation and conservation},
volume = {15},
author = {Skilton, Amalia}
}