Title: The stability of segmental properties across genre and corpus types in low-resource languages
Are written corpora useful for phonological research? Word frequency lists for low-resource languages have become ubiquitous in recent years [@Crubadan]. For many languages there is a direct correspondence between their written forms and their alphabets, but it is not clear whether written corpora can adequately represent language use. Using 15 low-resource languages, we compare several information-theoretic properties across three corpus types and show that, despite differences in origin and genre, estimates in one corpus are highly correlated with estimates in the others.
Award ID(s): 1829290
PAR ID: 10158233
Journal Name: Proceedings of the Society for Computation in Linguistics
Volume: 3
Page Range / eLocation ID: 2
Format(s): Medium: X
Sponsoring Org: National Science Foundation
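As a rough illustration of the kind of cross-corpus comparison the abstract describes, here is a minimal sketch in Python, assuming one character per segment and using per-segment surprisal with a Spearman rank correlation; the toy frequency lists, function names, and choice of measure are ours, not the paper's.

```python
# Minimal sketch (assumptions ours): estimate per-segment surprisal from a
# word frequency list, then correlate estimates across two corpus types.
import math
from collections import Counter
from scipy.stats import spearmanr

def segment_probs(freq_list):
    """freq_list: iterable of (word, count) pairs; returns P(segment),
    treating each character as one segment (a simplification)."""
    counts = Counter()
    for word, n in freq_list:
        for seg in word:
            counts[seg] += n
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def surprisal(probs):
    """Per-segment surprisal: -log2 P(segment)."""
    return {s: -math.log2(p) for s, p in probs.items()}

# Toy frequency lists standing in for two corpora of different origin/genre.
corpus_a = [("kitabu", 40), ("mti", 12)]
corpus_b = [("kitabu", 7), ("mtoto", 3)]
sa = surprisal(segment_probs(corpus_a))
sb = surprisal(segment_probs(corpus_b))
shared = sorted(set(sa) & set(sb))
rho, p = spearmanr([sa[s] for s in shared], [sb[s] for s in shared])
print(f"Spearman rho = {rho:.2f} over {len(shared)} shared segments")
```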
More Like This
  1. Cross-lingual transfer is an effective way to build syntactic analysis tools in low-resource languages. However, transfer is difficult when transferring to typologically distant languages, especially when neither annotated target data nor parallel corpora are available. In this paper, we focus on methods for cross-lingual transfer to distant languages and propose to learn a generative model with a structured prior that utilizes labeled source data and unlabeled target data jointly. The parameters of the source and target models are softly shared through a regularized log-likelihood objective. An invertible projection is employed to learn a new interlingual latent embedding space that compensates for imperfect cross-lingual word embedding input. We evaluate our method on two syntactic tasks: part-of-speech (POS) tagging and dependency parsing. On the Universal Dependency Treebanks, we use English as the only source corpus and transfer to a wide range of target languages. On the 10 languages in this dataset that are distant from English, our method yields an average of 5.2% absolute improvement on POS tagging and 8.3% absolute improvement on dependency parsing over a direct transfer method using state-of-the-art discriminative models.
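As a hedged illustration of the soft parameter sharing described in (1), a regularized joint log-likelihood objective might take the following form; the notation is ours, and the paper's exact formulation may differ:

```latex
% Illustrative form only: labeled source data D_s, unlabeled target data D_t,
% and a penalty that softly ties source and target parameters together.
\mathcal{L}(\theta_s, \theta_t) =
    \sum_{(x, y) \in D_s} \log p_{\theta_s}(x, y)
  + \sum_{x \in D_t} \log p_{\theta_t}(x)
  - \lambda \, \lVert \theta_s - \theta_t \rVert_2^2
```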
  2. This paper describes experiments in training HMM-based text-to-speech (TTS) voices on data collected for Automatic Speech Recognition (ASR) training. We compare a number of filtering techniques designed to identify the best utterances in a noisy, multi-speaker corpus for training voices: excluding speech containing noise and selecting speech close in nature to more traditionally collected TTS corpora. We also evaluate the use of automatic speech recognizers for intelligibility assessment, in comparison with crowdsourcing methods. While the goal of this work is to develop natural-sounding and intelligible TTS voices in Low Resource Languages (LRLs) rapidly and easily, without the expense of recording data specifically for this purpose, we focus on English initially to identify the best filtering techniques and evaluation methods. We find that, when a large amount of data is available, selecting from the corpus based on criteria such as standard deviation of f0, fast speaking rate, and hypo-articulation produces the most intelligible voices.
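A rough sketch of the utterance-selection idea in (2); the fields, thresholds, and the direction of each criterion are hypothetical placeholders, not values from the paper:

```python
# Hypothetical sketch: select ASR utterances for TTS training by simple
# prosodic and noise criteria. Thresholds are placeholders, not the paper's.
from dataclasses import dataclass

@dataclass
class Utterance:
    text: str
    f0_std: float         # standard deviation of f0 (Hz)
    speaking_rate: float  # syllables per second
    snr_db: float         # signal-to-noise ratio (dB)

def keep_for_tts(u, min_snr=20.0, min_rate=4.5, max_f0_std=40.0):
    """Exclude noisy speech; prefer faster, prosodically steady utterances,
    echoing the criteria (f0 std, speaking rate) named in the abstract."""
    return (u.snr_db >= min_snr
            and u.speaking_rate >= min_rate
            and u.f0_std <= max_f0_std)

corpus = [
    Utterance("clean fast take", f0_std=25.0, speaking_rate=5.2, snr_db=27.0),
    Utterance("noisy slow take", f0_std=55.0, speaking_rate=3.1, snr_db=12.0),
]
train_set = [u for u in corpus if keep_for_tts(u)]
print(f"kept {len(train_set)} of {len(corpus)} utterances")
```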
  3. Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as building blocks for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on high-resource programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages that have limited training data available (e.g., OCaml, Racket, and several others). This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach, called MultiPL-T, generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. MultiPL-T translates training data from high-resource languages into training data for low-resource languages in the following way. 1) We use a Code LLM to synthesize unit tests for commented code from a high-resource source language, filtering out faulty tests and code with low test coverage. 2) We use a Code LLM to translate the code from the high-resource source language to a target low-resource language. This gives us a corpus of candidate training data in the target language, but many of these translations are wrong. 3) We use a lightweight compiler to compile the test cases generated in (1) from the source language to the target language, which allows us to filter out obviously wrong translations. The result is a training corpus in the target low-resource language where all items have been validated with test cases. We apply this approach to generate tens of thousands of new, validated training items for five low-resource languages: Julia, Lua, OCaml, R, and Racket, using Python as the source high-resource language. Furthermore, we use an open Code LLM (StarCoderBase) with open training data (The Stack), which allows us to decontaminate benchmarks, train models without violating licenses, and run experiments that could not otherwise be done. Using datasets generated with MultiPL-T, we present fine-tuned versions of StarCoderBase and Code Llama for Julia, Lua, OCaml, R, and Racket that outperform other fine-tunes of these base models on the natural language to code task. We also present Racket fine-tunes for two very recent models, DeepSeek Coder and StarCoder2, to show that MultiPL-T continues to outperform other fine-tuning approaches for low-resource languages. The MultiPL-T approach is easy to apply to new languages, and is significantly more efficient and effective than alternatives such as training longer.
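The three MultiPL-T stages in (3) can be outlined as a filtering pipeline. In this sketch the LLM and compiler calls are trivial stubs so the control flow runs end to end; none of these helper names come from the paper's released code:

```python
# Schematic only: each stub stands in for a Code LLM or compiler invocation.

def synthesize_tests(py_fn_src):
    """Step 1 (stub): a Code LLM generates unit tests; faulty or
    low-coverage tests would be filtered out here."""
    return ["assert add(1, 2) == 3"]

def translate(py_fn_src, target_lang):
    """Step 2 (stub): a Code LLM translates the function to target_lang;
    many such translations are wrong and must be validated."""
    return "let add x y = x + y"

def port_test(test, target_lang):
    """Step 3 (stub): a lightweight compiler ports the test to target_lang."""
    return f"(* ported: {test} *)"

def passes(candidate, tests):
    """Stub: run the ported tests against the candidate translation."""
    return bool(candidate) and all(tests)

def multipl_t(source_fns, target_lang):
    corpus = []
    for fn_src in source_fns:
        tests = synthesize_tests(fn_src)
        candidate = translate(fn_src, target_lang)
        ported = [port_test(t, target_lang) for t in tests]
        if passes(candidate, ported):   # keep only test-validated items
            corpus.append((candidate, ported))
    return corpus

print(multipl_t(["def add(a, b): return a + b"], "ocaml"))
```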
  4. We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous work’s interpretable typological features that require parsed corpora or expert categorization of the language. Our best method improved attachment scores on held-out test languages by an average of 5.6 percentage points over past work that does not inspect the unparsed data (McDonald et al., 2011), and by 20.7 points over past “grammar induction” work that does not use training languages (Naseem et al., 2010). 
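As a toy illustration of (4)'s idea of extracting target-language signal from nothing but gold POS sequences, here is a hand-built feature (directional POS-bigram frequencies, which hint at word order); the paper's features are learned end to end rather than specified like this:

```python
# Toy sketch: directional POS-bigram statistics from an unparsed corpus.
from collections import Counter

def pos_bigram_features(pos_corpus):
    """pos_corpus: list of POS-tag sequences; returns normalized bigram
    frequencies, e.g. how often ADJ immediately precedes NOUN."""
    bigrams = Counter()
    for sent in pos_corpus:
        for a, b in zip(sent, sent[1:]):
            bigrams[(a, b)] += 1
    total = sum(bigrams.values())
    return {bg: c / total for bg, c in bigrams.items()}

corpus = [["DET", "ADJ", "NOUN", "VERB"], ["NOUN", "VERB", "DET", "NOUN"]]
feats = pos_bigram_features(corpus)
print(feats[("ADJ", "NOUN")])  # relative frequency of ADJ-NOUN adjacency
```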
  5. Many natural languages are on the decline due to the dominance of English as the language of the World Wide Web (WWW) and the globalized economy, as well as socioeconomic and political factors. Computational Linguistics offers unprecedented opportunities for preserving and promoting natural languages. However, the availability of corpora is essential for leveraging Computational Linguistics techniques. Only a handful of languages have corpora of diverse genres, while most languages are resource-poor from the perspective of the availability of machine-readable corpora. Telugu, the official language of two southern states in India, is one such language. In this paper, we provide an overview of techniques for assessing language vitality/endangerment, describe existing resources for developing corpora for the Telugu language, discuss our approach to developing corpora, and present preliminary results.