Title: A Cross-Linguistic Pressure for Uniform Information Density in Word Order
Abstract While natural languages differ widely in both canonical word order and word order flexibility, their word orders still follow shared cross-linguistic statistical patterns, often attributed to functional pressures. In the effort to identify these pressures, prior work has compared real and counterfactual word orders. Yet one functional pressure has been overlooked in such investigations: the uniform information density (UID) hypothesis, which holds that information should be spread evenly throughout an utterance. Here, we ask whether a pressure for UID may have influenced word order patterns cross-linguistically. To this end, we use computational models to test whether real orders lead to greater information uniformity than counterfactual orders. In our empirical study of 10 typologically diverse languages, we find that: (i) among SVO languages, real word orders consistently have greater uniformity than reverse word orders, and (ii) only linguistically implausible counterfactual orders consistently exceed the uniformity of real orders. These findings are compatible with a pressure for information uniformity in the development and usage of natural languages.
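As a rough illustration of the comparison the abstract describes, the sketch below scores a sentence by the variance of its per-word surprisal, where lower variance means more uniform information. The abstract does not specify the language models or uniformity measures actually used, so everything here is an assumption: the add-one-smoothed bigram model, the variance measure, and the function names `train_bigram`, `surprisals`, and `surprisal_variance` are hypothetical stand-ins.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Count context unigrams and bigrams; each sentence is a list of tokens."""
    unigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        tokens = ["<s>"] + sent  # "<s>" is an assumed start-of-sentence symbol
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    return unigrams, bigrams, len(vocab)


def surprisals(sent, model):
    """Per-word surprisal in bits under the add-one-smoothed bigram model."""
    unigrams, bigrams, vocab_size = model
    tokens = ["<s>"] + sent
    return [
        -math.log2((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    ]


def surprisal_variance(values):
    """Population variance of surprisal; 0.0 means perfectly uniform."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Toy corpus (invented for illustration), scored in real and reversed order.
corpus = [["the", "dog", "chased", "the", "cat"],
          ["the", "cat", "saw", "the", "dog"]]
reversed_corpus = [list(reversed(sent)) for sent in corpus]

real_model = train_bigram(corpus)
reverse_model = train_bigram(reversed_corpus)

real_scores = [surprisal_variance(surprisals(s, real_model)) for s in corpus]
reverse_scores = [surprisal_variance(surprisals(s, reverse_model))
                  for s in reversed_corpus]
```

A separate model is trained for each ordering so that the counterfactual order is scored by a model that has seen that order. A toy corpus this small says nothing about the paper's findings; it only shows the shape of the computation.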
Award ID(s): 2121074
PAR ID: 10488152
Publisher / Repository: MIT Press
Journal Name: Transactions of the Association for Computational Linguistics
Volume: 11
ISSN: 2307-387X
Page Range / eLocation ID: 1048 to 1065
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Different languages might have different word orders. In this paper, we investigate cross-lingual transfer and posit that an order-agnostic model will perform better when transferring to distant foreign languages. To test our hypothesis, we train dependency parsers on an English corpus and evaluate their transfer performance on 30 other languages. Specifically, we compare encoders and decoders based on Recurrent Neural Networks (RNNs) and modified self-attentive architectures. The former relies on sequential information while the latter is more flexible at modeling word order. Rigorous experiments and detailed analysis show that RNN-based architectures transfer well to languages that are close to English, while self-attentive models have better overall cross-lingual transferability and perform especially well on distant languages.
  2. Co-speech gestures are timed to occur with prosodically prominent syllables in several languages. In prior work in Indo-European languages, gestures are found to be attracted to stressed syllables, with gesture apexes preferentially aligning with syllables bearing higher and more dynamic pitch accents. Little research has examined the temporal alignment of co-speech gestures in African tonal languages, where metrical prominence is often hard to identify due to a lack of canonical stress correlates, and where a key function of pitch is in distinguishing between words, rather than marking intonational prominence. Here, we examine the alignment of co-speech gestures in two different Niger-Congo languages with very different word structures, Medʉmba (Grassfields Bantu, Cameroon) and Igbo (Igboid, Nigeria). Our findings suggest that the initial position in the stem tends to attract gestures in Medʉmba, while the final syllable in the word is the default position for gesture alignment in Igbo; phrase position also influences gesture alignment, but in language-specific ways. Though neither language showed strong evidence of elevated prominence of any individual tone value, gesture patterning in Igbo suggests that metrical structure at the level of the tonal foot is relevant to the speech-gesture relationship. Our results demonstrate how the speech-gesture relationship can be a window into patterns of word- and phrase-level prosody cross-linguistically. They also show that the relationship between gesture and tone (and the related notion of ‘tonal prominence’) is mediated by tone’s function in a language.  
  3. Serikov, Oleg; Voloshina, Ekaterina; Postnikova, Anna; Klyachko, Elena; Neminova, Ekaterina; Vylomova, Ekaterina; Shavrina, Tatiana; Le Ferrand, Eric; Malykh, Valentin; Tyers, Francis (Ed.)
    In this paper, we present a straightforward technique for constructing interpretable word embeddings from morphologically analyzed examples (such as interlinear glosses) for all of the world’s languages. Currently, fewer than 300-400 languages out of approximately 7000 have more than a trivial amount of digitized texts; of those, between 100 and 200 languages (most in the Indo-European language family) have enough text data for BERT embeddings of reasonable quality to be trained. The word embeddings in this paper are explicitly designed to be both linguistically interpretable and fully capable of handling the broad variety found in the world’s diverse set of 7000 languages, regardless of corpus size or morphological characteristics. We demonstrate the applicability of our representation through examples drawn from a typologically diverse set of languages whose morphology includes prefixes, suffixes, infixes, circumfixes, templatic morphemes, derivational morphemes, inflectional morphemes, and reduplication.
  4. Abstract What makes a word easy to learn? Early-learned words are frequent and tend to name concrete referents. But words typically do not occur in isolation. Some words are predictable from their contexts; others are less so. Here, we investigate whether predictability relates to when children start producing different words (age of acquisition; AoA). We operationalized predictability in terms of a word's surprisal in child-directed speech, computed using n-gram and long short-term memory (LSTM) language models. Predictability derived from LSTMs was generally a better predictor than predictability derived from n-gram models. Across five languages, average surprisal was positively correlated with the AoA of predicates and function words but not nouns. Controlling for concreteness and word frequency, more predictable predicates and function words were learned earlier. Differences in predictability between languages were associated with cross-linguistic differences in AoA: the same word (when it was a predicate) was produced earlier in languages where the word was more predictable.
  5. Crookes radiometers have been the subject of numerous theoretical, numerical, and experimental studies because of the complicated forces they exhibit as well as their potential applications to light sensing and actuation. The majority of these studies have focused on classical radiometers, which function under low vacuum pressures. In contrast, here we report a radiometer with microengineered vanes that rotates at atmospheric pressure. Its functionality at pressures thousands of times higher than previous light mills is due to unique attributes of the nanocardboard that forms its vanes: 1) the extremely low areal density (0.1 mg/cm²) of nanocardboard reduces the vane masses by two orders of magnitude; 2) its lower thermal conductivity allows a greater cross-vane temperature difference; and 3) its microchannels dramatically increase the thermal transpiration flow that drives the rotation. Intriguingly, the experimentally observed rotation speeds are substantially higher than those theoretically expected. Our device demonstrates new possibilities for micromanipulation, propulsion of aerial vehicles, and light-powered generators.