East Tukano languages are known for their developed nominal classification systems. Wa’ikhana (Piratapuyo) is in this sense a typical member of the family: it has an open system with a large number of classes, and its class markers serve derivational and agreement functions. Among the Wa’ikhana inanimate classes, the class ‘round’ stands out for its semantic and morphosyntactic features. It is one of the most extensive classes (if not the most extensive), and it includes round objects as well as objects of less prototypical shapes. Its non-plural markers have the largest number of allomorphs, even though classifier allomorphy is not typical for this language. In addition, the class ‘round’ has a distinct plural marker, another feature absent from most classifiers. Comparison between Wa’ikhana and related languages demonstrates that these peculiarities are shared by many East Tukano languages. The present paper therefore aims to describe the class ‘round’ in Wa’ikhana and other languages of the family, and to show their common features as well as the features that distinguish Wa’ikhana.
Deep Subjecthood: Higher-Order Grammatical Features in Multilingual BERT
We investigate how Multilingual BERT (mBERT) encodes grammar by examining how the high-order grammatical feature of morphosyntactic alignment (how different languages define what counts as a “subject”) is manifested across the embedding spaces of different languages. To understand if and how morphosyntactic alignment affects contextual embedding spaces, we train classifiers to recover the subjecthood of mBERT embeddings in transitive sentences (which do not contain overt information about morphosyntactic alignment) and then evaluate them zero-shot on intransitive sentences (where subjecthood classification depends on alignment), within and across languages. We find that the resulting classifier distributions reflect the morphosyntactic alignment of their training languages. Our results demonstrate that mBERT representations are influenced by high-level grammatical features that are not manifested in any one input sentence, and that this is robust across languages. Further examining the characteristics that our classifiers rely on, we find that features such as passive voice, animacy and case strongly correlate with classification decisions, suggesting that mBERT does not encode subjecthood purely syntactically, but that subjecthood embedding is continuous and dependent on semantic and discourse factors, as is proposed in much of the functional linguistics literature. Together, these results provide insight into how grammatical features manifest in contextual embedding spaces, at a level of abstraction not covered by previous work.
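The train-on-transitive, evaluate-on-intransitive setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the vectors are synthetic stand-ins for mBERT embeddings, the probe is a plain logistic regression, and every name (`train_probe`, `synthetic_vectors`, `mean_prob`), the cluster centers, and the placement of the S vectors are assumptions chosen only to show the shape of the procedure.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_prob(w, b, x):
    # Probability that vector x is labeled "A" (transitive subject).
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(examples, dim, epochs=200, lr=0.5):
    # Logistic regression fitted by full-batch gradient descent.
    # examples: list of (vector, label), label 1 = A, label 0 = O (object).
    w, b = [0.0] * dim, 0.0
    n = len(examples)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in examples:
            err = predict_prob(w, b, x) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def synthetic_vectors(rng, center, count, noise=0.3):
    # Hypothetical stand-in for contextual embeddings: Gaussian noise around a center.
    return [[c + rng.gauss(0.0, noise) for c in center] for _ in range(count)]


def mean_prob(w, b, vectors):
    # Average P(A) over a set of vectors, e.g. intransitive subjects (S).
    return sum(predict_prob(w, b, x) for x in vectors) / len(vectors)


rng = random.Random(0)
a_center, o_center = [1.0, 0.0, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]
train = [(x, 1) for x in synthetic_vectors(rng, a_center, 50)]
train += [(x, 0) for x in synthetic_vectors(rng, o_center, 50)]
w, b = train_probe(train, dim=4)

# Zero-shot step: score S vectors the probe never saw during training.
# Their center (assumed here) determines how "A-like" the probe finds them.
s_vectors = synthetic_vectors(rng, [0.6, 0.0, 0.0, 0.0], 50)
print(round(mean_prob(w, b, s_vectors), 3))
```

In this toy setting the outcome is built in by where the S vectors are placed, so it shows only the evaluation procedure. In the study itself, the quantity of interest is how the distribution of such probabilities over real intransitive subjects varies with the training language.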
- Award ID(s): 1947307
- PAR ID: 10259961
- Date Published:
- Journal Name: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- This paper investigates the ability of artificial neural networks to judge the grammatical acceptability of a sentence, with the goal of testing their linguistic competence. We introduce the Corpus of Linguistic Acceptability (CoLA), a set of 10,657 English sentences labeled as grammatical or ungrammatical from published linguistics literature. As baselines, we train several recurrent neural network models on acceptability classification, and find that our models outperform unsupervised models by Lau et al. (2016) on CoLA. Error analysis on specific grammatical phenomena reveals that both Lau et al.’s models and ours learn systematic generalizations like subject-verb-object order. However, all models we test perform far below human level on a wide range of grammatical constructions.
- Over the last decade, there has been a slow but steady accumulation of psycholinguistic research focusing on typologically diverse languages. In this review, we provide an overview of the psycholinguistic research on Philippine languages at the sentence level. We first discuss the grammatical features of these languages that figure prominently in existing research. We identify four linguistic domains that have received attention from language researchers and summarize the empirical terrain. We advance two claims that emerge across these different domains: (a) The agent-first pressure plays a central role in many of the findings, and (b) the generalization that the patient argument is the syntactically privileged argument cannot be reduced to frequency, but instead is an emergent phenomenon caused by the alignment of competing pressures toward an optimal candidate. We connect these language-specific claims to language-general theories of sentence processing.
- Age of sign language acquisition has lifelong effect on syntactic preferences in sign language users
  Acquisition of natural language has been shown to fundamentally impact both one’s ability to use the first language and the ability to learn subsequent languages later in life. Sign languages offer a unique perspective on this issue because Deaf signers receive access to signed input at varying ages. The majority acquires sign language in (early) childhood, but some learn sign language later, a situation that is drastically different from that of spoken language acquisition. To investigate the effect of age of sign language acquisition and its potential interplay with age in signers, we examined grammatical acceptability ratings and reaction time measures in a group of Deaf signers (age range = 28–58 years) with early (0–3 years) or later (4–7 years) acquisition of sign language in childhood. Behavioral responses to grammatical word order variations (subject–object–verb [SOV] vs. object–subject–verb [OSV]) were examined in sentences that included (1) simple sentences, (2) topicalized sentences, and (3) sentences involving manual classifier constructions, uniquely characteristic of sign languages. Overall, older participants responded more slowly. Age of acquisition had subtle effects on acceptability ratings, whereby the direction of the effect depended on the specific linguistic structure.
- Cross-lingual transfer learning has become an important weapon to battle the unavailability of annotated resources for low-resource languages. One of the fundamental techniques to transfer across languages is learning language-agnostic representations, in the form of word embeddings or contextual encodings. In this work, we propose to leverage unannotated sentences from auxiliary languages to help learn language-agnostic representations. Specifically, we explore adversarial training for learning contextual encoders that produce invariant representations across languages to facilitate cross-lingual transfer. We conduct experiments on cross-lingual dependency parsing where we train a dependency parser on a source language and transfer it to a wide range of target languages. Experiments on 28 target languages demonstrate that adversarial training significantly improves the overall transfer performance under several different settings. We conduct a careful analysis to evaluate the language-agnostic representations resulting from adversarial training.

