Title: Variation in pronominal indexing: Lexical stipulation vs. referential properties in Alor-Pantar languages
We examine the role of referential properties and lexical stipulation in three closely related languages of eastern Indonesia, the Alor-Pantar languages Abui, Kamang, and Teiwa. Our focus is on the continuum along which event properties (e.g. volitionality, affectedness) are highly important at one extreme or play virtually no role at the other. These languages occupy different points along this continuum. In Abui, event semantics play the greatest role, while in Teiwa they play the smallest role (the lexical property animacy being dominant in the formation of verb classes). Kamang occupies an intermediate position. Teiwa has conventionalised the relation between a verb and its class along the lines of animacy so that classes become associated with the animacy value of the objects with which the verbs in a given class typically occur. Paying attention to a lexical property like animacy, in contrast with event properties, has meant greater potential for arbitrary classes to emerge.
Award ID(s):
0936887
PAR ID:
10024417
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
Studies in Language
Volume:
38
Issue:
1
ISSN:
0378-4177
Page Range / eLocation ID:
44 to 79
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Semantic automata were developed to compare the complexity of generalized quantifiers in terms of the string languages that describe their truth conditions. An important point that has gone unnoticed so far is that these string languages are remarkably simple for most quantifiers, in particular those that can be realized by a single lexical item. Whereas complex quantifiers such as "an even number of" correspond to specific regular languages, the lexical quantifiers "every", "no", "some", as well as numerals do not reach this level of complexity. Instead, they all stay close to the bottom of the so-called subregular hierarchy. What is more, the class of tier-based strictly local languages provides a remarkably tight characterization of the class of lexical quantifiers. A significant number of recent publications have also argued for the central role of tier-based strict locality in phonology, morphology, and syntax. This suggests that subregularity in general and tier-based strict locality in particular may be a unifying property of natural language across all its submodules.
  2. We test predictions from the language emergent perspective on verbal working memory that lexico-syntactic constraints should support both item and order memory. In natural language, long-term knowledge of lexico-syntactic patterns involving part of speech, verb biases, and noun animacy supports language comprehension and production. In three experiments, participants were presented with randomly generated dative-like sentences or lists in which part of speech, verb biases, and animacy of a single word were manipulated. Participants were more likely to recall words in the correct position when presented with a verb over a noun in the verb position, a good dative verb over an intransitive verb in the verb position, and an animate noun over an inanimate noun in the subject noun position. These results demonstrate that interactions between words and their context in the form of lexico-syntactic constraints influence verbal working memory.
  3. Animacy is a necessary property for a referent to be an agent, and thus animacy detection is useful for a variety of natural language processing tasks, including word sense disambiguation, co-reference resolution, semantic role labeling, and others. Prior work treated animacy as a word-level property, and has developed statistical classifiers to classify words as either animate or inanimate. We discuss why this approach to the problem is ill-posed, and present a new approach based on classifying the animacy of co-reference chains. We show that simple voting approaches to inferring the animacy of a chain from its constituent words perform relatively poorly, and then present a hybrid system merging supervised machine learning (ML) and a small number of hand-built rules to compute the animacy of referring expressions and co-reference chains. This method achieves state-of-the-art performance. The supervised ML component leverages features such as word embeddings over referring expressions, parts of speech, and grammatical and semantic roles. The rules take into consideration parts of speech and the hypernymy structure encoded in WordNet. The system achieves an F1 of 0.88 for classifying the animacy of referring expressions, which is comparable to state-of-the-art results for classifying the animacy of words, and achieves an F1 of 0.75 for classifying the animacy of co-reference chains themselves. We release our training and test dataset, which includes 142 texts (all narratives) comprising 156,154 words, 34,698 referring expressions, and 10,941 co-reference chains. We test the method on a subset of the OntoNotes dataset, showing through manual sampling that animacy classification is 90% +/- 2% accurate for co-reference chains, and 92% +/- 1% for referring expressions. The data also contains 46 folktales, which present an interesting challenge because they often involve characters who are members of traditionally inanimate classes (e.g., stoves that walk, trees that talk). We show that our system is able to detect the animacy of these unusual referents with an F1 of 0.95.
  4. Event concepts of common verbs (e.g. eat, sleep) can be broadly shared across languages, but a given language’s rules for subcategorization are largely arbitrary and vary substantially across languages. When subcategorization information does not match between first language (L1) and second language (L2), how does this mismatch impact L2 speakers in real time? We hypothesized that subcategorization knowledge in L1 is particularly difficult for L2 speakers to override online. Event-related potential (ERP) responses were recorded from English sentences that include verbs that were ambitransitive in Mandarin but intransitive in English (*My sister listened the music). While L1 English speakers showed a prominent P600 effect to subcategorization violations, L2 English speakers whose L1 was Mandarin showed some sensitivity in offline responses but not in ERPs. This suggests that computing verb–argument relations, although seemingly one of the basic components of sentence comprehension, in fact requires accessing lexical syntax which may be vulnerable to L1 interference in L2. However, our exploratory analysis showed that more native-like behavioral accuracy was associated with a more native-like P600 effect, suggesting that, with enough experience, L2 speakers can ultimately overcome this interference.
  5. The present article provides an overview of ongoing field-based research that deploys a variety of interactive experimental procedures in three strategically chosen bilingual contact environments, whose language dyads facilitate a partial separation of morphosyntactic factors in order to test the extent to which proposed grammatical constraints on intra-sentential code-switching are independent of language-specific factors. For purposes of illustration, the possibility of language switches between subject pronouns and verbs is compared for the three bilingual groups. The first scenario includes Ecuadoran Quichua and Media Lengua (entirely Quichua syntax and system morphology, all lexical roots replaced by Spanish items; both are null-subject languages). The second juxtaposes Spanish and the Afro-Colombian creole language Palenquero; the languages share highly cognate lexicons but differ substantially in grammatical structures (including null subjects in Spanish, only overt subjects in Palenquero). Spanish and Portuguese in north-eastern Argentina along the Brazilian border form the third focus: lexically and grammatically highly cognate languages that are nonetheless kept distinct by speakers (both null-subject languages, albeit with different usage patterns). Results from the three communities reveal a residual resistance against pronoun + verb switches irrespective of the subject-verb configuration, thereby motivating the application of similar techniques to other proposed grammatical constraints. 