skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: When classifying grammatical role, BERT doesn’t care about word order... except when it matters
Because meaning can often be inferred from lexical semantics alone, word order is often a redundant cue in natural language. For example, the words chopped, chef, and onion are more likely used to convey “The chef chopped the onion,” not “The onion chopped the chef.” Recent work has shown large language models to be surprisingly word order invariant, but crucially has largely considered natural prototypical inputs, where compositional meaning mostly matches lexical expectations. To overcome this confound, we probe grammatical role representation in English BERT and GPT-2, on instances where lexical expectations are not sufficient, and word order knowledge is necessary for correct classification. Such non-prototypical instances are naturally occurring English sentences with inanimate subjects or animate objects, or sentences where we systematically swap the arguments to make sentences like “The onion chopped the chef”. We find that, while early layer embeddings are largely lexical, word order is in fact crucial in defining the later-layer representations of words in semantically non-prototypical positions. Our experiments isolate the effect of word order on the contextualization process, and highlight how models use context in the uncommon, but critical, instances where it matters.  more » « less
Award ID(s):
1947307 2139005
PAR ID:
10336258
Author(s) / Creator(s):
; ;
Date Published:
Journal Name:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Page Range / eLocation ID:
636 to 643
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. In standard models of language production or comprehension, the elements which are retrieved from memory and combined into a syntactic structure are “lemmas” or “lexical items.” Such models implicitly take a “lexicalist” approach, which assumes that lexical items store meaning, syntax, and form together, that syntactic and lexical processes are distinct, and that syntactic structure does not extend below the word level. Across the last several decades, linguistic research examining a typologically diverse set of languages has provided strong evidence against this approach. These findings suggest that syntactic processes apply both above and below the “word” level, and that both meaning and form are partially determined by the syntactic context. This has significant implications for psychological and neurological models of language processing as well as for the way that we understand different types of aphasia and other language disorders. As a consequence of the lexicalist assumptions of these models, many kinds of sentences that speakers produce and comprehend—in a variety of languages, including English—are challenging for them to account for. Here we focus on language production as a case study. In order to move away from lexicalism in psycho- and neuro-linguistics, it is not enough to simply update the syntactic representations of words or phrases; the processing algorithms involved in language production are constrained by the lexicalist representations that they operate on, and thus also need to be reimagined. We provide an overview of the arguments against lexicalism, discuss how lexicalist assumptions are represented in models of language production, and examine the types of phenomena that they struggle to account for as a consequence. We also outline what a non-lexicalist alternative might look like, as a model that does not rely on a lemma representation, but instead represents that knowledge as separate mappings between (a) meaning and syntax and (b) syntax and form, with a single integrated stage for the retrieval and assembly of syntactic structure. By moving away from lexicalist assumptions, this kind of model provides better cross-linguistic coverage and aligns better with contemporary syntactic theory. 
    more » « less
  2. This study employed the N400 event-related potential (ERP) to investigate how observing different types of gestures at learning affects the subsequent processing of L2 Mandarin words differing in lexical tone by L1 English speakers. The effects of pitch gestures conveying lexical tones (e.g., upwards diagonal movements for rising tone), semantic gestures conveying word meanings (e.g., waving goodbye for to wave), and no gesture were compared. In a lexical tone discrimination task, larger N400s for Mandarin target words mismatching vs. matching Mandarin prime words in lexical tone were observed for words learned with pitch gesture. In a meaning discrimination task, larger N400s for English target words mismatching vs. matching Mandarin prime words in meaning were observed for words learned with pitch and semantic gesture. These findings provide the first neural evidence that observing gestures during L2 word learning enhances subsequent phonological and semantic processing of learned L2 words. 
    more » « less
  3. Goldwater, M; Angora, F; Hayes, B; Ong, D (Ed.)
    This study investigated how observing pitch gestures conveying lexical tones and representational gestures conveying word meanings when learning L2 Mandarin words differing in lexical tone affects their subsequent semantic and phonological processing in L1 English speakers using the N400 event-related potential (ERP). Larger N400s for English target words mismatching vs. matching Mandarin prime words in meaning were observed for words learned with pitch and representational gesture, but not no gesture. Additionally, larger N400s for Mandarin target words mismatching vs. matching Mandarin prime words in lexical tone were observed for words learned with pitch gesture, but not representational or no gesture. These findings provide the first ERP evidence that observing gestures conveying phonological and semantic information during L2 word learning enhances subsequent phonological and semantic processing of learned L2 words. 
    more » « less
  4. Abstract Limited language experience in childhood is common among deaf individuals, which prior research has shown to lead to low levels of language processing. Although basic structures such as word order have been found to be resilient to conditions of sparse language input in early life, whether they are robust to conditions of extreme language delay is unknown. The sentence comprehension strategies of post‐childhood, first‐language (L1) learners of American Sign Language (ASL) with at least 9 years of language experience were investigated, in comparison to two control groups of learners with full access to language from birth (deaf native signers and hearing L2 learners who were native English speakers). The results of a sentence‐to‐picture matching experiment show that event knowledge overrides word order for post‐childhood L1 learners, regardless of the animacy of the subject, while both deaf native signers and hearing L2 signers consistently rely on word order to comprehend sentences. Language inaccessibility throughout early childhood impedes the acquisition of even basic word order. Similar to the strategies used by very young children prior to the development of basic sentence structure, post‐childhood L1 learners rely more on context and event knowledge to comprehend sentences. Language experience during childhood is critical to the development of basic sentence structure. 
    more » « less
  5. What makes some words harder to learn than others in a second language? Although some robust factors have been identified based on small scale experimental studies, many relevant factors are difficult to study in such experiments due to the amount of data necessary to test them. Here, we investigate what factors affect the ease of learning of a word in a second language using a large data set of users learning English as a second language through the Duolingo mobile app. In a regression analysis, we test and confirm the well-studied effect of cognate status on word learning accuracy. Furthermore, we find significant effects for both cross-linguistic semantic alignment and English semantic density, two novel predictors derived from large scale distributional models of lexical semantics. Finally, we provide data on several other psycholinguistically plausible word level predictors. We conclude with a discussion of the limits, benefits and future research potential of using big data for investigating second language learning. 
    more » « less