Title: Noun phrases in Kwéyòl Donmnik
Abstract: Though Creole nominal systems have been intensely researched, in-context, corpus-based examinations are uncommon, and there are Creole languages whose noun phrases remain understudied. I use a corpus of conversational data and a pattern-building task designed to elicit demonstrative and definite noun phrases, exophoric reference, and co-speech pointing gestures to explore the noun phrase in Kwéyòl Donmnik, an endangered, understudied French lexifier Creole. I focus on noun phrases that are bare, marked by the post-nominal determiners definite la ‘the’ or demonstrative sa-la ‘this/that’, or accompanied by the pre-nominal indefinite determiner yon ‘a(n)’. Results pinpoint the readings conveyed by each noun phrase type, identify the word categories of their nouns, and address similarities in usage between definite la and demonstrative sa-la.
Award ID(s):
2126414 2126405
PAR ID:
10559589
Author(s) / Creator(s):
Publisher / Repository:
Benjamins
Date Published:
Journal Name:
Journal of Pidgin and Creole Languages
ISSN:
0920-9034
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. When a language offers multiple options for expressing the same meaning, what principles govern a speaker’s choice? Two well-known principles proposed for explaining wide-ranging speaker preference are Uniform Information Density and Availability-Based Production. Here we test the predictions of these theories in a previously uninvestigated case of speaker choice. Russian has two ways of expressing the comparative: an EXPLICIT option (Ona bystree chem ja/She fast-COMP than me-NOM) and a GENITIVE option (Ona bystree menya/She fast-COMP me-GEN). We lay out several potential predictions of each theory for speaker choice in the Russian comparative construction, including effects of post-comparative word predictability, phrase length, syntactic complexity, and semantic association between the comparative adjective and subsequent noun. In a corpus study, we find that the explicit construction is used preferentially when the post-comparative noun phrase is longer, has a relative clause, and is less semantically associated with the comparative adjective. A follow-up production experiment using visual scene stimuli to elicit comparative sentences replicates the corpus finding that Russian native speakers prefer the explicit form when post-comparative phrases are longer. These findings offer no clear support for the predictions of Uniform Information Density, but are broadly supportive of Availability-Based Production, with the explicit option serving as an unreduced form that eases speakers’ planning of complex or low-availability utterances. Code for this study is available 
  2. Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. The infrequent nature of these phrases significantly hurts the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, heavily rely on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from consistently co-occurring word sequences within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels root deeply in the input domain and context, thus having unique advantages in preserving contextual completeness and capturing emerging, out-of-KB phrases. Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names. Alternatively, we observe that the contextualized attention maps generated from a Transformer-based neural language model effectively reveal the connections between words in a surface-agnostic way. Therefore, we pair such attention maps with the silver labels to train a lightweight span prediction model, which can be applied to new input to recognize (unseen) quality phrases regardless of their surface names or frequency. Thorough experiments on various tasks and datasets, including corpus-level phrase ranking, document-level keyphrase extraction, and sentence-level phrase tagging, demonstrate the superiority of our design over state-of-the-art pre-trained, unsupervised, and distantly supervised methods.
  3. In Valdôtain Patois, an understudied Francoprovençal language spoken in Aosta Valley (Italy), wh-phrases can either be fronted or occur clause-internally. In this paper, I analyze the syntax and pragmatics of clause-internal wh-phrases, showing that they must be either activated in the preceding linguistic context or inferable. Based on evidence from word order and parasitic gaps, I argue that in Valdôtain Patois clause-internal wh-phrases are not in-situ, but move to an A′-position at the edge of vP.
  4. Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases. 
  5. A word in natural language can be polysemous, having multiple meanings, as well as synonymous, meaning the same thing as other words. Word sense induction attempts to find the senses of polysemous words. Synonymy detection attempts to find when two words are interchangeable. We combine these tasks, first inducing word senses and then detecting similar senses to form word-sense synonym sets (synsets) in an unsupervised fashion. Given pairs of images and text with noun phrase labels, we perform synset induction to produce collections of underlying concepts described by one or more noun phrases. We find that considering multimodal features from both visual and textual context yields better induced synsets than using either context alone. Human evaluations show that our unsupervised, multimodally induced synsets are comparable in quality to annotation-assisted ImageNet synsets, achieving about 84% of ImageNet synsets’ approval.