Title: Intensional Gaps: Relating veridicality, factivity, doxasticity, bouleticity, and neg-raising
We investigate which patterns of lexically triggered doxastic, bouletic, neg(ation)-raising, and veridicality inferences are (un)attested across clause-embedding verbs in English. To carry out this investigation, we use a multiview mixed effects mixture model to discover the inference patterns captured in three lexicon-scale inference judgment datasets: two existing datasets, MegaVeridicality and MegaNegRaising, which capture veridicality and neg-raising inferences across a wide swath of the English clause-embedding lexicon, and a new dataset, MegaIntensionality, which similarly captures doxastic and bouletic inferences. We focus in particular on inference patterns that are correlated with morphosyntactic distribution, as determined by how well those patterns predict the acceptability judgments in the MegaAcceptability dataset. We find that there are 15 such patterns attested. Similarities among these patterns suggest the possibility of underlying lexical semantic components that give rise to them. We use principal component analysis to discover these components and suggest generalizations that can be derived from them.
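As a rough, hypothetical illustration of the component-discovery step described above, the sketch below applies PCA to an invented verb-by-pattern probability matrix. The paper's actual inputs come from its multiview mixed effects mixture model; the verbs, pattern labels, and numbers here are made up for demonstration.

```python
# Hypothetical sketch: recovering latent lexical semantic components from
# verb-level inference-pattern probabilities with PCA. The matrix below is
# randomly generated for illustration; in the paper, the analogous matrix
# would come from the multiview mixed effects mixture model.
import numpy as np
from sklearn.decomposition import PCA

verbs = ["think", "want", "know", "say", "hope"]       # toy verb sample
patterns = [f"pattern_{i}" for i in range(15)]         # 15 attested patterns

rng = np.random.default_rng(0)
# Rows: verbs; columns: probability of the verb instantiating each pattern.
verb_by_pattern = rng.dirichlet(np.ones(len(patterns)), size=len(verbs))

pca = PCA(n_components=3)
loadings = pca.fit_transform(verb_by_pattern)          # per-verb component loadings

for verb, load in zip(verbs, loadings):
    print(verb, np.round(load, 2))
print("explained variance:", np.round(pca.explained_variance_ratio_, 2))
```

On this view, verbs that load heavily on the same component would share an underlying lexical semantic property from which their inference patterns are derived.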
Authors:
Award ID(s): 1748969
Publication Date:
NSF-PAR ID: 10333437
Journal Name: Semantics and Linguistic Theory
Volume: 31
Page Range or eLocation-ID: 570–605
ISSN: 2163-5951
Sponsoring Org: National Science Foundation
More Like this
  1. We investigate neg(ation)-raising inferences, wherein negation on a predicate can be interpreted as though it were in that predicate’s subordinate clause. To do this, we collect a large-scale dataset of neg-raising judgments for effectively all English clause-embedding verbs and develop a model to jointly induce the semantic types of verbs and their subordinate clauses and the relationship of these types to neg-raising inferences. We find that some neg-raising inferences are attributable to properties of particular predicates, while others are attributable to subordinate clause structure.
  2. We investigate neural models’ ability to capture lexicosyntactic inferences: inferences triggered by the interaction of lexical and syntactic information. We take the task of event factuality prediction as a case study and build a factuality judgment dataset for all English clause-embedding verbs in various syntactic contexts. We use this dataset, which we make publicly available, to probe the behavior of current state-of-the-art neural systems, showing that these systems make certain systematic errors that are clearly visible through the lens of factuality prediction.
  3. The world is facing a crisis of language loss that rivals, or exceeds, the rate of loss of biodiversity. There is increasing urgency to understand the drivers of language change in order to stem the catastrophic rate of language loss globally and to improve language vitality. Here we present a unique case study of language shift in an endangered Indigenous language, with a dataset of unprecedented scale. We employ a novel multidimensional analysis, which offers the strength of a quantitative approach without sacrificing the detail of individual speakers and specific language variables, to identify social, cultural, and demographic factors that influence language shift in this community. We develop the concept of the ‘linguatype’, a sample of an individual’s language variants, analogous to the geneticists’ concept of ‘genotype’ as a sample of an individual’s genetic variants. We use multidimensional clustering to show that while family and household have significant effects on language patterns, peer group is the most significant factor for predicting language variation. Generalized linear models demonstrate that the strongest factor promoting individual use of the Indigenous language is living with members of the older generation who speak the heritage language fluently. Wright–Fisher analysis indicates that production of the heritage language is lost at a significantly faster rate than perception, but there is no significant difference in the rate of loss of verbs vs. nouns, or lexicon vs. grammar. Notably, we show that formal education has a negative relationship with Indigenous language retention in this community, with decreased use of the Indigenous language significantly associated with more years of monolingual schooling in English. These results suggest practical strategies for strengthening Indigenous language retention and demonstrate a new analytical approach to identifying risk factors for language loss in Indigenous communities that may be applicable to many languages globally.
  4. Natural language inference (NLI) datasets (e.g., MultiNLI) were collected by soliciting hypotheses for a given premise from annotators. This data collection method led to annotation artifacts: systems can identify the premise-hypothesis relationship without observing the premise (e.g., negation in a hypothesis being indicative of contradiction). We address this problem by recasting the CommitmentBank for NLI, which contains items involving reasoning over the extent to which a speaker is committed to complements of clause-embedding verbs under entailment-canceling environments (conditional, negation, modal, and question). Instead of being constructed to stand in certain relationships with the premise, hypotheses in the recast CommitmentBank are the complements of the clause-embedding verb in each premise, leading to no annotation artifacts in the hypothesis. A state-of-the-art BERT-based model performs well on the recast CommitmentBank with 85% F1. However, analysis of model behavior shows that the BERT models still do not capture the full complexity of pragmatic reasoning, nor encode some of the linguistic generalizations, highlighting room for improvement (see the evaluation sketch below).
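Following on item 4, here is a minimal, hypothetical sketch of how one might score an off-the-shelf NLI model on recast CommitmentBank-style items (a premise containing a clause-embedding verb under an entailment-canceling environment, with the verb's complement as the hypothesis). The checkpoint is a stand-in rather than the paper's actual BERT model, the two items and their gold labels are invented, and the label order follows the roberta-large-mnli model card; verify it for whatever checkpoint you use.

```python
# Hypothetical sketch: evaluating an NLI model on recast CommitmentBank-style
# premise/complement pairs and reporting F1, in the spirit of item 4 above.
import torch
from sklearn.metrics import f1_score
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "roberta-large-mnli"  # assumed labels: 0=contradiction, 1=neutral, 2=entailment
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

# Invented items: (premise, hypothesis = complement of the embedding verb, gold label).
items = [
    ("She didn't know that the results had been published.",
     "The results had been published.", 2),   # factive 'know': inference survives negation
    ("He doubts that the meeting will happen.",
     "The meeting will happen.", 0),          # invented gold label for illustration
]

gold, pred = [], []
with torch.no_grad():
    for premise, hypothesis, label in items:
        logits = model(**tok(premise, hypothesis, return_tensors="pt")).logits
        gold.append(label)
        pred.append(int(logits.argmax(-1)))

print("macro F1:", f1_score(gold, pred, average="macro"))
```

Because the hypothesis is simply the embedded complement, a hypothesis-only baseline has no artifact to exploit, which is the point of the recasting described above.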