Title: Curbing Feature Coding: Strictly Local Feature Assignment
Graf (2017) warns that every syntactic formalism faces a severe overgeneration problem because of the hidden power of subcategorization. Any constraint definable in monadic second-order logic can be compiled into the category system so that it is indirectly enforced as part of subcategorization. Not only does this kind of feature coding deprive syntactic proposals of their empirical bite, it also undermines computational efforts to limit syntactic formalisms via subregular complexity. This paper presents a subregular solution to feature coding. Instead of being a cheap resource that comes for free, features must be assigned by a transduction. In particular, category features must be assigned by an input strictly local (ISL) tree-to-tree transduction, defined here for the first time. The restriction to ISL transductions correctly rules out various deviant category systems.
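The locality idea behind ISL category assignment can be illustrated with a small sketch. It assumes that the transduction only relabels nodes (it does not restructure the tree) and that each node's window is the node itself plus its mother and daughters; the paper's formal definition may be more general. The names `isl_relabel` and `toy_rule`, the toy lexicon, and the split categories `V_tr`/`V_intr` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


# A window is (mother label or None at the root, own label, daughter labels).
Window = tuple[Optional[str], str, tuple[str, ...]]


def isl_relabel(tree: Node, rule: Callable[[Window], str]) -> Node:
    """Relabel every node using only its bounded local window.

    Because `rule` sees nothing outside the window, a node's output
    category cannot encode information from arbitrarily far away.
    """
    def go(node: Node, mother: Optional[str]) -> Node:
        window = (mother, node.label, tuple(c.label for c in node.children))
        return Node(rule(window), [go(c, node.label) for c in node.children])

    return go(tree, None)


def toy_rule(window: Window) -> str:
    """Hypothetical category assignment for a toy lexicon."""
    _mother, label, daughters = window
    if label in {"the", "a"}:
        return "D"
    if label in {"devour", "sleep"}:
        # Split the verb category by the number of arguments (daughters),
        # which is visible inside the local window.
        return "V_tr" if len(daughters) == 2 else "V_intr"
    return "N"


# Usage: a toy derivation tree in which heads dominate their arguments.
tree = Node("devour", [Node("the", [Node("cake")]), Node("John")])
out = isl_relabel(tree, toy_rule)
```

Roughly, the point of the restriction is visible in the sketch: a split such as `V_tr` vs. `V_intr` is allowed because it depends only on locally visible material, whereas a category that tracked, say, some property of a node many levels below could not be computed by any such `rule`.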
Award ID(s):
1845344
PAR ID:
10140839
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the Society for Computation in Linguistics
Volume:
3
Page Range / eLocation ID:
362-371
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Extending prior work in Graf (2018, 2020, 2022c), I show that movement is tier-based strictly local (TSL) even if one analyzes it as a transformation, i.e. a tree transduction from derivation trees to output trees. I define input strictly local (ISL) tree-to-tree transductions with (lexical) TSL tests as a tier-based extension of ISL tree-to-tree transductions. TSL tests allow us to attach each mover to all its landing sites. In general, this class of transductions fails to attach each mover to its final landing site to the exclusion of all its intermediate landing sites, which is crucial for producing output trees with the correct string yield. The problem is avoided, though, if syntax enforces a variant of the Ban on Improper Movement. Subregular complexity thus provides a novel motivation for core restrictions on movement while also shedding new light on the choice between copies and traces in syntax. 
  2. We use the MG treebank of Torr (2017) to investigate the conjecture in Graf (2020) that category systems are ISL-2 inferrable. A category system is ISL-2 inferrable iff the category feature of every lexical item can be jointly inferred from phonological exponents of both the item itself and either its selecting head or the arguments it selects. If correct, this conjecture would greatly limit the overgeneration problem posed by subcategorization mechanisms. Our corpus study finds that the conjecture is largely borne out, with only a few exceptions attested in the corpus. However, we also observe that it holds even for features that aren't expected to be inferrable in this manner, and we demonstrate that inferrability can arise merely from language datasets displaying Zipfian distributions. We conclude that category systems in natural languages may well be ISL-2 inferrable, but that this could be due to extragrammatical factors. 
  3. Building on recent work in subregular syntax, we argue that syntactic constraints are best understood as operating not over trees, but rather strings that track structural relations such as dominance and c-command. Even constraints that seem intrinsically tied to trees (e.g. constraints on tree tiers) can be reduced to such strings. We define serial constraints as an abstraction that decomposes string constraints into a context function (which associates nodes with strings) and a requirement function (which enforces constraints on these strings). We provide a general procedure for implementing serial constraints as deterministic tree automata. The construction reveals that the many types of constraints found in subregular syntax are variants of the same computational template. Our findings open up a string-based perspective on syntactic constraints and provide a new, very general approach to the automata-theoretic study of subregular complexity. 
  4. Contrastive self-supervised learning has been used successfully in many domains, such as images, texts, and graphs, to learn features without requiring label information. In this paper, we propose a new local contrastive feature learning (LoCL) framework whose goal is to learn local patterns/features from tabular data. To create a niche for local learning, we use feature correlations to build a maximum spanning tree and break the tree into feature subsets, so that strongly correlated features are assigned next to each other. Convolutional learning over these features is used to learn a latent feature space, regularized by contrastive and reconstruction losses. Experiments on public tabular datasets show the effectiveness of the proposed method over state-of-the-art baseline methods. 
  5. Recent work in subregular syntax has revealed deep parallels among syntactic phenomena, many of which fall under the computational class TSL (Graf, 2018, 2022). Vu et al. (2019) argue that case dependencies are yet another member of this class. But their analysis focuses mainly on English, which is famously case-poor. In this paper I present a TSL analysis of Japanese, which features a much wider range of case-marking patterns, adding support to the claim that case dependencies, and by extension syntactic dependencies, are TSL. 
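The first and fifth abstracts above both rely on tier-based strict locality (TSL): project the symbols relevant to a dependency onto a tier, then check only adjacent pairs on that tier. The following is a minimal string-based TSL-2 sketch; the cited work applies TSL to trees (tree tiers and derivation trees), so this flat version is only an illustration. The function names are hypothetical, and the case-marker example is a simplified, clause-internal ban on two accusative `o` markers (in the spirit of the Japanese double-o constraint), chosen for illustration and not claimed to be the analysis in the fifth paper.

```python
def project_tier(symbols, tier_alphabet):
    """Keep only symbols on the tier; all other symbols are invisible."""
    return [s for s in symbols if s in tier_alphabet]


def tsl2_wellformed(symbols, tier_alphabet, forbidden_bigrams):
    """TSL-2 check: project the tier, add edge markers '#',
    and reject the input if any adjacent tier pair is forbidden."""
    tier = ["#", *project_tier(symbols, tier_alphabet), "#"]
    return all((a, b) not in forbidden_bigrams for a, b in zip(tier, tier[1:]))


# Illustrative case tier and a simplified "no two accusatives" ban.
CASE_TIER = {"ga", "o", "ni"}
NO_DOUBLE_O = {("o", "o")}

good = ["Taroo", "ga", "Hanako", "ni", "hon", "o", "yom-ase-ta"]
bad = ["Taroo", "ga", "Hanako", "o", "hon", "o", "yom-ase-ta"]
```

Here `good` projects to the tier `# ga ni o #` and passes, while `bad` projects to `# ga o o #` and fails, even though the two `o` markers are not adjacent in the full string.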
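The ISL-2 inferrability condition in the second abstract can be phrased as a determinism check: for each lexical item, its category must be a function of its own exponent plus either its selecting head's exponent or the exponents of its selected arguments. The sketch below reads "either ... or" per item, which is one possible interpretation. The names `Obs` and `non_inferrable_items` are hypothetical, and the data are invented toy observations, not material from the Torr (2017) treebank.

```python
from collections import defaultdict
from typing import NamedTuple


class Obs(NamedTuple):
    item: str       # phonological exponent of the lexical item
    head: str       # exponent of its selecting head
    args: tuple     # exponents of the arguments it selects
    category: str   # category feature to be inferred


def _determined(observations, key):
    """True iff each context value (given by `key`) co-occurs with one category."""
    seen = defaultdict(set)
    for obs in observations:
        seen[key(obs)].add(obs.category)
    return all(len(cats) == 1 for cats in seen.values())


def non_inferrable_items(observations):
    """Return items whose category is fixed neither by the head nor by the args."""
    by_item = defaultdict(list)
    for obs in observations:
        by_item[obs.item].append(obs)
    return sorted(
        item
        for item, group in by_item.items()
        if not (_determined(group, lambda o: o.head)
                or _determined(group, lambda o: o.args))
    )


# Invented toy data: "that" and "run" are ambiguous but resolved by their
# selecting heads; "x" is a hypothetical item that no context disambiguates.
toy = [
    Obs("that", "think", ("left",), "C"),
    Obs("that", "saw", ("dog",), "D"),
    Obs("run", "will", (), "V"),
    Obs("run", "the", (), "N"),
    Obs("x", "h", ("y",), "A"),
    Obs("x", "h", ("y",), "B"),
]
```

A corpus study along these lines would count how many items end up in the returned list; on the toy data only `x` does.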
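The decomposition of serial constraints in the third abstract (a context function that maps each node to a string, and a requirement function that checks that string) can be sketched directly. This version evaluates the constraint by recursion over the tree rather than compiling it into a deterministic tree automaton as the paper does. The context function here (the dominance string from the root down to the node) and the toy requirement ("every `b` must be dominated by some `a`") are hypothetical examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list[Node] = field(default_factory=list)


def serial_ok(tree, context_fn, requirement_fn):
    """Check requirement_fn(context_fn(ancestors, node)) at every node."""
    def go(node, ancestors):
        if not requirement_fn(context_fn(ancestors, node)):
            return False
        return all(go(c, ancestors + (node.label,)) for c in node.children)

    return go(tree, ())


def dominance_string(ancestors, node):
    """Context function: labels on the path from the root to `node`."""
    return ancestors + (node.label,)


def b_needs_a_above(s):
    """Requirement function: a node labeled 'b' needs an 'a' above it."""
    return s[-1] != "b" or "a" in s[:-1]


ok_tree = Node("a", [Node("c", [Node("b")])])
bad_tree = Node("c", [Node("b"), Node("a")])
```

Swapping in a different context function (for instance, one that collects c-commanding nodes instead of dominating ones) changes the structural relation without touching the requirement, which is the kind of reuse the abstract describes.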
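The feature-grouping step in the fourth abstract (build a maximum spanning tree over feature correlations, then split it into subsets of neighboring features) can be sketched in plain Python. The abstract does not say how the tree is broken into subsets, so ordering features by a depth-first traversal of the tree and cutting that order into chunks of size `k` is an assumption made here, as are all function names; the columns are invented toy data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def max_spanning_tree(weights):
    """Prim's algorithm, always adding the heaviest edge leaving the tree."""
    n = len(weights)
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        u, v = max(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: weights[e[0]][e[1]])
        in_tree.add(v)
        edges.append((u, v))
    return edges


def dfs_order(n, edges, root=0):
    """Depth-first preorder of the tree, so tree neighbors end up close."""
    adj = {i: [] for i in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    order, stack, seen = [], [root], set()
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        order.append(u)
        stack.extend(sorted(adj[u], reverse=True))
    return order


def feature_subsets(columns, k):
    """Group feature indices so that strongly correlated ones are adjacent."""
    n = len(columns)
    w = [[abs(pearson(columns[i], columns[j])) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    order = dfs_order(n, max_spanning_tree(w))
    return [order[i:i + k] for i in range(0, n, k)]


# Toy columns: features 0 and 2 are strongly correlated, as are 1 and 3.
columns = [[1, 2, 3, 4, 5], [2, 1, 2, 1, 2], [1, 2, 3, 4, 6], [2, 1, 2, 1, 3]]
```

On these columns the spanning tree is 0-2-3-1, and `feature_subsets(columns, 2)` returns `[[0, 2], [3, 1]]`, pairing each feature with its strongest correlate.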