We generalize Cohen, Gómez-Rodríguez, and Satta’s (2011) parser to a family of non-projective transition-based dependency parsers allowing polynomial-time exact inference. This includes novel parsers with better coverage than Cohen et al. (2011), and even a variant that reduces time complexity to O(n^6), improving on prior bounds. We hope that this piece of theoretical work inspires design of novel transition systems with better coverage and better run-time guarantees.
more »
« less
Global Transition-based Non-projective Dependency Parsing
Shi, Huang, and Lee (2017) obtained state-of-the-art results for English and Chinese dependency parsing by combining dynamic-programming implementations of transition-based dependency parsers with a minimal set of bidirectional LSTM features. However, their results were limited to projective parsing. In this paper, we extend their approach to support non-projectivity by providing the first practical implementation of the MH4 algorithm, an O(n^4) mildly nonprojective dynamic-programming parser with very high coverage on non-projective treebanks. To make MH4 compatible with minimal transition-based feature sets, we introduce a transition-based interpretation of it in which parser items are mapped to sequences of transitions. We thus obtain the first implementation of global decoding for non-projective transition-based parsing, and demonstrate empirically that it is more effective than its projective counterpart in parsing a number of highly non-projective languages.
more »
« less
- Award ID(s):
- 1741441
- PAR ID:
- 10075231
- Date Published:
- Journal Name:
- Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics
- Volume:
- 1 (Long papers)
- Page Range / eLocation ID:
- 2664--2675
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Automatic discourse processing is bottlenecked by data: current discourse formalisms pose highly demanding annotation tasks involving large taxonomies of discourse relations, making them inaccessible to lay annotators. This work instead adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis and seeks to derive QUD structures automatically. QUD views each sentence as an answer to a question triggered in prior context; thus, we characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained taxonomies. We develop the first-of-its-kind QUD parser that derives a dependency structure of questions over full documents, trained using a large, crowdsourced question-answering dataset DCQA (Ko et al., 2022). Human evaluation results show that QUD dependency parsing is possible for language models trained with this crowdsourced, generalizable annotation scheme. We illustrate how our QUD structure is distinct from RST trees, and demonstrate the utility of QUD analysis in the context of document simplification. Our findings show that QUD parsing is an appealing alternative for automatic discourse processing.more » « less
-
Computing devices have recently become capable of interacting with their end users via natural language. However, they can only operate within a limited “supported” domain of discourse and fail drastically when faced with an out-of-domain utterance, mainly due to the limitations of their semantic parser. In this paper, we propose a semantic parser that generalizes to out-of-domain examples by learning a general strategy for parsing an unseen utterance through adapting the logical forms of seen utterances, instead of learning to generate a logical form from scratch. Our parser maintains a memory consisting of a representative subset of the seen utterances paired with their logical forms. Given an unseen utterance, our parser works by looking up a similar utterance from the memory and adapting its logical form until it fits the unseen utterance. Moreover, we present a data generation strategy for constructing utterance-logical form pairs from different domains. Our results show an improvement of up to 68.8% on one-shot parsing under two different evaluation settings compared to the baselines.more » « less
-
null (Ed.)“Episodic Logic: Unscoped Logical Form” (EL-ULF) is a semantic representation capturing predicate-argument structure as well as more challenging aspects of language within the Episodic Logic formalism. We present the first learned approach for parsing sentences into ULFs, using a growing set of annotated examples. The results provide a strong baseline for future improvement. Our method learns a sequence-to-sequence model for predicting the transition action sequence within a modified cache transition system. We evaluate the efficacy of type grammar-based constraints, a word-to-symbol lexicon, and transition system state features in this task. Our system is availableat https://github.com/genelkim/ ulf-transition-parser. We also present the first official annotated ULF dataset at https://www.cs.rochester.edu/u/ gkim21/ulf/resources/.more » « less
-
In a programmer's pursuit of using or creating new programming languages, finding errors in the syntax of code can present many issues. Languages with little to no documentation and incomprehensible exception handling and reports are frustrating to work with and can create confusion when trying to locate where in the code the program has faulted. In this paper we present {\em CodeBlock}, a parser generator and syntax checker for arbitrary programming languages. CodeBlock is a block based grammar builder for any programming language that constructs a parsing expression grammar for the language based on user built expressions. This grammar can then be used within the CodeBlock website or in the CodeBlock Node.JS application to test the syntax of either written code, or files containing code in the language, reporting comprehensible error messages if errors in syntax are found. Our eventual goal is to incorporate CodeBlock into a compiler design tutoring system, called {\em CompiTS}, in which it will play a central role in teaching students how to design new programming languages and test the effectiveness of the new language using rapid prototyping and a translational approach to implementation. This is an emerging research, and in this paper, we only focus on the syntax checking component of the CompiTS system.more » « less
An official website of the United States government

