
Title: Categorial Grammar Induction with Stochastic Category Selection
Abstract: Grammar induction, the task of learning a set of syntactic rules from minimally annotated training data, provides a means of exploring the longstanding question of whether humans rely on innate knowledge to acquire language. Of the various formalisms available for grammar induction, categorial grammars provide an appealing option due to their transparent interface between syntax and semantics. However, to obtain competitive results, previous categorial grammar inducers have relied on shortcuts such as part-of-speech annotations or an ad hoc bias term in the objective function to ensure desirable branching behavior. We present a categorial grammar inducer that eliminates both shortcuts: it learns from raw data, and does not rely on a biased objective function. This improvement is achieved through a novel stochastic process used to select the set of available syntactic categories. On a corpus of English child-directed speech, the model attains a recall-homogeneity of 0.48, a large improvement over previous categorial grammar inducers.
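The recall-homogeneity (RH) figure reported above combines unlabeled constituent recall with the homogeneity of the induced category labels. The Python sketch below illustrates one common reading of that metric using scikit-learn's homogeneity score; the span/label data structures and the exact weighting are assumptions for illustration, not the paper's evaluation code.

```python
# Illustrative recall-homogeneity (RH) computation, assuming
# RH = (unlabeled span recall) * (homogeneity of induced labels over the
# spans the inducer recovered). Data structures here are hypothetical.
from sklearn.metrics import homogeneity_score

def recall_homogeneity(gold_spans, induced_spans):
    """Both arguments map (sentence_id, start, end) -> category label."""
    matched = set(gold_spans) & set(induced_spans)   # correctly recovered spans
    recall = len(matched) / len(gold_spans) if gold_spans else 0.0
    if not matched:
        return 0.0
    spans = sorted(matched)
    gold_labels = [gold_spans[s] for s in spans]
    pred_labels = [induced_spans[s] for s in spans]
    return recall * homogeneity_score(gold_labels, pred_labels)
```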
Award ID(s):
2313140
PAR ID:
10537956
Author(s) / Creator(s):
Publisher / Repository:
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Date Published:
Page Range / eLocation ID:
2893-2900
Format(s):
Medium: X Other: pdf
Size(s):
667KB
Location:
https://aclanthology.org/2024.lrec-main.258
Sponsoring Org:
National Science Foundation
More Like this
  1. This article describes a simple PCFG induction model with a fixed category domain that predicts a large majority of attested constituent boundaries, and predicts labels consistent with nearly half of attested constituent labels on a standard evaluation data set of child-directed speech. The article then explores the idea that the difference between simple grammars exhibited by child learners and fully recursive grammars exhibited by adult learners may be an effect of increasing working memory capacity, where the shallow grammars are constrained images of the recursive grammars. An implementation of these memory bounds as limits on center embedding in a depth-specific transform of a recursive grammar yields a significant improvement over an equivalent but unbounded baseline, suggesting that this arrangement may indeed confer a learning advantage.
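As a rough illustration of the depth-bounding idea in item 1, the sketch below annotates the nonterminals of a binary PCFG with a memory-depth index and drops expansions that would exceed a fixed bound. The rule format and the depth-charging scheme shown (one unit of depth per left child) are deliberate simplifications; the article's depth-specific transform targets center embedding more precisely.

```python
# Simplified depth-bounding transform for a binary PCFG. The rule format
# (parent, left_child, right_child, prob) and the depth-charging scheme
# are assumptions for illustration, not the article's implementation.
def depth_bound(rules, max_depth):
    """Annotate nonterminals as 'X@d' and keep only expansions whose
    left child stays within the memory bound."""
    bounded = []
    for d in range(1, max_depth + 1):
        for parent, left, right, prob in rules:
            if d + 1 <= max_depth:  # opening a left child costs one memory unit
                bounded.append((f"{parent}@{d}", f"{left}@{d + 1}", f"{right}@{d}", prob))
    return bounded
```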
  2. In unsupervised grammar induction, data likelihood is known to be only weakly correlated with parsing accuracy, especially at convergence after multiple runs. In order to find a better indicator for quality of induced grammars, this paper correlates several linguistically- and psycholinguistically-motivated predictors to parsing accuracy on a large multilingual grammar induction evaluation data set. Results show that variance of average surprisal (VAS) better correlates with parsing accuracy than data likelihood and that using VAS instead of data likelihood for model selection provides a significant accuracy boost. Further evidence shows VAS to be a better candidate than data likelihood for predicting word order typology classification. Analyses show that VAS seems to separate content words from function words in natural language grammars, and to better arrange words with different frequencies into separate classes that are more consistent with linguistic theory. 
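The VAS predictor in item 2 is straightforward to compute once per-word surprisals are available. Below is a minimal Python sketch, assuming VAS is the variance of sentence-level mean surprisal across the corpus; the paper's exact estimator may differ.

```python
# Minimal variance-of-average-surprisal (VAS) score for model selection,
# assuming per-word surprisals (-log p) from the induced grammar are given.
import statistics

def variance_of_average_surprisal(corpus_surprisals):
    """corpus_surprisals: one list of per-word surprisals per sentence."""
    sentence_means = [statistics.fmean(s) for s in corpus_surprisals if s]
    return statistics.variance(sentence_means)

# Model selection: of several converged runs, keep the grammar with the
# highest VAS rather than the one with the highest data likelihood.
```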
  3. We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training yields further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous work’s interpretable typological features that require parsed corpora or expert categorization of the language. Our best method improved attachment scores on held-out test languages by an average of 5.6 percentage points over past work that does not inspect the unparsed data (McDonald et al., 2011), and by 20.7 points over past “grammar induction” work that does not use training languages (Naseem et al., 2010). 
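Item 3's feature extractor is learned end-to-end, but the kind of signal it draws on can be illustrated with simple corpus statistics over gold POS sequences. The sketch below computes a POS-bigram distribution from an unparsed corpus; the tagset and feature choice are placeholders, not the paper's learned features.

```python
# Hand-crafted stand-in for the kind of typological signal available in an
# unparsed POS corpus; the paper instead learns its extractor end-to-end.
from collections import Counter

def pos_bigram_features(tag_sequences, tagset):
    """Return a flat vector of relative POS-bigram frequencies."""
    counts, total = Counter(), 0
    for tags in tag_sequences:
        for a, b in zip(tags, tags[1:]):
            counts[(a, b)] += 1
            total += 1
    return [counts[(a, b)] / total if total else 0.0
            for a in tagset for b in tagset]
```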
  4. Cross-lingual transfer is an effective way to build syntactic analysis tools in low-resource languages. However, transfer is difficult when transferring to typologically distant languages, especially when neither annotated target data nor parallel corpora are available. In this paper, we focus on methods for cross-lingual transfer to distant languages and propose to learn a generative model with a structured prior that utilizes labeled source data and unlabeled target data jointly. The parameters of the source and target models are softly shared through a regularized log likelihood objective. An invertible projection is employed to learn a new interlingual latent embedding space that compensates for imperfect cross-lingual word embedding input. We evaluate our method on two syntactic tasks: part-of-speech (POS) tagging and dependency parsing. On the Universal Dependency Treebanks, we use English as the only source corpus and transfer to a wide range of target languages. On the 10 languages in this dataset that are distant from English, our method yields an average of 5.2% absolute improvement on POS tagging and 8.3% absolute improvement on dependency parsing over a direct transfer method using state-of-the-art discriminative models.
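The "softly shared" parameters in item 4 can be read as a joint likelihood objective with an L2 tie between source and target parameter sets. The PyTorch sketch below shows that reading; the models, likelihood terms, and penalty weight are placeholders rather than the paper's implementation.

```python
# Soft parameter sharing via a regularized log-likelihood objective (sketch).
# src_nll / tgt_nll are negative log likelihoods on labeled source data and
# unlabeled target data; `lam` controls how tightly the two models are tied.
import torch

def soft_sharing_loss(src_nll, tgt_nll, src_model, tgt_model, lam=1.0):
    penalty = torch.zeros(())
    for ps, pt in zip(src_model.parameters(), tgt_model.parameters()):
        penalty = penalty + (ps - pt).pow(2).sum()
    return src_nll + tgt_nll + lam * penalty
```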
  5. During training, models can exploit spurious correlations as shortcuts, resulting in poor generalization performance when shortcuts do not persist. In this work, assuming access to a representation based on domain knowledge (i.e., known concepts) that is invariant to shortcuts, we aim to learn robust and accurate models from biased training data. In contrast to previous work, we do not rely solely on known concepts, but allow the model to also learn unknown concepts. We propose two approaches for mitigating shortcuts that incorporate domain knowledge, while accounting for potentially important yet unknown concepts. The first approach is two-staged. After fitting a model using known concepts, it accounts for the residual using unknown concepts. While flexible, we show that this approach is vulnerable when shortcuts are correlated with the unknown concepts. This limitation is addressed by our second approach that extends a recently proposed regularization penalty. Applied to two real-world datasets, we demonstrate that both approaches can successfully mitigate shortcut learning. 
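The two-staged approach in item 5 (fit known concepts first, then let unknown concepts explain the residual) can be sketched in a few lines with scikit-learn; the linear models and concept feature matrices below are placeholders for the paper's richer setup.

```python
# Two-stage fit: known concepts first, unknown concepts on the residual.
# Linear models stand in for whatever predictors the real setup uses.
from sklearn.linear_model import LinearRegression

def two_stage_fit(known_X, unknown_X, y):
    stage1 = LinearRegression().fit(known_X, y)            # known concepts
    residual = y - stage1.predict(known_X)                  # what they miss
    stage2 = LinearRegression().fit(unknown_X, residual)    # unknown concepts
    return stage1, stage2

def two_stage_predict(stage1, stage2, known_X, unknown_X):
    return stage1.predict(known_X) + stage2.predict(unknown_X)
```

As the abstract notes, this residual step is exactly where shortcut-correlated unknown concepts can leak back in, which motivates the paper's second, regularization-based approach.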