Title: Unsupervised Parsing with S-DIORA: Single Tree Encoding for Deep Inside-Outside Recursive Autoencoders
The deep inside-outside recursive autoencoder (DIORA; Drozdov et al. 2019) is a self-supervised neural model that learns to induce syntactic tree structures for input sentences *without access to labeled training data*. In this paper, we discover that while DIORA exhaustively encodes all possible binary trees of a sentence with a soft dynamic program, its vector averaging approach is locally greedy and cannot recover from errors when computing the highest scoring parse tree in bottom-up chart parsing. To fix this issue, we introduce S-DIORA, an improved variant of DIORA that encodes a single tree rather than a softly-weighted mixture of trees by employing a hard argmax operation and a beam at each cell in the chart. Our experiments show that through *fine-tuning* a pre-trained DIORA with our new algorithm, we improve the state of the art in *unsupervised* constituency parsing on the English WSJ Penn Treebank by 2.2-6% F1, depending on the data used for fine-tuning.
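To make the single-tree encoding concrete, here is a minimal sketch of a CKY-style chart in which each cell keeps only a small beam of discretely selected candidates (a hard argmax over split points) instead of a softly weighted average over all splits. The `compose` and `score` functions, the beam size, and the vector dimensions are placeholders standing in for DIORA's learned networks; this is an illustration of the idea, not the authors' implementation.

```python
# Sketch: single-tree chart encoding with a per-cell beam.
# Each cell stores (score, vector, backpointer) candidates, best first,
# so errors are not averaged into the representation of larger spans.
import numpy as np

def compose(left_vec, right_vec):
    # Placeholder composition function (DIORA uses a learned network here).
    return np.tanh(left_vec + right_vec)

def score(vec):
    # Placeholder span scorer (DIORA uses a learned scoring function).
    return float(vec.sum())

def single_tree_inside(leaf_vecs, beam_size=2):
    n = len(leaf_vecs)
    # chart[(i, j)] holds candidates for the span covering tokens i..j (inclusive).
    chart = {(i, i): [(0.0, leaf_vecs[i], None)] for i in range(n)}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            candidates = []
            for k in range(i, j):                      # split between k and k+1
                for li, (ls, lv, _) in enumerate(chart[(i, k)]):
                    for ri, (rs, rv, _) in enumerate(chart[(k + 1, j)]):
                        vec = compose(lv, rv)
                        s = ls + rs + score(vec)
                        candidates.append((s, vec, (k, li, ri)))
            # Hard selection: keep only the top-`beam_size` discrete candidates
            # rather than a softly weighted mixture over every split.
            candidates.sort(key=lambda c: c[0], reverse=True)
            chart[(i, j)] = candidates[:beam_size]
    return chart

def backtrace(chart, i, j, idx=0):
    # Recover the single highest-scoring binary tree from the backpointers.
    if i == j:
        return i
    _, _, (k, li, ri) = chart[(i, j)][idx]
    return (backtrace(chart, i, k, li), backtrace(chart, k + 1, j, ri))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    leaves = [rng.standard_normal(8) for _ in range(5)]
    chart = single_tree_inside(leaves, beam_size=2)
    print(backtrace(chart, 0, len(leaves) - 1))
```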
Award ID(s): 1955567
NSF-PAR ID: 10254046
Date Published: 2020
Journal Name: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
Page Range / eLocation ID: 4832 to 4845
Sponsoring Org: National Science Foundation
More Like this
  1. We investigate the extent to which individual attention heads in pretrained transformer language models, such as BERT and RoBERTa, implicitly capture syntactic dependency relations. We employ two methods (taking the maximum attention weight and computing the maximum spanning tree) to extract implicit dependency relations from the attention weights of each layer/head, and compare them to the ground-truth Universal Dependency (UD) trees. We show that, for some UD relation types, there exist heads that can recover the dependency type significantly better than baselines on parsed English text, suggesting that some self-attention heads act as a proxy for syntactic structure. We also analyze BERT fine-tuned on two datasets (the syntax-oriented CoLA and the semantics-oriented MNLI) to investigate whether fine-tuning affects the patterns of their self-attention, but we do not observe substantial differences in the overall dependency relations extracted using our methods. Our results suggest that these models have some specialist attention heads that track individual dependency types, but no generalist head that performs holistic parsing significantly better than a trivial baseline, and that analyzing attention weights directly may not reveal much of the syntactic knowledge that BERT-style models are known to learn. (A minimal sketch of the maximum-attention-weight extraction appears after this list.)
  2. Pre-trained language models have been shown to encode linguistic structures like parse trees in their embeddings while being trained unsupervised. Some doubts have been raised whether the models are doing parsing or only some computation weakly correlated with it. Concretely: (a) Is it possible to explicitly describe transformers with realistic embedding dimensions, number of heads, etc. that are capable of doing parsing — or even approximate parsing? (b) Why do pre-trained models capture parsing structure? This paper takes a step toward answering these questions in the context of generative modeling with PCFGs. We show that masked language models like BERT or RoBERTa of moderate sizes can approximately execute the Inside-Outside algorithm for the English PCFG (Marcus et al., 1993). We also show that the Inside-Outside algorithm is optimal for masked language modeling loss on the PCFG-generated data. We conduct probing experiments on models pre-trained on PCFG-generated data to show that this not only allows recovery of approximate parse trees, but also recovers marginal span probabilities computed by the Inside-Outside algorithm, which suggests an implicit bias of masked language modeling towards this algorithm. (A toy sketch of the inside pass appears after this list.)
  3. Automatic discourse processing is bottlenecked by data: current discourse formalisms pose highly demanding annotation tasks involving large taxonomies of discourse relations, making them inaccessible to lay annotators. This work instead adopts the linguistic framework of Questions Under Discussion (QUD) for discourse analysis and seeks to derive QUD structures automatically. QUD views each sentence as an answer to a question triggered in prior context; thus, we characterize relationships between sentences as free-form questions, in contrast to exhaustive fine-grained taxonomies. We develop the first-of-its-kind QUD parser that derives a dependency structure of questions over full documents, trained using a large, crowdsourced question-answering dataset DCQA (Ko et al., 2022). Human evaluation results show that QUD dependency parsing is possible for language models trained with this crowdsourced, generalizable annotation scheme. We illustrate how our QUD structure is distinct from RST trees, and demonstrate the utility of QUD analysis in the context of document simplification. Our findings show that QUD parsing is an appealing alternative for automatic discourse processing. 
  4. Structured prediction of tree-shaped objects is heavily studied under the name of syntactic dependency parsing. Current practice based on maximum likelihood or margin is either agnostic to or inconsistent with the evaluation loss. Risk minimization alleviates the discrepancy between training and test objectives but typically induces a non-convex problem. These approaches adopt explicit regularization to combat overfitting without probabilistic interpretation. We propose a moment-based distributionally robust optimization approach for tree structured prediction, where the worst-case expected loss over a set of distributions within bounded moment divergence from the empirical distribution is minimized. We develop efficient algorithms for arborescences and other variants of trees. We derive Fisher consistency, convergence rates and generalization bounds for our proposed method. We evaluate its empirical effectiveness on dependency parsing benchmarks. (A schematic form of this objective appears after this list.)
  5. We introduce a novel method for reconstructing the 3D geometry of botanical trees from single photographs. Faithfully reconstructing a tree from single-view sensor data is a challenging and open problem because many possible 3D trees exist that fit the tree's shape observed from a single view. We address this challenge by defining a reconstruction pipeline based on three neural networks. The networks simultaneously mask out trees in input photographs, identify a tree's species, and obtain its 3D radial bounding volume - our novel 3D representation for botanical trees. Radial bounding volumes (RBV) are used to orchestrate a procedural model primed on learned parameters to grow a tree that matches the main branching structure and the overall shape of the captured tree. While the RBV allows us to faithfully reconstruct the main branching structure, we use the procedural model's morphological constraints to generate realistic branching for the tree crown. This constrains the number of solutions of tree models for a given photograph of a tree. We show that our method reconstructs various tree species even when the trees are captured in front of complex backgrounds. Moreover, although our neural networks have been trained on synthetic data with data augmentation, we show that our pipeline performs well for real tree photographs. We evaluate the reconstructed geometries with several metrics, including leaf area index and maximum radial tree distances.
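For item 1, here is a minimal sketch of the maximum-attention-weight extraction: each word's predicted syntactic head is the position it attends to most strongly, and the predictions are scored against gold UD heads with unlabeled accuracy. The random attention matrix and the gold head indices below are illustrative placeholders; in practice the weights would be read from a specific layer/head of BERT or RoBERTa.

```python
# Sketch: extract dependency heads from one attention head's weight matrix.
import numpy as np

def max_attention_heads(attn):
    """attn[i, j] = attention paid by word i to word j (rows sum to 1).
    Position 0 is treated as a ROOT slot; returns a predicted head per word."""
    n = attn.shape[0]
    preds = []
    for i in range(1, n):                  # skip the ROOT slot itself
        row = attn[i].copy()
        row[i] = -np.inf                   # a word cannot be its own head
        preds.append(int(np.argmax(row)))
    return preds

def unlabeled_accuracy(pred_heads, gold_heads):
    correct = sum(p == g for p, g in zip(pred_heads, gold_heads))
    return correct / len(gold_heads)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6                                  # ROOT + 5 words
    attn = rng.random((n, n))
    attn /= attn.sum(axis=1, keepdims=True)
    gold = [0, 3, 3, 5, 3]                 # hypothetical UD head indices
    pred = max_attention_heads(attn)
    print(pred, unlabeled_accuracy(pred, gold))
```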
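For item 2, here is a toy sketch of the inside pass of the Inside-Outside algorithm that the paper argues masked language models can approximate: inside[(i, j, A)] accumulates the total probability that nonterminal A derives words i..j. The tiny CNF grammar and its probabilities are invented for illustration and are unrelated to the treebank PCFG used in the paper.

```python
# Sketch: inside pass over a toy PCFG in Chomsky normal form.
from collections import defaultdict

# Binary rules (A -> B C, prob) and lexical rules (A -> w, prob).
BINARY = [("S", "NP", "VP", 1.0), ("NP", "DT", "NN", 0.6), ("VP", "V", "NP", 1.0)]
LEXICAL = {("NP", "she"): 0.4, ("DT", "the"): 1.0, ("NN", "dog"): 1.0, ("V", "saw"): 1.0}

def inside(words):
    n = len(words)
    beta = defaultdict(float)              # beta[(i, j, A)] = inside probability
    for i, w in enumerate(words):
        for (A, word), p in LEXICAL.items():
            if word == w:
                beta[(i, i, A)] += p
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):
                for A, B, C, p in BINARY:
                    beta[(i, j, A)] += p * beta[(i, k, B)] * beta[(k + 1, j, C)]
    return beta

if __name__ == "__main__":
    words = ["she", "saw", "the", "dog"]
    beta = inside(words)
    print(beta[(0, len(words) - 1, "S")])  # probability of the sentence under S
```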
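For item 4, a schematic form of a moment-based distributionally robust objective, written in generic notation rather than the paper's exact formulation: the learner minimizes the worst-case expected loss over distributions whose feature moments stay close to the empirical ones.

$$
\min_{\theta}\ \max_{Q \in \mathcal{Q}(\hat P,\,\epsilon)}\ \mathbb{E}_{(x,y)\sim Q}\big[\ell\big(y,\, f_\theta(x)\big)\big],
\qquad
\mathcal{Q}(\hat P,\epsilon) = \Big\{\, Q \;:\; \big\|\, \mathbb{E}_{Q}[\phi(x,y)] - \mathbb{E}_{\hat P}[\phi(x,y)] \,\big\| \le \epsilon \,\Big\},
$$

where $\hat P$ is the empirical distribution over sentences and their gold trees, $\ell$ is a tree-level evaluation loss, and $\phi$ is an assumed feature (moment) map; the radius $\epsilon$ bounds how far the adversarial distribution's moments may deviate from the empirical moments.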