AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data

Xu, Silei; Semnani, Sina; Campagna, Giovanni; Lam, Monica

doi:10.18653/v1/2020.emnlp-main.31

We propose AutoQA, a methodology and toolkit to generate semantic parsers that answer questions on databases, with no manual effort. Given a database schema and its data, AutoQA automatically generates a large set of high-quality questions for training that covers different database operations. It uses automatic paraphrasing combined with template-based parsing to find alternative expressions of an attribute in different parts of speech. It also uses a novel filtered auto-paraphraser to generate correct paraphrases of entire sentences. We apply AutoQA to the Schema2QA dataset and obtain an average logical form accuracy of 62.9% when tested on natural questions, which is only 6.4% lower than a model trained with expert natural language annotations and paraphrase data collected from crowdworkers. To demonstrate the generality of AutoQA, we also apply it to the Overnight dataset. AutoQA achieves 69.8% answer accuracy, 16.4% higher than the state-of-the-art zero-shot models and only 5.2% lower than the same model trained with human data.

Award ID(s): 1900638

NSF-PAR ID: 10211958

Author(s) / Creator(s): Xu, Silei; Semnani, Sina; Campagna, Giovanni; Lam, Monica

Date Published: 2020-11-01

Journal Name: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Page Range / eLocation ID: 422 to 434

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.18653/v1/2020.emnlp-main.31
