A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators

Sadredini, Elaheh; Guo, Deyuan; Bo, Chunkun; Rahimi, Reza; Skadron, Kevin; Wang, Hongning

doi:10.1145/3219819.3219889

Citation Details

A Scalable Solution for Rule-Based Part-of-Speech Tagging on Novel Hardware Accelerators

Part-of-speech (POS) tagging is the foundation of many natural language processing applications. Rule-based POS tagging is a wellknown solution, which assigns tags to the words using a set of predefined rules. Many researchers favor statistical-based approaches over rule-based methods for better empirical accuracy. However, until now, the computational cost of rule-based POS tagging has made it difficult to study whether more complex rules or larger rulesets could lead to accuracy competitive with statistical approaches. In this paper, we leverage two hardware accelerators, the Automata Processor (AP) and Field Programmable Gate Arrays (FPGA), to accelerate rule-based POS tagging by converting rules to regular expressions and exploiting the highly-parallel regular-expressionmatching ability of these accelerators. We study the relationship between rule set size and accuracy, and observe that adding more rules only poses minimal overhead on the AP and FPGA. This allows a substantial increase in the number and complexity of rules, leading to accuracy improvement. Our experiments on Treebank and Brown corpora achieve up to 2,600X and 1,914X speedups on the AP and on the FPGA respectively over rule-based methods on the CPU in the rule-matching stage, up to 58× speedup over the Perceptron POS tagger on the CPU in total testing time, and up to 253× speedup over the LSTM tagger on the GPU in total testing time, while showing a competitive accuracy compared to neural-network and statistical solutions. more »

Award ID(s):: 1553568 1718216

PAR ID:: 10066046

Author(s) / Creator(s):: Sadredini, Elaheh; Guo, Deyuan; Bo, Chunkun; Rahimi, Reza; Skadron, Kevin; Wang, Hongning

Date Published:: 2018-08-18

Journal Name:: KDD '18 Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining

Page Range / eLocation ID:: 665 to 674

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Conference Paper:
https://doi.org/10.1145/3219819.3219889

More Like this