Parallel Attention Mechanisms in Neural Machine Translation

Medina, Julian; Kalita, Jugal

Recent papers in neural machine translation have proposed the strict use of attention mechanisms over previous standards such as recurrent and convolutional neural networks (RNNs and CNNs). We propose that by running traditionally stacked encoding branches from encoder-decoder attention-focused architectures in parallel, even more sequential operations can be removed from the model, thereby decreasing training time. In particular, we modify the recently published attention-based architecture called Transformer, by Google, replacing sequential attention modules with parallel ones, which reduces training time and substantially improves BLEU scores at the same time. Experiments on the English-to-German and English-to-French translation tasks show that our model establishes a new state of the art.
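The contrast between stacked and parallel encoding branches can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the parallel branch outputs are combined, so the elementwise averaging below is an assumption, as are all function names, the single-head attention, and the omission of residual connections, layer normalization, and feed-forward sublayers.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the sequence x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(r) for r in scores]
    return matmul(weights, v)


def stacked_encoder(x, branches):
    """Sequential form: each module consumes the previous module's output."""
    for w_q, w_k, w_v in branches:
        x = self_attention(x, w_q, w_k, w_v)
    return x


def parallel_encoder(x, branches):
    """Parallel form: every branch reads the same input, so no branch
    waits on another and the branches could run concurrently."""
    outputs = [self_attention(x, w_q, w_k, w_v) for w_q, w_k, w_v in branches]
    # ASSUMPTION: combine branches by elementwise averaging; the abstract
    # does not specify the combination rule.
    n = len(outputs)
    return [[sum(vals) / n for vals in zip(*rows)] for rows in zip(*outputs)]


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights for illustration only."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, d_model, n_branches = 4, 8, 3
x = random_matrix(seq_len, d_model, rng)
branches = [
    tuple(random_matrix(d_model, d_model, rng) for _ in range(3))
    for _ in range(n_branches)
]

stacked = stacked_encoder(x, branches)
parallel = parallel_encoder(x, branches)
```

In the stacked form the loop carries a data dependency from one module to the next, while in the parallel form the list of branch outputs has none, which is the property the abstract credits for removing sequential operations.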

Award ID(s): 1659788

PAR ID: 10098857

Author(s) / Creator(s): Medina, Julian; Kalita, Jugal

Date Published: 2018-12-01

Journal Name: International Conference on Machine Learning Applications

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text: Accepted Manuscript 1.0
Conference Paper: The DOI is not currently available.
