- Award ID(s):
- 1761548
- PAR ID:
- 10105105
- Date Published:
- Journal Name:
- Proceedings of the 2nd Workshop on Analyzing and Interpreting Neural Networks for NLP
- Volume:
- 2
- Sponsoring Org:
- National Science Foundation
More Like this
-
Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.0 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.
-
Neural Machine Translation (NMT) trains a neural network with an encoder-decoder architecture. However, the quality of neural translations depends predominantly on the availability of a large amount of bilingual training data. In this paper, we explore the performance of translations predicted by attention-based NMT systems for the low-resource Spanish-to-Persian language pair. We analyze the errors that NMT systems make in Persian and provide an in-depth comparison of system performance across variations in sentence length and training dataset size. We evaluate our translation results using BLEU and human evaluation measures based on adequacy, fluency, and overall rating.
-
Lysine fatty acylation in mammalian cells was discovered nearly three decades ago, yet the enzymes catalyzing it remain unknown. Unexpectedly, we find that human N-terminal glycine myristoyltransferases (NMT) 1 and 2 can efficiently myristoylate specific lysine residues. They modify ADP-ribosylation factor 6 (ARF6) on lysine 3, allowing it to remain on membranes during the GTPase cycle. We demonstrate that the NAD+-dependent deacylase SIRT2 removes the myristoyl group, and our evidence suggests that NMT prefers GTP-bound ARF6 while SIRT2 prefers GDP-bound ARF6. This allows the lysine myristoylation-demyristoylation cycle to couple to and promote the GTPase cycle of ARF6. Our study provides an explanation for the puzzling dissimilarity of ARF6 to other ARFs and suggests the existence of other substrates regulated by this previously unknown function of NMT. Furthermore, we identified an NMT/SIRT2-ARF6 regulatory axis, which may offer new ways to treat human diseases.
-
We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs, that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate a specific phenomenon in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and human aggregate agreement with the labels is 96.4%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models identify morphological contrasts related to agreement reliably, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands.
-
Synthetic translations have been used for a wide range of NLP tasks primarily as a means of data augmentation. This work explores, instead, how synthetic translations can be used to revise potentially imperfect reference translations in mined bitext. We find that synthetic samples can improve bitext quality without any additional bilingual supervision when they replace the originals based on a semantic equivalence classifier that helps mitigate NMT noise. The improved quality of the revised bitext is confirmed intrinsically via human evaluation and extrinsically through bilingual induction and MT tasks.