Title: An Analysis of Source-Side Grammatical Errors in NMT
The quality of Neural Machine Translation (NMT) has been shown to significantly degrade when confronted with source-side noise. We present the first large-scale study of state-of-the-art English-to-German NMT on real grammatical noise, by evaluating on several Grammar Correction corpora. We present methods for evaluating NMT robustness without true references, and we use them for extensive analysis of the effects that different grammatical errors have on the NMT output. We also introduce a technique for visualizing the divergence distribution caused by a source-side error, which allows for additional insights.
Award ID(s):
1761548
NSF-PAR ID:
10105105
Author(s) / Creator(s):
Date Published:
Journal Name:
Proceedings of the 2nd Workshop on Analyzing and Interpreting Neural Networks for NLP
Volume:
2
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.0 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors. 
  2. We introduce The Benchmark of Linguistic Minimal Pairs (BLiMP), a challenge set for evaluating the linguistic knowledge of language models (LMs) on major grammatical phenomena in English. BLiMP consists of 67 individual datasets, each containing 1,000 minimal pairs, that is, pairs of minimally different sentences that contrast in grammatical acceptability and isolate specific phenomena in syntax, morphology, or semantics. We generate the data according to linguist-crafted grammar templates, and human aggregate agreement with the labels is 96.4%. We evaluate n-gram, LSTM, and Transformer (GPT-2 and Transformer-XL) LMs by observing whether they assign a higher probability to the acceptable sentence in each minimal pair. We find that state-of-the-art models identify morphological contrasts related to agreement reliably, but they struggle with some subtle semantic and syntactic phenomena, such as negative polarity items and extraction islands. 
  3. Synthetic translations have been used for a wide range of NLP tasks primarily as a means of data augmentation. This work explores, instead, how synthetic translations can be used to revise potentially imperfect reference translations in mined bitext. We find that synthetic samples can improve bitext quality without any additional bilingual supervision when they replace the originals based on a semantic equivalence classifier that helps mitigate NMT noise. The improved quality of the revised bitext is confirmed intrinsically via human evaluation and extrinsically through bilingual induction and MT tasks. 
  4. Cascadilla Press (Ed.)
    The morphosyntactic information in grammatical number marking may be a useful cue for children in the process of acquiring number words. A language with dual marking, like Slovenian, may help children to bootstrap the meaning of the word “two” by drawing their attention to sets of two as a referent of language. If the dual marker indeed facilitates number learning, we hypothesized that “two” should be acquired earlier in populations exposed to the dual marker; the dual should be learned before “two”; and knowledge of the dual form should be correlated with knowledge of “two”. We tested these hypotheses by having Slovenian and English-speaking children complete the Give-a-Number and Give-Morphology tasks. We analyzed the Give-Morphology task in a new way, using criteria stricter than simple percent correct to determine whether children “know” the morphological markers. In this sample, Slovenian children exposed to the dual marker did not show evidence of knowing “two” (i.e., being 2-knowers) at very young ages or earlier than English-speaking children. Knowledge of the dual marker neither preceded nor correlated with the acquisition of “two”. Indeed, the dual form was acquired only after the singular and plural. Parallel analyses were also conducted using an open data set with more Slovenian 2-knowers, yielding similar results. These findings present challenges for the claim that grammatical number plays a role in number acquisition. Specifically, this theory requires better articulation about how a dual-marked language can facilitate number acquisition if children do not notice or learn the dual form. 
  5. Louis, Matthieu (Ed.)
    Imaging neural activity in a behaving animal presents unique challenges, in part because motion from an animal’s movement creates artifacts in fluorescence intensity time-series that are difficult to distinguish from neural signals of interest. One approach to mitigating these artifacts is to image two channels simultaneously: one that captures an activity-dependent fluorophore, such as GCaMP, and another that captures an activity-independent fluorophore, such as RFP. Because the activity-independent channel contains the same motion artifacts as the activity-dependent channel, but no neural signals, the two together can be used to identify and remove the artifacts. However, existing approaches for this correction, such as taking the ratio of the two channels, do not account for channel-independent noise in the measured fluorescence. Here, we present Two-channel Motion Artifact Correction (TMAC), a method that seeks to remove artifacts by specifying a generative model of the two-channel fluorescence that incorporates motion artifact, neural activity, and noise. We use Bayesian inference to infer latent neural activity under this model, thus reducing the motion artifact present in the measured fluorescence traces. We further present a novel method for evaluating ground-truth performance of motion correction algorithms by comparing the decodability of behavior from two types of neural recordings: a recording that had both an activity-dependent fluorophore and an activity-independent fluorophore (GCaMP and RFP), and a recording where both fluorophores were activity-independent (GFP and RFP). A successful motion correction method should decode behavior from the first type of recording, but not the second. We use this metric to systematically compare five models for removing motion artifacts from fluorescent time traces. Using TMAC-inferred activity, we decode locomotion from a GCaMP-expressing animal 20x more accurately on average than from control, outperforming all other motion correction methods tested, the best of which were ~8x more accurate than control. 