Title: Errudite: Scalable, Reproducible, and Testable Error Analysis
Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in previously published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.
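The second principle above (analyze a precisely defined group together with the instances outside it) can be illustrated with a minimal sketch. This is not Errudite's domain-specific language or API; the `Instance` type, the `group_error_rates` helper, and the toy data are all hypothetical names introduced for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Instance:
    """Hypothetical evaluation record: one input and whether the model got it right."""
    question: str
    correct: bool


def group_error_rates(
    instances: List[Instance],
    in_group: Callable[[Instance], bool],
) -> Dict[str, float]:
    """Compare the error rate inside a precisely defined group with the rest.

    `in_group` is an explicit predicate, so the group definition is
    reproducible rather than a subjective label.
    """
    inside = [x for x in instances if in_group(x)]
    outside = [x for x in instances if not in_group(x)]

    def error_rate(xs: List[Instance]) -> float:
        # Fraction of instances the model got wrong; NaN for an empty set.
        return sum(not x.correct for x in xs) / len(xs) if xs else float("nan")

    return {
        "group_size": len(inside),
        "group_error_rate": error_rate(inside),
        "rest_error_rate": error_rate(outside),
    }


# Toy data (made up): does the model struggle with "who" questions?
data = [
    Instance("Who wrote Hamlet?", True),
    Instance("Who discovered penicillin?", False),
    Instance("When was the treaty signed?", False),
    Instance("Where is the Nile?", True),
]
stats = group_error_rates(data, lambda x: x.question.lower().startswith("who"))
print(stats)
```

On this toy data the error rate inside the group equals the rate outside it, so looking only at failed "who" questions would have suggested a pattern that the full comparison does not support.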
Award ID(s):
1901386
PAR ID:
10172006
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Page Range / eLocation ID:
747 to 763
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Network configuration remains time-consuming and error-prone with the current configuration command system. Creating access control lists (ACLs) with commands containing many options is still considered a difficult task. In light of this, we aim to develop a comprehensible approach to ACL construction. Based on Eliza, a prototype of Artificial Intelligence, we propose a new design called EASYACL that synthesizes ACL rules automatically from natural language descriptions. EASYACL demonstrates the effectiveness of domain-specific program synthesis. Through the use of natural language, ACL rules can be constructed without using an excessive number of options or rigid syntax. By introducing batch processing, we make it possible for users to apply configurations to a range of IP addresses rather than tediously repeating commands. EASYACL supports multiple platforms through an intermediate representation which may be ported to the commands for both Cisco and Juniper devices. The comprehensible commands are friendly for encapsulation as well as reuse. EASYACL enables end-users with no prior programming experience to construct ACLs in a natural way, which lowers the bar for security management training and also reduces errors in network administration.
  2. Sequence alignment is an essential method in bioinformatics and the basis of many analyses, including phylogenetic inference, ancestral sequence reconstruction, and gene annotation. Sequencing artifacts and errors made during genome assembly, such as abiological frameshifts and incorrect early stop codons, can impact downstream analyses leading to erroneous conclusions in comparative and functional genomic studies. More significantly, while indels can occur both within and between codons in natural sequences, most amino-acid- and codon-based aligners assume that indels only occur between codons. This mismatch between biology and alignment algorithms produces suboptimal alignments and errors in downstream analyses. To address these issues, we present COATi, a statistical, codon-aware pairwise aligner that supports complex insertion–deletion models and can handle artifacts present in genomic data. COATi allows users to reduce the amount of discarded data while generating more accurate sequence alignments. COATi can infer indels both within and between codons, leading to improved sequence alignments. We applied COATi to a dataset containing orthologous protein-coding sequences from humans and gorillas and conclude that 41% of indels occurred between codons, agreeing with previous work in other species. We also applied COATi to semiempirical benchmark alignments and find that it outperforms several popular alignment programs on several measures of alignment quality and accuracy.
  3. This paper proposes EFTSanitizer, a fast shadow execution framework for detecting and debugging numerical errors during late stages of testing, especially for long-running applications. Any shadow execution framework needs an oracle to compare against the floating point (FP) execution. This paper makes a case for using error-free transformations, which are sequences of operations that compute the error of a primitive operation using existing hardware-supported FP operations, as an oracle for shadow execution. Although the error of a single correctly rounded FP operation is bounded, the accumulation of errors across operations can result in exceptions, slow convergence, and even crashes. To ease the job of debugging such errors, EFTSanitizer provides a directed acyclic graph (DAG) that highlights the propagation of errors that result in exceptions or crashes. Unlike prior work, DAGs produced by EFTSanitizer include operations that span various function calls while keeping the memory usage bounded. To enable the use of such shadow execution tools with long-running applications, EFTSanitizer also supports starting the shadow execution at an arbitrary point in the dynamic execution, which we call selective shadow execution. EFTSanitizer is an order of magnitude faster than prior state-of-the-art shadow execution tools such as FPSanitizer and Herbgrind. We have discovered new numerical errors and debugged them using EFTSanitizer.
  4. Machine learning interatomic potentials (MLIPs) have been widely adopted for atomistic simulations. While errors and discrepancies for MLIPs have been reported, a comprehensive examination of the MLIPs’ performance over a broad spectrum of material properties has been lacking. This study introduces an analysis process comprising model sampling, benchmarking, error evaluations, and multi-dimensional statistical analyses on an ensemble of MLIPs for prediction errors over a diverse range of properties. By carrying out this analysis on 2300 MLIP models based on six different MLIP types, several properties that pose challenges for the MLIPs to achieve small errors are identified. The Pareto front analyses on two or more properties reveal the trade-offs in different properties of MLIPs, underscoring the difficulties of achieving low errors for a large number of properties simultaneously. Furthermore, we propose correlation graph analyses to characterize the error performances of MLIPs and to select the representative properties for predicting other property errors. This analysis process on a large dataset of MLIP models sheds light on the underlying complexities of MLIP performance, offering crucial guidance for the future development of MLIPs with improved predictive accuracy across an array of material properties.
  5. Most readers have had the experience of initially failing to notice an omission or repetition of a function word, or a transposition of two adjacent words. In the present article, we review recent research investigating this phenomenon. We emphasize that failure to notice such errors is of substantial theoretical interest, given what we have learned about how systematically and incrementally readers inspect and process text. We endorse the idea that a process of rational inference may play a critical role, while we cast doubt on the idea that failure to notice errors arises from parallel processing of multiple words. We review a number of recent studies from our own laboratory that have investigated the relationship between eye movements during reading and noticing, or failing to notice, an error. While the conclusions from these studies are broadly consistent with a rational inference account, we find that when readers fail to notice an error, their eye movements generally show no indication that the error was registered at all. On its surface, this finding may be viewed as inconsistent with the idea that the rational inference process that enables readers to overlook errors is genuinely post‐perceptual. We suggest a mechanism by which eye movement control models could account for this finding.
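Item 3 above describes error-free transformations as sequences of ordinary floating point operations that recover the rounding error of a primitive operation. A minimal sketch of one well-known transformation, Knuth's TwoSum for addition, is given here. It is not taken from EFTSanitizer; the function name `two_sum` is an assumption for illustration, and the sketch assumes IEEE 754 doubles with round-to-nearest and no overflow.

```python
from fractions import Fraction
from typing import Tuple


def two_sum(a: float, b: float) -> Tuple[float, float]:
    """Knuth's TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly.

    Uses only floating point additions and subtractions, so the rounding
    error e is computed with the same hardware operations being checked.
    """
    s = a + b                 # rounded sum
    b_virtual = s - a         # the part of s attributed to b
    a_virtual = s - b_virtual # the part of s attributed to a
    b_round = b - b_virtual   # what was lost from b
    a_round = a - a_virtual   # what was lost from a
    return s, a_round + b_round


# 1e-16 is below half an ulp of 1.0, so the rounded sum drops it entirely
# and the error term recovers it.
s, err = two_sum(1.0, 1e-16)
print(s, err)

# Exact rational arithmetic confirms that s + err equals the true sum.
assert Fraction(s) + Fraction(err) == Fraction(1.0) + Fraction(1e-16)
```

A shadow execution oracle built on this idea would carry the error term alongside each computed value; how EFTSanitizer does so is not described in the abstract.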