Title: Errudite: Scalable, Reproducible, and Testable Error Analysis
Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.
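To make the principles concrete, here is a minimal Python sketch of an attribute-based filtering query in the spirit of Errudite's domain-specific language. The helpers (`length`, `starts_with`) and the data layout are illustrative assumptions, not the tool's actual API; the point is that the group definition is executable and reproducible, and that it collects correct and incorrect predictions alike.

```python
# Minimal sketch of an attribute-based filtering query in the spirit of
# Errudite's DSL. Helper names and data layout are illustrative, not the
# tool's actual API.

def length(text):
    # Number of whitespace-separated tokens.
    return len(text.split())

def starts_with(text, prefix):
    return text.lower().startswith(prefix.lower())

def build_group(instances, predicate):
    # Apply one executable filter to ALL instances, so the group contains
    # correct and incorrect predictions alike (not just errors).
    return [ex for ex in instances if predicate(ex)]

instances = [
    {"question": "when did the war end", "correct": False},
    {"question": "when was the treaty signed in paris", "correct": True},
    {"question": "who led the expedition", "correct": False},
]

# "Short 'when' questions" as a precisely defined group.
group = build_group(
    instances,
    lambda ex: starts_with(ex["question"], "when") and length(ex["question"]) <= 7,
)
error_rate = sum(not ex["correct"] for ex in group) / len(group)
print(f"group size={len(group)}, error rate={error_rate:.2f}")  # 2, 0.50
```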
Award ID(s): 1901386
NSF-PAR ID: 10172006
Author(s) / Creator(s): ; ; ;
Date Published:
Journal Name: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
Page Range / eLocation ID: 747 to 763
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Network configuration remains time-consuming and error-prone under the current configuration command system. Creating access control lists (ACLs) with commands containing many options is still considered a difficult task. In light of this, we aim to develop a comprehensible approach to ACL construction. Building on Eliza, an early artificial intelligence prototype, we propose a new design called EASYACL that synthesizes ACL rules automatically from natural language descriptions. EASYACL demonstrates the effectiveness of domain-specific program synthesis. Through the use of natural language, ACL rules can be constructed without an excessive number of options or rigid syntax. By introducing batch processing, we make it possible for users to apply configurations to a range of IP addresses rather than tediously repeating commands. EASYACL supports multiple platforms via an intermediate representation that can be ported to commands for both Cisco and Juniper devices. The comprehensible commands are friendly for encapsulation as well as reuse. EASYACL enables end users with no prior programming experience to construct ACLs in a natural way, which lowers the bar for security management training and also reduces errors in network administration.
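To illustrate the multi-platform intermediate representation, here is a hypothetical Python sketch that renders one IR rule to schematic Cisco and Juniper commands. The class and the rendered syntax are simplified assumptions, not EASYACL's actual implementation (real Cisco ACLs, for instance, use wildcard masks rather than CIDR prefixes).

```python
# Hypothetical sketch of a vendor-neutral intermediate representation (IR)
# for ACL rules, rendered to schematic Cisco and Juniper commands. This is
# a simplification for illustration, not EASYACL's actual code.
from dataclasses import dataclass

@dataclass
class AclRule:
    action: str       # "permit" or "deny"
    protocol: str     # e.g. "ip", "tcp", "udp"
    source: str       # "any" or an address/prefix
    destination: str

def to_cisco(rule: AclRule, acl_id: int = 101) -> str:
    # Real Cisco ACLs use wildcard masks (e.g. 10.0.0.0 0.0.0.255), not CIDR.
    return (f"access-list {acl_id} {rule.action} {rule.protocol} "
            f"{rule.source} {rule.destination}")

def to_juniper(rule: AclRule, name: str = "EASYACL", term: int = 1) -> list:
    then = "accept" if rule.action == "permit" else "discard"
    stem = f"set firewall family inet filter {name} term {term}"
    return [f"{stem} from source-address {rule.source}",
            f"{stem} then {then}"]

# One request ("deny traffic from 10.0.0.0/24 to anywhere") becomes one IR
# rule; batch processing would emit one rule per address range.
rule = AclRule("deny", "ip", "10.0.0.0/24", "any")
print(to_cisco(rule))
print("\n".join(to_juniper(rule)))
```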
  2. Abstract

    In this note, we apply transition path theory (TPT) from Markov chains to shed light on the problem of Iceland–Scotland Overflow Water (ISOW) equatorward export. A recent analysis of observed trajectories of submerged floats demanded a revision of the traditional abyssal circulation theory, which postulates that ISOW should steadily flow along a deep boundary current (DBC) around the subpolar North Atlantic prior to exiting it. The TPT analyses carried out here allow attention to be focused on the portions of flow from the origin of ISOW to the region where ISOW exits the subpolar North Atlantic, and they suggest that insufficient sampling may be biasing the aforementioned demand for revision. The analyses, appropriately adapted to represent a continuous input of ISOW, are carried out on three time-homogeneous Markov chains modeling the ISOW flow. One is constructed using a high number of simulated trajectories homogeneously covering the flow domain. The other two use far fewer trajectories, which cover the domain heterogeneously. The trajectories in the latter two chains are either observed trajectories or simulated trajectories subsampled at the observed frequency. While the densely sampled chain supports a well-defined DBC, the more heterogeneously sampled chains do not, irrespective of whether the trajectories used are observed or simulated; whether the well-defined DBC is a peculiarity of the particular simulation considered remains open. By studying the sampling sensitivity of the Markov chains, we can give recommendations for enlarging the existing float dataset to improve the significance of conclusions about long-time-asymptotic aspects of the ISOW circulation.
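As a glimpse of the machinery involved, the following sketch computes one standard TPT ingredient, the forward committor, for a toy four-state Markov chain. The chain, and the reading of set A as the ISOW origin and set B as the exit region, are invented for illustration; the paper's adaptation to a continuous input of ISOW is not reproduced here.

```python
# Toy illustration of the forward committor q of a Markov chain: the
# probability of reaching target set B before returning to source set A.
import numpy as np

P = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.2, 0.5, 0.3, 0.0],
              [0.0, 0.3, 0.5, 0.2],
              [0.0, 0.0, 0.5, 0.5]])
A, B = [0], [3]                                    # source / target sets
C = [i for i in range(len(P)) if i not in A + B]   # transient states

# q = 0 on A and q = 1 on B; on C it solves (I - P_CC) q_C = P_CB @ 1.
q = np.zeros(len(P))
q[B] = 1.0
q[C] = np.linalg.solve(np.eye(len(C)) - P[np.ix_(C, C)],
                       P[np.ix_(C, B)].sum(axis=1))
print(q)   # [0, 0.375, 0.625, 1]: monotone from source to target
```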
  3. Abstract

    Landmark‐based geometric morphometrics has emerged as an essential discipline for the quantitative analysis of size and shape in ecology and evolution. With the ever‐increasing density of digitized landmarks, the possible development of a fully automated method of landmark placement has attracted considerable attention. Despite the recent progress in image registration techniques, which could provide a pathway to automation, three‐dimensional (3D) morphometric data are still mainly gathered by trained experts. For the most part, the large infrastructure requirements necessary to perform image‐based registration, together with its system specificity and its overall speed, have prevented its wide dissemination.

    Here, we propose and implement a general and lightweight point cloud‐based approach to automatically collect high‐dimensional landmark data from 3D surfaces (ALPACA: Automated Landmarking through Point cloud Alignment and Correspondence Analysis). Our framework possesses several advantages compared with image‐based approaches. First, it offers comparable landmarking accuracy, despite relying on a single, random reference specimen and much sparser sampling of the structure's surface. Second, it can be efficiently run on consumer‐grade personal computers. Finally, it is general and can be applied at the intraspecific level to any biological structure of interest, regardless of whether anatomical atlases are available.

    Our validation procedures indicate that the method can recover intraspecific patterns of morphological variation that are largely comparable to those obtained by manual digitization, indicating that the use of an automated landmarking approach should not result in different conclusions regarding the nature of multivariate patterns of morphological variation.

    The proposed point cloud‐based approach has the potential to increase the scale and reproducibility of morphometrics research. To allow ALPACA to be used out‐of‐the‐box by users with no prior programming experience, we implemented it as a SlicerMorph module. SlicerMorph is an extension that enables geometric morphometrics data collection and 3D specimen analysis within the open‐source 3D Slicer biomedical visualization ecosystem. We expect that convenient access to this platform will make ALPACA broadly applicable within ecology and evolution.
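For intuition, here is a schematic Python sketch of the rigid step of point cloud based landmark transfer: Kabsch/SVD alignment of a reference cloud onto a target, then snapping each transferred landmark to its nearest target point. It assumes the two clouds are already in one-to-one order and omits the deformable registration and correspondence estimation that ALPACA itself performs.

```python
# Schematic sketch: rigid alignment + nearest-neighbour landmark transfer.
# Assumes point-to-point correspondence; ALPACA's non-rigid step is omitted.
import numpy as np

def kabsch(src, dst):
    # Least-squares rigid transform (R, t) with dst ≈ src @ R.T + t.
    sc, dc = src.mean(0), dst.mean(0)
    U, _, Vt = np.linalg.svd((src - sc).T @ (dst - dc))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # no reflection
    R = Vt.T @ D @ U.T
    return R, dc - sc @ R.T

def transfer_landmarks(ref_cloud, ref_landmarks, target_cloud):
    # Move reference landmarks with the fitted transform, then snap each
    # one to its nearest target point (a crude correspondence step).
    R, t = kabsch(ref_cloud, target_cloud)
    moved = ref_landmarks @ R.T + t
    nearest = ((target_cloud[None] - moved[:, None]) ** 2).sum(-1).argmin(1)
    return target_cloud[nearest]

rng = np.random.default_rng(0)
ref = rng.normal(size=(200, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
target = ref @ Rz.T + np.array([0.5, -1.0, 2.0])    # rotated + shifted copy
landmarks = transfer_landmarks(ref, ref[:5], target)
print(np.allclose(landmarks, target[:5]))            # True
```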

     
  4. Abstract

    Time‐scaled phylogenies underpin the interrogation of evolutionary processes across deep timescales, as well as attempts to link these to Earth's history. By inferring the placement of fossils and using their ages as temporal constraints, tip dating under the fossilized birth–death (FBD) process provides a coherent prior on divergence times. At the same time, it also links topological and temporal accuracy, as incorrectly placed fossil terminals should misinform divergence times. This could pose serious issues for obtaining accurate node ages, yet the interaction between topological and temporal error has not been thoroughly explored. We simulate phylogenies and associated morphological datasets using methodologies that incorporate evolution under selection and are benchmarked against empirical datasets. We find that datasets of 300 characters and realistic levels of missing data generally succeed in inferring the correct placement of fossils on a constrained extant backbone topology, and that true node ages are usually contained within Bayesian posterior distributions. While increased fossil sampling improves the accuracy of inferred ages, topological and temporal errors do not seem to be linked: analyses in which fossils resolve less accurately do not exhibit elevated errors in node age estimates. At the same time, inferred divergence times are biased, probably due to a mismatch between the FBD prior and the shape of our simulated trees. While these results are encouraging, suggesting that even fossils with uncertain affinities can provide useful temporal information, they also emphasize that palaeontological information cannot overturn discrepancies between model priors and the true diversification history.
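As a toy version of the coverage check described above (whether true node ages fall within Bayesian posterior distributions), the sketch below tests 95% credible intervals against simulated true ages. The posterior samples are synthetic placeholders, not output from the study's FBD analyses.

```python
# Toy coverage check: do 95% credible intervals of posterior node ages
# contain the true (simulated) ages? Samples here are synthetic stand-ins.
import numpy as np

rng = np.random.default_rng(1)
true_ages = np.array([10.0, 25.0, 40.0])             # three internal nodes
# 5000 posterior draws per node: true age times multiplicative noise.
posterior = true_ages * rng.lognormal(0.0, 0.1, size=(5000, 3))

lo, hi = np.percentile(posterior, [2.5, 97.5], axis=0)
covered = (lo <= true_ages) & (true_ages <= hi)
print(f"nodes covered: {covered.sum()} of {covered.size}")
```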

     
  5. Long analysis times are a key bottleneck for the widespread adoption of whole-program static analysis tools. Fortunately, however, a user is often only interested in finding errors in the application code, which constitutes a small fraction of the whole program. Current application-focused analysis tools overapproximate the effect of the library and hence reduce the precision of the analysis results. However, empirical studies have shown that users have high expectations of precision and will ignore tool results that do not meet these expectations. In this paper, we introduce QueryMax, the first tool that significantly speeds up an application-code analysis without dropping any precision. QueryMax acts as a pre-processor to an existing analysis tool, selecting the partial library that is most relevant to the analysis queries in the application code. The selected partial library plus the application is given as input to the existing static analysis tool, with the remaining library pointers treated as the bottom element in the abstract domain. This achieves a significant speedup over a whole-program analysis, at the cost of a few lost errors, and with no loss in precision. We instantiate and run experiments on QueryMax for a cast-check analysis and a null-pointer analysis. For a particular configuration, QueryMax enables these two analyses to achieve, relative to a whole-program analysis, an average recall of 87%, a precision of 100%, and a geometric mean speedup of 10x.
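A hypothetical sketch of the selection idea: starting from the application methods containing analysis queries, keep only the library code reachable in the call graph and let everything else be modelled as the abstract domain's bottom element. The call graph and method names below are invented, and QueryMax's actual query-driven selection is more sophisticated than plain reachability.

```python
# Hypothetical sketch of query-driven partial-library selection via
# worklist reachability over a call graph. Names are invented.
from collections import deque

CALL_GRAPH = {
    "app.main":        ["app.parse", "lib.io.open"],
    "app.parse":       ["lib.json.decode"],
    "lib.io.open":     ["lib.io.buffer"],
    "lib.json.decode": ["lib.json.lex"],
    "lib.json.lex":    [],
    "lib.io.buffer":   [],
    "lib.zip.inflate": ["lib.zip.tables"],   # unreachable from the queries
    "lib.zip.tables":  [],
}

def select_partial_library(query_sites):
    # Keep every method reachable from the query sites.
    keep, work = set(), deque(query_sites)
    while work:
        m = work.popleft()
        if m not in keep:
            keep.add(m)
            work.extend(CALL_GRAPH.get(m, []))
    return sorted(m for m in keep if m.startswith("lib."))

print(select_partial_library(["app.main"]))
# ['lib.io.buffer', 'lib.io.open', 'lib.json.decode', 'lib.json.lex']
# lib.zip.* is dropped, shrinking the input to the downstream analysis.
```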