Network configuration remains time-consuming and error-prone with the current configuration command system. Creating access control lists (ACLs) with commands that contain many options is still a difficult task. In light of this, we aim to develop a comprehensible approach to ACL construction. Based on Eliza, an early artificial-intelligence program, we propose a new design called EASYACL that synthesizes ACL rules automatically from natural language descriptions. EASYACL demonstrates the effectiveness of domain-specific program synthesis. Through the use of natural language, ACL rules can be constructed without an excessive number of options or rigid syntax. By introducing batch processing, we make it possible for users to apply configurations to a range of IP addresses rather than tediously repeating commands. EASYACL supports multiple platforms through an intermediate representation that can be ported to the commands for both Cisco and Juniper devices. The comprehensible commands are friendly for encapsulation as well as reuse. EASYACL enables end-users with no prior programming experience to construct ACLs in a natural way, which lowers the bar for security-management training and also reduces errors in network administration.
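The abstract describes a pipeline from natural language to vendor-specific commands via an intermediate representation. A minimal hypothetical sketch of that idea is shown below; the pattern grammar, IR fields, and output syntax are illustrative assumptions, not EASYACL's actual rules, and the Juniper output is a simplified approximation of real Junos configuration.

```python
import re

# Hypothetical sketch of an EASYACL-style pipeline (the grammar and IR
# fields are assumptions, not the paper's actual design): an Eliza-style
# pattern maps a natural-language request to a vendor-neutral IR, which
# is then ported to Cisco- or Juniper-style commands.

PATTERN = re.compile(
    r"(?P<action>permit|deny)\s+(?P<proto>tcp|udp|ip)\s+from\s+"
    r"(?P<src>\S+)\s+to\s+(?P<dst>\S+)", re.IGNORECASE)

def parse(sentence):
    """Map a natural-language description to an IR dictionary."""
    m = PATTERN.search(sentence)
    if m is None:
        raise ValueError("unrecognized description: " + sentence)
    return {k: v.lower() for k, v in m.groupdict().items()}

def to_cisco(ir, acl_id=101):
    # Cisco-style extended ACL line built from the IR.
    return f"access-list {acl_id} {ir['action']} {ir['proto']} {ir['src']} {ir['dst']}"

def to_juniper(ir, term="t1"):
    # Simplified Juniper-style firewall-filter terms built from the IR.
    verb = "accept" if ir["action"] == "permit" else "discard"
    return (f"set firewall filter easyacl term {term} from source-address {ir['src']}\n"
            f"set firewall filter easyacl term {term} then {verb}")

ir = parse("Deny tcp from 10.0.0.1 to 192.168.1.1")
print(to_cisco(ir))   # access-list 101 deny tcp 10.0.0.1 192.168.1.1
print(to_juniper(ir))
```

Batch processing over a range of addresses would then be a loop that calls `parse` once and emits one backend line per IP, rather than repeating the full command by hand.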
Errudite: Scalable, Reproducible, and Testable Error Analysis
Though error analysis is crucial to understanding and improving NLP models, the common practice of manual, subjective categorization of a small sample of errors can yield biased and incomplete conclusions. This paper codifies model- and task-agnostic principles for informative error analysis, and presents Errudite, an interactive tool for better supporting this process. First, error groups should be precisely defined for reproducibility; Errudite supports this with an expressive domain-specific language. Second, to avoid spurious conclusions, a large set of instances should be analyzed, including both positive and negative examples; Errudite enables systematic grouping of relevant instances with filtering queries. Third, hypotheses about the cause of errors should be explicitly tested; Errudite supports this via automated counterfactual rewriting. We validate our approach with a user study, finding that Errudite (1) enables users to perform high-quality and reproducible error analyses with less effort, (2) reveals substantial ambiguities in prior published error analysis practices, and (3) enhances the error analysis experience by allowing users to test and revise prior beliefs.
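The three principles above can be illustrated with a small, self-contained sketch. This is plain Python, not Errudite's actual DSL; the attribute names, the example instances, and the rewrite rule are all illustrative assumptions.

```python
# Hypothetical sketch of Errudite's three principles (the data and the
# rewrite rule are illustrative, not Errudite's real DSL or results).

instances = [
    {"question": "What year was the bridge built?", "answer": "1937",
     "prediction": "1937"},
    {"question": "Who designed the bridge?", "answer": "Strauss",
     "prediction": "the bridge"},
    {"question": "What year did it open?", "answer": "1937",
     "prediction": "1936"},
]

def is_error(inst):
    return inst["prediction"] != inst["answer"]

# Principles 1 and 2: a precisely defined, reproducible group, applied to
# ALL instances (correct and incorrect), not a hand-picked error sample.
def starts_with(inst, prefix):
    return inst["question"].lower().startswith(prefix)

year_qs = [i for i in instances if starts_with(i, "what year")]
error_rate = sum(is_error(i) for i in year_qs) / len(year_qs)
print(f"'what year' group: {len(year_qs)} instances, error rate {error_rate:.2f}")

# Principle 3: test a causal hypothesis with a counterfactual rewrite,
# e.g. "does rephrasing 'Who' questions change the model's behavior?"
def rewrite(inst):
    return dict(inst, question=inst["question"].replace("Who", "Which person"))

print(rewrite(instances[1])["question"])
```

In Errudite itself, the group definition and the rewrite would be expressed in the tool's DSL and re-run against a real model, so that the analysis is reproducible by others.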
- Journal Name:
- Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
- Page Range or eLocation-ID:
- 747 to 763
- Sponsoring Org:
- National Science Foundation
More Like this
Phase Resolution of Heterozygous Sites in Diploid Genomes is Important to Phylogenomic Analysis under the Multispecies Coalescent Model (Thomson, Robert, Ed.)
Abstract: Genome sequencing projects routinely generate haploid consensus sequences from diploid genomes, which are effectively chimeric sequences with the phase at heterozygous sites resolved at random. The impact of phasing errors on phylogenomic analyses under the multispecies coalescent (MSC) model is largely unknown. Here, we conduct a computer simulation to evaluate the performance of four phase-resolution strategies (the true phase resolution, the diploid analytical integration algorithm which averages over all phase resolutions, computational phase resolution using the program PHASE, and random resolution) on estimation of the species tree and evolutionary parameters in analysis of multilocus genomic data under the MSC model. We found that species tree estimation is robust to phasing errors when species divergences are much older than average coalescent times, but may be affected by phasing errors when the species tree is shallow. Estimation of parameters under the MSC model with and without introgression is affected by phasing errors. In particular, random phase resolution causes serious overestimation of population sizes for modern species and biased estimation of cross-species introgression probability. In general, the impact of phasing errors is greater when the mutation rate is higher, the data include more samples per species, and the species tree is shallower …
Long analysis times are a key bottleneck for the widespread adoption of whole-program static analysis tools. Fortunately, however, a user is often only interested in finding errors in the application code, which constitutes a small fraction of the whole program. Current application-focused analysis tools overapproximate the effect of the library and hence reduce the precision of the analysis results. However, empirical studies have shown that users have high expectations on precision and will ignore tool results that don't meet these expectations. In this paper, we introduce the first tool QueryMax that significantly speeds up an application code analysis without dropping any precision. QueryMax acts as a pre-processor to an existing analysis tool to select a partial library that is most relevant to the analysis queries in the application code. The selected partial library plus the application is given as input to the existing static analysis tool, with the remaining library pointers treated as the bottom element in the abstract domain. This achieves a significant speedup over a whole-program analysis, at the cost of a few lost errors, and with no loss in precision. We instantiate and run experiments on QueryMax for a cast-check analysis and a null-pointer analysis. For a …
Sports broadcasters inject drama into play-by-play commentary by building team and player narratives through subjective analyses and anecdotes. Prior studies based on small datasets and manual coding show that such theatrics evince commentator bias in sports broadcasts. To examine this phenomenon, we assemble FOOTBALL, which contains 1,455 broadcast transcripts from American football games across six decades that are automatically annotated with 250K player mentions and linked with racial metadata. We identify major confounding factors for researchers examining racial bias in FOOTBALL, and perform a computational analysis that supports conclusions from prior social science studies.
Abstract: Target enrichment (such as Hyb-Seq) is a well-established high-throughput sequencing method that has been increasingly used for phylogenomic studies. Unfortunately, current widely used pipelines for analysis of target enrichment data do not have a rigorous procedure to remove paralogs in target enrichment data. In this study, we develop a pipeline we call Putative Paralogs Detection (PPD) to better address putative paralogs from enrichment data. The new pipeline is an add-on to the existing HybPiper pipeline, and the entire pipeline applies criteria in both sequence similarity and heterozygous sites at each locus in the identification of paralogs. Users may adjust the thresholds of sequence identity and heterozygous sites to identify and remove paralogs according to the level of phylogenetic divergence of their group of interest. The new pipeline also removes highly polymorphic sites attributed to errors in sequence assembly and gappy regions in the alignment. We demonstrated the value of the new pipeline using empirical data generated from Hyb-Seq and the Angiosperm 353 kit for two woody genera Castanea (Fagaceae, Fagales) and Hamamelis (Hamamelidaceae, Saxifragales). Comparisons of datasets showed that the PPD identified many more putative paralogs than the popular method HybPiper. Comparisons of tree topologies and divergence times …