Accurate mapping of Autonomous Systems (ASes) to their owner organizations is fundamental for understanding the structure and dynamics of the Internet. However, because AS numbers have traditionally been delegated in an ad hoc manner and organizational ownership has evolved over time, many organizations have registered resources under different names. Researchers have traditionally relied on datasets such as AS2Org, which map ASNs to organizations primarily using WHOIS records, but inconsistencies in WHOIS data often lead to missed and false relationships. We propose a new approach that leverages the Resource Public Key Infrastructure (RPKI) to map ASNs to their managing organizations. Our methodology combines multiple data sources: WHOIS records to extract organization names, RPKI certificates to identify potential sibling ASNs, and Large Language Models (LLMs) to uncover evidence not currently visible in WHOIS records. This integrated approach enables a more robust and accurate mapping of ASNs to organizations, notably improving inferences for 14% of multi-ASN clusters.
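As a concrete illustration of the sibling-grouping step, here is a minimal sketch in which each validated RPKI certificate has already been reduced to the list of ASNs it covers, and ASNs sharing a certificate are merged with union-find. The input format and function name are hypothetical simplifications for illustration, not the actual pipeline's interfaces.

```python
# Hedged sketch of the sibling-clustering step: group ASNs that appear on the
# same RPKI resource certificate, then merge overlapping groups via union-find.
# The input (cert_id -> list of ASNs) is a simplification; real RPKI data must
# first be extracted from validated certificates.
from collections import defaultdict

def cluster_siblings(cert_to_asns: dict[str, list[int]]) -> list[set[int]]:
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # ASNs listed on the same certificate are managed by the same resource
    # holder, so they become candidate siblings.
    for asns in cert_to_asns.values():
        for asn in asns[1:]:
            union(asns[0], asn)

    clusters: dict[int, set[int]] = defaultdict(set)
    for asn in parent:
        clusters[find(asn)].add(asn)
    return list(clusters.values())

# Toy example: two certificates share AS65001, so all three ASNs merge.
print(cluster_siblings({"cert-a": [65001, 65002], "cert-b": [65001, 65003]}))
# -> [{65001, 65002, 65003}]
```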
Learning to Extract and Use ASNs in Hostnames
We present the design, implementation, evaluation, and validation of a system that learns regular expressions (regexes) to extract Autonomous System Numbers (ASNs) from the hostnames associated with router interfaces. We train our system with ASNs inferred by RouterToAsAssignment and bdrmapIT using topological constraints from traceroute paths, as well as ASNs recorded by operators in PeeringDB, learning regexes for 206 different suffixes. Because these methods for inferring router ownership can infer the wrong ASN, we modified bdrmapIT to integrate this new capability to extract ASNs from hostnames. Evaluated against ground truth, our modification correctly distinguished stale from correct hostnames for 92.5% of hostnames whose extracted ASN differed from bdrmapIT's initial inference. This modification allowed bdrmapIT to increase the agreement between extracted and inferred ASNs for these routers in the January 2020 ITDK from 87.4% to 97.1%, and to reduce the error rate from 1 in 7.9 to 1 in 34.5. This work presents a new avenue for collecting validation data, opening a broader horizon of opportunity for evidence-based router ownership inference.
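Below is a minimal sketch of the core learning idea for a single hostname suffix: candidate regexes are scored by how often the ASN they extract agrees with the training ASN for that interface, and the best candidate is kept only if it clears an accuracy threshold. The candidate patterns and threshold are illustrative assumptions, not the system's actual regex grammar or selection criteria.

```python
# Hedged sketch: score candidate regexes for one DNS suffix by agreement with
# training ASNs, then keep the best regex if it is accurate enough.
import re

CANDIDATES = [
    re.compile(r"\bas(\d+)\b"),      # e.g. "...as65001.example.net"
    re.compile(r"\bas[.-](\d+)\b"),  # e.g. "...as-65001.example.net"
]

def learn_regex(training: list[tuple[str, int]], min_accuracy: float = 0.9):
    """training: (hostname, ASN inferred by bdrmapIT/PeeringDB) pairs."""
    best, best_acc = None, 0.0
    for rx in CANDIDATES:
        hits = matches = 0
        for hostname, asn in training:
            m = rx.search(hostname)
            if m:
                matches += 1
                hits += int(int(m.group(1)) == asn)
        acc = hits / matches if matches else 0.0
        if acc > best_acc:
            best, best_acc = rx, acc
    return best if best_acc >= min_accuracy else None

rx = learn_regex([("ce1.as65001.example.net", 65001),
                  ("ge0.as65002.example.net", 65002)])
if rx:
    print(rx.search("xe1.as65010.example.net").group(1))  # -> 65010
```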
- PAR ID: 10289016
- Date Published:
- Journal Name: IMC '20: Proceedings of the ACM Internet Measurement Conference
- Page Range / eLocation ID: 386 to 392
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Background: Microorganisms are found in almost every environment, including soil, water, air, and inside other organisms such as animals and plants. While some microorganisms cause disease, most of them support biological processes such as decomposition, fermentation, and nutrient cycling. Much research has studied microbial communities in various environments and how their interactions and relationships can provide insight into various diseases. Co-occurrence network inference algorithms help us understand the complex associations of microorganisms, especially bacteria. Existing network inference algorithms employ techniques such as correlation, regularized linear regression, and conditional dependence, which have different hyper-parameters that determine the sparsity of the network. These complex microbial communities form intricate ecological networks that are fundamental to ecosystem functioning and host health. Understanding these networks is crucial for developing targeted interventions in both environmental and clinical settings. The emergence of high-throughput sequencing technologies has generated unprecedented amounts of microbiome data, necessitating robust computational methods for network inference and validation. Results: Previous methods for evaluating the quality of an inferred network include using external data and checking network consistency across sub-samples, both of which have several drawbacks that limit their applicability to real microbiome composition datasets. We propose a novel cross-validation method to evaluate co-occurrence network inference algorithms, along with new methods for applying existing algorithms to predict on test data. Our method demonstrates superior performance in handling compositional data and addressing the challenges of high dimensionality and sparsity inherent in real microbiome datasets. The proposed framework also provides robust estimates of network stability. Conclusions: Our empirical study shows that the proposed cross-validation method is useful for hyper-parameter selection (training) and for comparing the quality of inferred networks across algorithms (testing). This advancement represents a significant step forward in microbiome network analysis, providing researchers with a reliable tool for understanding complex microbial interactions. The method's applicability extends beyond microbiome studies to other fields where network inference from high-dimensional compositional data is crucial, such as gene regulatory networks and ecological food webs. Our framework establishes a new standard for validation in network inference, potentially accelerating discoveries in microbial ecology and human health. (See the cross-validation sketch after this list.)
- Regular expressions are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS (Regular Expression Denial of Service), caused by super-linear worst-case execution time during regex matching. Due to the severity and prevalence of ReDoS, past work proposed automatic tools to detect and fix vulnerable regexes. Although these tools were evaluated in automatic experiments, their usability has not yet been studied, as usability was not a focus of prior work. Our insight is that the usability of existing detection and fixing tools will improve if we complement them with anti-patterns and fix strategies for vulnerable regexes. We developed novel anti-patterns for vulnerable regexes and a collection of fix strategies to repair them. We derived our anti-patterns and fix strategies from a novel theory of regex infinite ambiguity, a necessary condition for regexes vulnerable to ReDoS, and we proved the soundness and completeness of our theory. We evaluated the effectiveness of our anti-patterns, both in an automatic experiment and when applied manually. We then evaluated how much our anti-patterns and fix strategies improve developers' understanding of the output of detection and fixing tools. Our anti-patterns were effective over a large dataset of regexes (N=209,188): 100% precision and 99% recall, improving on the state of the art's 50% precision and 87% recall. Our anti-patterns were also more effective than the state of the art when applied manually (N=20): 100% of developers applied them effectively vs. 50% for the state of the art. Finally, our anti-patterns and fix strategies increased developers' understanding when using automatic tools (N=9): from a median of "Very weakly" to "Strongly" for detecting vulnerabilities, and from "Very weakly" to "Very strongly" for fixing them. (See the anti-pattern checker sketch after this list.)
- A resource leak occurs when a program fails to free some finite resource after it is no longer needed. Such leaks are a significant cause of real-world crashes and performance problems. Recent work proposed an approach to prevent resource leaks based on checking resource management specifications. A resource management specification expresses how the program allocates resources, passes them around, and releases them; it also tracks the ownership relationship between objects and resources, and aliasing relationships between objects. While this specify-and-verify approach has several advantages over prior techniques, the need to manually write annotations presents a significant barrier to its practical adoption. This paper presents a novel technique to automatically infer a resource management specification for a program, broadening the applicability of specify-and-check verification for resource leaks. Inference in this domain is challenging because resource management specifications differ significantly in nature from the types that most inference techniques target. Further, for practical effectiveness, we want a technique that can infer the resource management specification the developer intended, even when the code does not fully adhere to it. We address these challenges through a set of inference rules carefully designed to capture real-world coding patterns, yielding an effective fixed-point-based inference algorithm. We have implemented our inference algorithm in two different systems, targeting programs written in Java and C#. In an experimental evaluation, our technique inferred 85.5% of the annotations that programmers had written manually for the benchmarks, and the verifier issued nearly the same rate of false alarms with manually written and automatically inferred annotations. (See the fixed-point inference sketch after this list.)
- Recent systems for converting natural language descriptions into regular expressions (regexes) have achieved some success, but they typically handle short, formulaic text and can only produce simple regexes. Real-world regexes are complex, hard to describe in brief sentences, and sometimes require examples to fully convey the user's intent. We present a framework for regex synthesis in this setting, where both natural language (NL) and examples are available. First, a semantic parser (either grammar-based or neural) maps the natural language description into an intermediate sketch: an incomplete regex containing holes that denote missing components. Then a program synthesizer searches the regex space defined by the sketch and finds a regex consistent with the given string examples. Our semantic parser can be trained purely from weak supervision based on the correctness of the synthesized regex, or it can leverage heuristically derived sketches. We evaluate on two prior datasets (Kushman and Barzilay 2013; Locascio et al. 2016) and a real-world dataset from Stack Overflow. Our system achieves state-of-the-art performance on the prior datasets and solves 57% of the real-world dataset, on which existing neural systems completely fail. (See the hole-filling synthesis sketch after this list.)
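A minimal sketch of the cross-validation idea from the microbiome network item above, assuming a simple correlation threshold as a stand-in for the real inference algorithm: the network is inferred on training samples, then scored by how well each taxon on held-out samples is predicted from its network neighbors. The thresholding step and all names are illustrative assumptions, not the paper's method.

```python
# Hedged sketch: cross-validate a co-occurrence network by neighbor-based
# prediction on held-out samples.
import numpy as np

def cv_score(X: np.ndarray, threshold: float, test_frac: float = 0.2) -> float:
    """X: samples x taxa abundance matrix. Returns mean squared test error."""
    rng = np.random.default_rng(0)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_frac)
    test, train = X[idx[:n_test]], X[idx[n_test:]]

    corr = np.corrcoef(train, rowvar=False)   # taxa x taxa correlations
    np.fill_diagonal(corr, 0.0)
    adj = np.abs(corr) >= threshold           # inferred network (adjacency)

    errors = []
    for j in range(X.shape[1]):
        nbrs = np.where(adj[j])[0]
        if len(nbrs) == 0:
            continue
        # least-squares fit of taxon j from its network neighbors
        coef, *_ = np.linalg.lstsq(train[:, nbrs], train[:, j], rcond=None)
        pred = test[:, nbrs] @ coef
        errors.append(np.mean((pred - test[:, j]) ** 2))
    return float(np.mean(errors)) if errors else float("inf")

# Pick the sparsity hyper-parameter that predicts held-out samples best.
X = np.random.default_rng(1).lognormal(size=(100, 30))
best = min([0.2, 0.4, 0.6], key=lambda t: cv_score(X, t))
print("selected threshold:", best)
```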
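For the ReDoS item above, a deliberately crude anti-pattern checker sketch: it textually flags one classic vulnerable shape, a quantified group whose body is itself quantified (e.g. `(a+)+`), which exhibits the infinite ambiguity that the theory identifies as necessary for ReDoS. The paper's actual anti-patterns are more precise; this scan is an assumption-laden illustration with false positives and negatives.

```python
# Hedged sketch: flag patterns containing a quantified group whose last
# element is itself quantified, a classic nested-quantifier anti-pattern.
import re

NESTED_QUANTIFIER = re.compile(r"\((?:[^()\\]|\\.)*[+*]\)\s*[+*{]")

def looks_vulnerable(pattern: str) -> bool:
    return bool(NESTED_QUANTIFIER.search(pattern))

for pat in [r"(a+)+$", r"(\d+)*;", r"[a-z]+@[a-z]+"]:
    print(pat, "->", looks_vulnerable(pat))
# (a+)+$ -> True, (\d+)*; -> True, [a-z]+@[a-z]+ -> False
```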
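For the resource-leak item above, a toy fixed-point inference sketch in the spirit described: starting from methods known to close a resource directly, it repeatedly marks any method that hands its resource to an already-marked method, until nothing changes. The intermediate representation (a map from each method to the callees that receive its resource) is invented for illustration and is far simpler than real inference over Java or C# code.

```python
# Hedged sketch: fixed-point propagation of "this method releases its
# resource" through delegation edges.
def infer_releasers(passes_to: dict[str, list[str]],
                    closes_directly: set[str]) -> set[str]:
    releasers = set(closes_directly)
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for method, callees in passes_to.items():
            if method not in releasers and any(c in releasers for c in callees):
                releasers.add(method)   # delegating release still releases
                changed = True
    return releasers

# close() releases; dispose() delegates to close(); run() delegates to dispose().
print(infer_releasers(
    {"run": ["dispose"], "dispose": ["close"], "log": ["format"]},
    {"close"},
))  # -> {'close', 'dispose', 'run'}
```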
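For the regex-synthesis item above, a hole-filling synthesis sketch: given a sketch with holes (written `?` here) such as a semantic parser might emit, it enumerates candidate fillings and returns the first completed regex consistent with the positive and negative examples. The hole syntax and candidate set are invented for illustration, not the framework's actual sketch language or search procedure.

```python
# Hedged sketch: enumerate hole fillings of a regex sketch and check each
# completion against the example strings.
import itertools
import re

CANDIDATES = [r"\d", r"[a-z]", r"[A-Z]", r"\w"]

def synthesize(sketch: str, positive: list[str], negative: list[str]):
    holes = sketch.count("?")
    for filling in itertools.product(CANDIDATES, repeat=holes):
        regex = sketch
        for piece in filling:           # fill holes left to right
            regex = regex.replace("?", piece, 1)
        rx = re.compile(f"^{regex}$")
        if all(rx.match(p) for p in positive) and \
                not any(rx.match(n) for n in negative):
            return regex
    return None

# NL: "three digits, a dash, then two lowercase letters" -> parser sketch:
print(synthesize("?{3}-?{2}", ["123-ab", "907-xy"], ["abc-12", "12-ab"]))
# -> \d{3}-[a-z]{2}
```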