Title: Learning to Extract and Use ASNs in Hostnames
We present the design, implementation, evaluation, and validation of a system that learns regular expressions (regexes) to extract Autonomous System Numbers (ASNs) from hostnames associated with router interfaces. We train our system with ASNs inferred by RouterToAsAssignment and bdrmapIT using topological constraints from traceroute paths, as well as ASNs recorded by operators in PeeringDB, to learn regexes for 206 different suffixes. Because these methods for inferring router ownership can infer the wrong ASN, we modify bdrmapIT to integrate this new capability to extract ASNs from hostnames. Evaluating against ground truth, our modification correctly distinguished stale from correct hostnames for 92.5% of hostnames with an ASN different from bdrmapIT’s initial inference. This modification allowed bdrmapIT to increase the agreement between extracted and inferred ASNs for these routers in the January 2020 ITDK from 87.4% to 97.1% and reduce the error rate from 1/7.9 to 1/34.5. This work presents a new avenue for collecting validation data, opening a broader horizon of opportunity for evidence-based router ownership inference.
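As a rough illustration of the idea in the abstract above, the following Python sketch applies one hand-written regex to hostnames under a single suffix and then relates the reported agreement percentages to the reported "1 in N" error rates. Everything named here is an assumption made for illustration: the suffix `example.net`, the `as<digits>` naming convention, and the helper names `extract_asn`, `agreement`, and `one_in_n_error` do not come from the paper, and the regex is not one the system learned. The ASNs used are from the range reserved for documentation.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical suffix and naming convention, chosen only for illustration.
SUFFIX = "example.net"
# Matches a label component such as "as64500" delimited by "." or "-".
ASN_RE = re.compile(r"(?:^|[.-])as(\d{1,10})[.-]", re.IGNORECASE)


def extract_asn(hostname: str) -> Optional[int]:
    """Return the ASN embedded in a hostname under SUFFIX, or None."""
    host = hostname.lower().rstrip(".")
    if not host.endswith("." + SUFFIX):
        return None  # the regex only applies to its own suffix
    match = ASN_RE.search(host)
    return int(match.group(1)) if match else None


def agreement(pairs: Iterable[Tuple[int, int]]) -> float:
    """Fraction of (extracted ASN, inferred ASN) pairs that are equal."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(1 for extracted, inferred in pairs if extracted == inferred) / len(pairs)


def one_in_n_error(agree_fraction: float) -> float:
    """Express a disagreement rate as N in '1 error per N routers'."""
    return 1.0 / (1.0 - agree_fraction)


if __name__ == "__main__":
    for name in ("as64500.ix1.example.net",
                 "xe-0-0-1.as64501-gw.example.net",
                 "core1.example.net"):
        print(name, extract_asn(name))
    # The abstract's percentages and error rates are consistent with each other.
    print(round(one_in_n_error(0.874), 1))  # 7.9
    print(round(one_in_n_error(0.971), 1))  # 34.5
```

Under these assumptions, 87.4% agreement corresponds to roughly one disagreement in 7.9 routers and 97.1% to roughly one in 34.5, matching the figures quoted in the abstract.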
Award ID(s):
1724853 1901517
NSF-PAR ID:
10289016
Journal Name:
IMC '20: Proceedings of the ACM Internet Measurement Conference
Page Range / eLocation ID:
386 to 392
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. System-on-Chips (SoCs) are designed using different Intellectual Property (IP) blocks from multiple third-party vendors to reduce design cost while meeting aggressive time-to-market constraints. Designing trustworthy SoCs requires addressing the increasing concerns related to supply-chain security vulnerabilities. Malicious implants on IPs, such as Hardware Trojans (HTs), are one of the significant security threats in designing trustworthy SoCs. It is a major challenge to detect Trojans in complex multi-processor SoCs using conventional pre- and post-silicon validation methodologies. Packet-based Network-on-Chip (NoC) is a widely used solution for on-chip communication between IPs in complex SoCs. The focus of this paper is to enable trusted NoC communication in the presence of potentially untrusted IPs. This paper makes three key contributions. (1) We model an HT in a NoC router that activates misrouting of packets to initiate denial of service, delay of service, and injection suppression. (2) We propose a dynamic shielding technique that isolates the identified HT-infected IP. (3) We present a secure routing algorithm to bypass the HT-infected NoC router. Experimental results on an HT-infected NoC demonstrate that the proposed method reduces effective average packet latency by 38% in real benchmarks and 48% in synthetic traffic patterns. Our method also increases throughput and reduces effective average deflected packet latency by 62% in real benchmarks and 97% in synthetic traffic patterns.
  2. Abstract

    Social inequality is a consistent feature of animal societies, often manifesting as dominance hierarchies, in which each individual is characterized by a dominance rank denoting its place in the network of competitive relationships among group members. Most studies treat dominance hierarchies as static entities despite their true longitudinal, and sometimes highly dynamic, nature.

    To guide study of the dynamics of dominance, we propose the concept of a longitudinal hierarchy: the characterization of a single, latent hierarchy and its dynamics over time. Longitudinal hierarchies describe the hierarchy position (r) and dynamics (∆) associated with each individual as a property of its interaction data, the periods into which these data are divided based on a period delineation rule (p) and the method chosen to infer the hierarchy. Hierarchy dynamics result from both active (∆a) and passive (∆p) processes. Methods that infer longitudinal hierarchies should optimize accuracy of rank dynamics as well as of the rank orders themselves, but no studies have yet evaluated the accuracy with which different methods infer hierarchy dynamics.

    We modify three popular ranking approaches to make them better suited for inferring longitudinal hierarchies. Our three “informed” methods assign ranks that are informed by data from the prior period rather than calculating ranks de novo in each observation period and use prior knowledge of dominance correlates to inform placement of new individuals in the hierarchy. These methods are provided in an R package.

    Using both a simulated dataset and a long‐term empirical dataset from a species with two distinct sex‐based dominance structures, we compare the performance of these methods and their unmodified counterparts. We show that choice of method has dramatic impacts on inference of hierarchy dynamics via differences in estimates of ∆a. Methods that calculate ranks de novo in each period overestimate hierarchy dynamics, but incorporation of prior information leads to more accurately inferred ∆a. Of the modified methods, Informed MatReorder infers the most conservative estimates of hierarchy dynamics and Informed Elo infers the most dynamic hierarchies.

    This work provides crucially needed conceptual framing and methodological validation for studying social dominance and its dynamics.

  3. Eddy covariance data are invaluable for determining ecosystem water use strategies under soil water stress. However, existing stress inference methods require numerous subjective data processing and model specification assumptions whose effect on the inferred soil water stress signal is rarely quantified. These uncertainties may confound the stress inference and the generalization of ecosystem water use strategies across multiple sites and studies. In this research, we quantify the sensitivity of soil water stress signals inferred from eddy covariance data to the prevailing data and modeling assumptions (i.e., their robustness) to compile a comprehensive list of sites with robust soil water stress signals and assess the performance of current stress inference methods. To accomplish this, we identify the most prevalent assumptions from the literature and perform a digital factorial experiment to extract probability distributions of plausible soil water stress signals and model performance at 151 FLUXNET2015 and AmeriFlux-FLUXNET sites. We develop a new framework that summarizes these probability distributions to classify and rank the robustness of each site’s soil water stress signal, which we display with a user-friendly heat map. We estimate that only 5%–36% of sites exhibit a robust soil water stress signal due to deficient model performance and poorly constrained ecosystem water use parameters. We also find that the lack of robustness is site-specific, which undermines grouping stress signals by broad ecosystem categories or comparing results across studies with differing assumptions. Lastly, existing stress inference methods appear better suited for eddy covariance sites with grass/annual vegetation. Our findings call for more careful and consistent inference of ecosystem water stress from eddy covariance data. 
  4. Vision Transformers (ViTs) have shown impressive performance and have become a unified backbone for multiple vision tasks. However, both the attention mechanism and multi-layer perceptrons (MLPs) in ViTs are not sufficiently efficient due to dense multiplications, leading to costly training and inference. To this end, we propose to reparameterize pre-trained ViTs with a mixture of multiplication primitives, e.g., bitwise shifts and additions, towards a new type of multiplication-reduced model, dubbed ShiftAddViT, which aims to achieve end-to-end inference speedups on GPUs without requiring training from scratch. Specifically, all MatMuls among queries, keys, and values are reparameterized using additive kernels, after mapping queries and keys to binary codes in Hamming space. The remaining MLPs or linear layers are then reparameterized with shift kernels. We utilize TVM to implement and optimize those customized kernels for practical hardware deployment on GPUs. We find that such a reparameterization on (quadratic or linear) attention maintains model accuracy, while inevitably leading to accuracy drops when being applied to MLPs. To marry the best of both worlds, we further propose a new mixture of experts (MoE) framework to reparameterize MLPs by taking multiplication or its primitives as experts, e.g., multiplication and shift, and designing a new latency-aware load-balancing loss. Such a loss helps to train a generic router for assigning a dynamic amount of input tokens to different experts according to their latency. In principle, the faster the experts run, the more input tokens they are assigned. Extensive experiments on various 2D/3D Transformer-based vision tasks consistently validate the effectiveness of our proposed ShiftAddViT, achieving up to 5.18x latency reductions on GPUs and 42.9% energy savings, while maintaining a comparable accuracy as original or efficient ViTs. Codes and models are available at https://github.com/GATECH-EIC/ShiftAddViT. 
  5. Regular expressions are used for diverse purposes, including input validation and firewalls. Unfortunately, they can also lead to a security vulnerability called ReDoS (Regular Expression Denial of Service), caused by a super-linear worst-case execution time during regex matching. Due to the severity and prevalence of ReDoS, past work proposed automatic tools to detect and fix regexes. Although these tools were evaluated in automatic experiments, their usability has not yet been studied. Our insight is that the usability of existing tools to detect and fix regexes will improve if we complement them with anti-patterns and fix strategies of vulnerable regexes. We developed novel anti-patterns for vulnerable regexes, and a collection of fix strategies to fix them. We derived our anti-patterns and fix strategies from a novel theory of regex infinite ambiguity, a necessary condition for regexes vulnerable to ReDoS. We proved the soundness and completeness of our theory. We evaluated the effectiveness of our anti-patterns, both in an automatic experiment and when applied manually. Then, we evaluated how much our anti-patterns and fix strategies improve developers’ understanding of the outcome of detection and fixing tools. Our evaluation found that our anti-patterns were effective over a large dataset of regexes (N=209,188): 100% precision and 99% recall, improving on the state of the art's 50% precision and 87% recall. Our anti-patterns were also more effective than the state of the art when applied manually (N=20): 100% of developers applied them effectively vs. 50% for the state of the art. Finally, our anti-patterns and fix strategies increased developers’ understanding using automatic tools (N=9): from median “Very weakly” to median “Strongly” when detecting vulnerabilities, and from median “Very weakly” to median “Very strongly” when fixing them.