skip to main content


Title: AnyOpt: Predicting and Optimizing IP Anycast Performance
The key to optimizing the performance of an anycast-based sys- tem (e.g., the root DNS or a CDN) is choosing the right set of sites to announce the anycast prefix. One challenge here is predicting catchments. A naïve approach is to advertise the prefix from all subsets of available sites and choose the best-performing subset, but this does not scale well. We demonstrate that by conducting pairwise experiments between sites peering with tier-1 networks, we can predict the catchments that would result if we announce to any subset of the sites. We prove that our method is effective in a simplified model of BGP, consistent with common BGP routing policies, and evaluate it in a real-world testbed. We then present AnyOpt, a system that predicts anycast catchments. Using AnyOpt, a network operator can find a subset of anycast sites that minimizes client latency without using the naïve approach. In an experiment using 15 sites, each peering with one of six transit providers, AnyOpt predicted site catchments of 15,300 clients with 94.7% accuracy and client RTTs with a mean error of 4.6%. AnyOpt identified a subset of 12 sites, announcing to which lowers the mean RTT to clients by 33ms compared to a greedy approach that enables the same number of sites with the lowest average unicast latency.  more » « less
Award ID(s):
1763617 1763742
NSF-PAR ID:
10281106
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ;
Date Published:
Journal Name:
Proceedings of the ACM SIGCOMM Conference, 2021
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. DNS latency is a concern for many service operators: CDNs exist to reduce service latency to end-users but must rely on global DNS for reachability and load-balancing. Today, DNS latency is monitored by active probing from distributed platforms like RIPE Atlas, with Verfploeter, or with commercial services. While Atlas coverage is wide, its 10k sites see only a fraction of the Internet. In this paper we show that passive observation of TCP handshakes can measure \emph{live DNS latency, continuously, providing good coverage of current clients of the service}. Estimating RTT from TCP is an old idea, but its application to DNS has not previously been studied carefully. We show that there is sufficient TCP DNS traffic today to provide good operational coverage (particularly of IPv6), and very good temporal coverage (better than existing approaches), enabling near-real time evaluation of DNS latency from \emph{real clients}. We also show that DNS servers can optionally solicit TCP to broaden coverage. We quantify coverage and show that estimates of DNS latency from TCP is consistent with UDP latency. Our approach finds previously unknown, real problems: \emph{DNS polarization} is a new problem where a hypergiant sends global traffic to one anycast site rather than taking advantage of the global anycast deployment. Correcting polarization in Google DNS cut its latency from 100ms to 10ms; and from Microsoft Azure cut latency from 90ms to 20ms. We also show other instances of routing problems that add 100--200ms latency. Finally, \emph{real-time} use of our approach for a European country-level domain has helped detect and correct a BGP routing misconfiguration that detoured European traffic to Australia. We have integrated our approach into several open source tools: Entrada, our open source data warehouse for DNS, a monitoring tool (ANTS), which has been operational for the last 2 years on a country-level top-level domain, and a DNS anonymization tool in use at a root server since March 2021. 
    more » « less
  2. The Border Gateway Protocol (BGP) is the protocol that networks use to exchange (announce) routing information across the Internet. Unfortunately, BGP has no mechanism to prevent unauthorized announcement of network addresses, also known as prefix hijacks. Since the 1990s, the primary means of protecting against unauthorized origin announcements has been the use of routing information databases, so that networks can verify prefix origin information they receive from their neighbors in BGP messages. In the 1990s, operators deployed databases now collectively known as the Internet Routing Registry (IRR), which depend on voluntary (although sometimes contractually required) contribution of routing information without strict (or sometimes any) validation. Coverage, accuracy, and use of these databases remains inconsistent across ISPs and over time. In 2012, after years of debate over approaches to improving routing security, the operator community deployed an alternative known as the Resource Public Key Infrastructure (RPKI). The RPKI includes cryptographic attestation of records, including expiration dates, with each Regional Internet Registry (RIR) operating as a "root" of trust. Similar to the IRR, operators can use the RPKI to discard routing messages that do not pass origin validation checks. But the additional integrity comes with complexity and cost. Furthermore, operational and legal implications of potential malfunctions have limited registration in and use of the RPKI. In response, some networks have redoubled their efforts to improve the accuracy of IRR registration data. These two technologies are now operating in parallel, along with the option of doing nothing at all to validate routes. Although RPKI use is growing, its limited coverage means that security-conscious operators may query both IRR and RPKI databases to maximize routing security. However, IRR information may be inaccurate due to improper hygiene, such as not updating the origin information after changes in routing policy or prefix ownership. Since RPKI uses a stricter registration and validation process, we use it as a baseline against which to compare the trends in accuracy and coverage of IRR data. 
    more » « less
  3. IP anycast is used for services such as DNS and Content Delivery Networks (CDN) to provide the capacity to handle Distributed Denial-of-Service (DDoS) attacks. During a DDoS attack service operators redistribute traffic between anycast sites to take advantage of sites with unused or greater capacity. Depending on site traffic and attack size, operators may instead concentrate attackers in a few sites to preserve operation in others. Operators use these actions during attacks, but how to do so has not been described systematically or publicly. This paper describes several methods to use BGP to shift traffic when under DDoS, and shows that a \emph{response playbook} can provide a menu of responses that are options during an attack. To choose an appropriate response from this playbook, we also describe a new method to estimate true attack size, even though the operator's view during the attack is incomplete. Finally, operator choices are constrained by distributed routing policies, and not all are helpful. We explore how specific anycast deployment can constrain options in this playbook, and are the first to measure how generally applicable they are across multiple anycast networks. 
    more » « less
  4. IPv6's large address space allows ample freedom for choosing and assigning addresses. To improve client privacy and resist IP-based tracking, standardized techniques leverage this large address space, including privacy extensions and provider prefix rotation. Ephemeral and dynamic IPv6 addresses confound not only tracking and traffic correlation attempts, but also traditional network measurements, logging, and defense mechanisms. We show that the intended anti-tracking capability of these widely deployed mechanisms is unwittingly subverted by edge routers using legacy IPv6 addressing schemes that embed unique identifiers. We develop measurement techniques that exploit these legacy devices to make tracking such moving IPv6 clients feasible by combining intelligent search space reduction with modern high-speed active probing. Via an Internet-wide measurement campaign, we discover more than 9M affected edge routers and approximately 13k /48 prefixes employing prefix rotation in hundreds of ASes worldwide. We mount a six-week campaign to characterize the size and dynamics of these deployed IPv6 rotation pools, and demonstrate via a case study the ability to remotely track client address movements over time. We responsibly disclosed our findings to equipment manufacturers, at least one of which subsequently changed their default addressing logic. 
    more » « less
  5. We use traceroute and BGP data from globally distributed Internet measurement infrastructures to study the impact of a noteworthy submarine cable launch connecting Africa to South America. We leverage archived data from RIPE Atlas and CAIDA Ark platforms, as well as custom measurements from strategic vantage points, to quantify the differences in end-to-end latency and path lengths before and after deployment of this new South-Atlantic cable. We find that ASes operating in South America significantly benefit from this new cable, with reduced latency to all measured African countries. More surprising is that end-to-end latency to/from some regions of the world, including intra-African paths towards Angola, increased after switching to the cable. We track these unintended consequences to suboptimally circuitous IP paths that traveled from Africa to Europe, possibly North America, and South America before traveling back to Africa over the cable. Although some suboptimalities are expected given the lack of peering among neighboring ASes in the developing world, we found two other causes: (i) problematic intra-domain routing within a single Angolese network, and (ii) suboptimal routing/traffic engineering by its BGP neighbors. After notifying the operating AS of our results, we found that most of these suboptimalities were subsequently resolved. We designed our method to generalize to the study of other cable deployments or outages and share our code to promote reproducibility and extension of our work. 
    more » « less