The Domain Name System (DNS) is an essential Internet service that maps host names to IP addresses. The DNS Root Server System operates the top of this namespace. RIPE Atlas observes DNS from more than 11k vantage points (VPs) around the world and reports the reliability of the DNS Root Server System in DNSmon. DNSmon shows that loss rates for queries to the DNS Root are nearly 10\% for IPv6, much higher than the approximately 2\% loss seen for IPv4. Although IPv6 is ``new,'' it is an operational protocol available to a third of Internet users and ought to be just as reliable as IPv4. We examine this difference at a finer granularity by investigating loss at individual VPs. We confirm that specific VPs are the source of this difference and identify two root causes: VP \emph{islands}, whose routing problems at the edge leave them unable to reach IPv6 destinations outside their LAN, and VP \emph{peninsulas}, which indicate routing problems in the core of the network. These problems account for most of the loss and nearly all of the difference between IPv4 and IPv6 query loss rates. Islands account for most of the loss (half of IPv4 failures and five-sixths of IPv6 failures), and we suggest that these measurement devices be filtered out to get a more accurate picture of loss rates. Peninsulas account for the main differences between root identifiers, suggesting routing disagreements that root operators need to address. We believe that filtering out both of these known problems provides a better measure of underlying network anomalies and loss, and will result in more actionable alerts.
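To make the filtering idea concrete, here is a minimal sketch of labelling VPs and recomputing loss with them excluded. The rules are simplified assumptions, not the paper's method: a VP that never gets a reply from any root is treated as an island, and one that reaches some roots but never others is treated as a peninsula. The `Probe` record and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    vp: str    # vantage point (VP) identifier (hypothetical)
    root: str  # root identifier (letter) queried
    ok: bool   # True if the query received a reply


def classify_vps(probes):
    """Label each VP 'island', 'peninsula', or 'normal' (simplified rules)."""
    tried, reached = {}, {}
    for p in probes:
        tried.setdefault(p.vp, set()).add(p.root)
        if p.ok:
            reached.setdefault(p.vp, set()).add(p.root)
    labels = {}
    for vp, roots in tried.items():
        got = reached.get(vp, set())
        if not got:
            labels[vp] = "island"     # reaches no root at all
        elif got != roots:
            labels[vp] = "peninsula"  # reaches some roots, never others
        else:
            labels[vp] = "normal"
    return labels


def loss_rate(probes, exclude=frozenset()):
    """Fraction of unanswered queries, ignoring VPs listed in `exclude`."""
    kept = [p for p in probes if p.vp not in exclude]
    return sum(not p.ok for p in kept) / len(kept) if kept else 0.0


probes = [
    Probe("vp1", "a", True), Probe("vp1", "b", True),
    Probe("vp2", "a", False), Probe("vp2", "b", False),
    Probe("vp3", "a", True), Probe("vp3", "b", False),
]
labels = classify_vps(probes)
noisy = {vp for vp, label in labels.items() if label != "normal"}
print(labels)                    # {'vp1': 'normal', 'vp2': 'island', 'vp3': 'peninsula'}
print(loss_rate(probes))         # 0.5
print(loss_rate(probes, noisy))  # 0.0
```

In this toy data, all of the apparent loss comes from the island and the peninsula, so removing them drops the loss rate to zero; real data would of course retain some genuine loss.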
Understanding Partial Reachability in the Internet Core
Routing strives to connect all of the Internet, but competing forces intervene: political pressure threatens routing fragmentation; architectural changes such as private clouds, carrier-grade NAT, and firewalls make connectivity conditional; and commercial disputes create partial reachability for days or years. This paper suggests that \emph{persistent, partial reachability is fundamental to the Internet} and an underexplored problem. We first \emph{derive a conceptual definition of the Internet core} based on connectivity, not authority. We identify \emph{peninsulas} (persistent, partial connectivity) and \emph{islands} (computers partitioned from the Internet core). Second, we develop algorithms to observe each across the Internet and apply them to two existing measurement systems: Trinocular, where 6 locations frequently observe 5M networks, and RIPE Atlas, where 13k locations frequently scan the DNS roots. Cross-validation shows that our findings are stable over \emph{three years of data} and consistent with as few as 3 geographically distributed observers. We validate peninsulas and islands against CAIDA Ark, showing good recall (0.94) and bounding precision between 0.42 and 0.82. Finally, our work has broad practical impact: we show that \emph{peninsulas are more common than Internet outages}. Factoring out peninsulas and islands as noise can \emph{improve existing measurement systems}; their ``noise'' is $5\times$ to $9.7\times$ larger than the operational events in RIPE's DNSmon. We show that the largest share of peninsula events are routing transients (45\%), but most peninsula-time (90\%) is due to a few (7\%) long-lived events. Our work helps inform Internet policy and governance, since our neutral definition shows that no single country or organization can unilaterally control the Internet core.
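The distinction between outages, peninsulas, and islands can be sketched from a matrix of per-observer reachability results. This is an illustrative approximation, not the paper's algorithm: it assumes a strict majority of the other observers as a stand-in for "the core", flags an observer as an island when it reaches none of those core targets, and then labels each target from the remaining observers. All observer and target names are hypothetical.

```python
def classify_target(views):
    """views maps observer -> True if it reached the target in this time bin."""
    ups = sum(views.values())
    if ups == len(views):
        return "reachable"
    if ups == 0:
        return "outage"
    return "peninsula"  # some observers reach it, others do not


def find_islands(matrix):
    """matrix maps observer -> {target: reached?}. Flag an observer as an
    island if it reaches none of the targets that a strict majority of the
    other observers reach (an assumed stand-in for 'the core')."""
    targets = set().union(*(row.keys() for row in matrix.values()))
    islands = set()
    for obs in matrix:
        others = [o for o in matrix if o != obs]
        core = {t for t in targets
                if 2 * sum(matrix[o].get(t, False) for o in others) > len(others)}
        if core and not any(matrix[obs].get(t, False) for t in core):
            islands.add(obs)
    return islands


matrix = {
    "A": {"t1": True, "t2": True, "t3": True},
    "B": {"t1": True, "t2": True, "t3": False},
    "C": {"t1": False, "t2": False, "t3": False},
}
islands = find_islands(matrix)  # {'C'}
core_obs = [o for o in matrix if o not in islands]
for t in ("t1", "t2", "t3"):
    print(t, classify_target({o: matrix[o][t] for o in core_obs}))
# t1 reachable, t2 reachable, t3 peninsula
```

Removing islands first matters. If observer C is left in, every target looks like a peninsula, because C's own isolation masquerades as partial reachability of the targets.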
more »
« less
- PAR ID: 10661262
- Publisher / Repository: ACM New Ideas in Networked Systems (NINeS)
- Date Published:
- Subject(s) / Keyword(s): Internet outages, partial outages, network partition, islands, peninsulas
- Sponsoring Org: National Science Foundation
More Like this
-
DNS latency is a concern for many service operators: CDNs exist to reduce service latency to end users but must rely on global DNS for reachability and load balancing. Today, DNS latency is monitored by active probing from distributed platforms like RIPE Atlas, with Verfploeter, or with commercial services. While Atlas coverage is wide, its 10k sites see only a fraction of the Internet. In this paper we show that passive observation of TCP handshakes can measure \emph{live DNS latency, continuously, providing good coverage of current clients of the service}. Estimating RTT from TCP is an old idea, but its application to DNS has not previously been studied carefully. We show that there is sufficient TCP DNS traffic today to provide good operational coverage (particularly of IPv6) and very good temporal coverage (better than existing approaches), enabling near-real-time evaluation of DNS latency from \emph{real clients}. We also show that DNS servers can optionally solicit TCP to broaden coverage. We quantify coverage and show that estimates of DNS latency from TCP are consistent with UDP latency. Our approach finds previously unknown, real problems: \emph{DNS polarization} is a new problem where a hypergiant sends global traffic to one anycast site rather than taking advantage of the global anycast deployment. Correcting polarization for Google DNS cut its latency from 100ms to 10ms, and for Microsoft Azure from 90ms to 20ms. We also show other instances of routing problems that add 100--200ms of latency. Finally, \emph{real-time} use of our approach for a European country-level domain has helped detect and correct a BGP routing misconfiguration that detoured European traffic to Australia.
We have integrated our approach into several open source tools: Entrada, our open source data warehouse for DNS; a monitoring tool (ANTS), which has been operational for the last 2 years on a country-level top-level domain; and a DNS anonymization tool in use at a root server since March 2021.
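The core of passive RTT estimation from TCP is simple: at the server, the time between sending a SYN-ACK and receiving the client's matching ACK approximates one round trip to that client. Below is a minimal sketch over already-parsed packet records. The `Packet` record, the flag strings, and the sample addresses (documentation prefixes) are assumptions for illustration, not the format of Entrada or any particular capture library.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Packet:
    ts: float   # capture time at the server, in seconds
    src: str    # "ip:port"
    dst: str    # "ip:port"
    flags: str  # TCP flags, e.g. "S", "SA", "A", "PA"
    seq: int
    ack: int


def handshake_rtts(packets, server):
    """Map each client to RTT samples: ACK arrival minus SYN-ACK departure."""
    pending = {}  # (client, ack number we expect back) -> SYN-ACK send time
    rtts = {}
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        if p.src == server and p.flags == "SA":
            # A retransmitted SYN-ACK overwrites the earlier send time.
            pending[(p.dst, p.seq + 1)] = p.ts
        elif p.dst == server and "A" in p.flags and "S" not in p.flags:
            sent = pending.pop((p.src, p.ack), None)
            if sent is not None:
                rtts.setdefault(p.src, []).append(p.ts - sent)
    return rtts


server = "192.0.2.53:53"
packets = [
    Packet(1.00, "198.51.100.7:40000", server, "S", 100, 0),
    Packet(1.00, server, "198.51.100.7:40000", "SA", 500, 101),
    Packet(1.25, "198.51.100.7:40000", server, "A", 101, 501),
    Packet(2.00, server, "203.0.113.9:5555", "SA", 900, 11),
    Packet(2.50, "203.0.113.9:5555", server, "A", 11, 901),
]
rtts = handshake_rtts(packets, server)
print({client: median(s) for client, s in rtts.items()})
# {'198.51.100.7:40000': 0.25, '203.0.113.9:5555': 0.5}
```

Matching on the expected acknowledgment number (the SYN-ACK's sequence number plus one) pairs each ACK with its own handshake. Retransmitted SYN-ACKs make the sample ambiguous, so a production tool would likely discard such handshakes rather than keep the later timestamp as this sketch does.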
-
\emph{Address dynamics} are changes in IP address occupation as users come and go and as ISPs renumber them for privacy or for routing maintenance. Address dynamics affect address reputation services, IP geolocation, network measurement, and outage detection, with implications for Internet governance, e-commerce, and science. While prior work has identified diurnal trends in address use, we show the effectiveness of Multiple Seasonal-Trend decomposition using Loess (MSTL) to identify both daily and weekly trends. We use ISP-wide dynamics to develop IAS, a new algorithm that is the first to automatically detect ISP maintenance events that move users in the address space. We show that 20\% of such events result in /24 IPv4 address blocks that become unused for days or more, and that detecting these events corrects nearly 41k false outages per quarter. Our analysis provides a new understanding of ISP address use: while only about 2.8\% of ASes (1,730) are diurnal, some diurnal ASes show more than 20\% changes each day. It also shows greater fragmentation in IPv4 address use compared to IPv6.
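Separating daily from weekly patterns in an address-occupancy series can be illustrated with a much simpler additive decomposition. This is a swapped-in technique, not MSTL: the trend is a centred one-week moving average and each seasonal component is a per-phase mean, with no Loess smoothing or iteration. (statsmodels provides a real MSTL implementation as `statsmodels.tsa.seasonal.MSTL`.) The hourly sampling, the `daily_swing` metric, and the synthetic data are assumptions for illustration.

```python
from statistics import mean


def decompose(series, day=24, week=168):
    """Additive split of an hourly series into trend + daily + weekly + residual.

    Simplified stand-in for MSTL (no Loess): the trend is a centred moving
    average over one week, and each seasonal component is the mean detrended
    value at each phase of its period, centred on zero.
    """
    n = len(series)
    if n < week:
        raise ValueError("need at least one full week of hourly samples")
    half = week // 2
    trend = [mean(series[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    detrended = [x - t for x, t in zip(series, trend)]

    def seasonal(values, period):
        phase = [mean(values[p::period]) for p in range(period)]
        centre = mean(phase)
        return [phase[i % period] - centre for i in range(len(values))]

    daily = seasonal(detrended, day)
    weekly = seasonal([d - s for d, s in zip(detrended, daily)], week)
    resid = [d - s1 - s2 for d, s1, s2 in zip(detrended, daily, weekly)]
    return trend, daily, weekly, resid


def daily_swing(series, daily):
    """Peak-to-trough daily change as a fraction of the mean occupancy."""
    return (max(daily) - min(daily)) / mean(series)


# Two weeks of synthetic hourly counts of active addresses in one block:
# 100 overnight, 120 during 08:00-20:00 (hypothetical data).
series = [120.0 if 8 <= h % 24 < 20 else 100.0 for h in range(2 * 168)]
trend, daily, weekly, resid = decompose(series)
print(round(daily_swing(series, daily), 2))
```

A swing of roughly 0.18 here corresponds to the kind of "more than 20\% changes each day" measure the abstract reports for some diurnal ASes, although the paper's exact metric may differ.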
-
Learning network topology from partial knowledge of its connectivity is an important objective in practical communication networks and social-media networks. When such networks are represented as connected graphs, recovering connectivity information between nodes can help visualize the network topology and improve network utility. This work considers the use of simple hop-distance measurements obtained from a fraction of anchor/source nodes to reconstruct the node connectivity relationships of large-scale networks with unknown topology. Our proposed approach consists of two steps. We first develop a tree-based search strategy to determine constraints on unknown network edges based on the hop-count measurements. We then derive the logical distance between nodes based on principal component analysis (PCA) of the measurement matrix and propose a binary hypothesis test for each unknown edge. The proposed algorithm can effectively improve both the accuracy of connectivity detection and the successful delivery rate in data routing applications.
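One simple constraint that hop counts place on unknown edges follows from the triangle inequality: if nodes $i$ and $j$ are adjacent in an unweighted graph, then for every anchor $a$, $|h(a,i) - h(a,j)| \le 1$. The sketch below applies only this necessary condition to prune candidate edges. It is not the paper's tree-based search, and it omits the PCA and hypothesis-test step that would resolve the remaining ambiguity. The example graphs are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_hops(adj, src):
    """Hop counts from src to every reachable node of an unweighted graph."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def candidate_edges(hops, nodes):
    """Keep (i, j) only if every anchor sees hop counts within 1 of each other.

    This is a necessary (not sufficient) condition for i and j to be adjacent,
    so the result is a superset of the true edge set.
    """
    return {(i, j) for i, j in combinations(sorted(nodes), 2)
            if all(abs(h[i] - h[j]) <= 1 for h in hops.values())}


# Hypothetical 4-node ring 0-1-2-3-0 with node 0 as the only anchor.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
hops = {a: bfs_hops(adj, a) for a in [0]}
print(sorted(candidate_edges(hops, adj)))
# [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)]; (1, 3) is a false candidate
```

The surviving non-edge (1, 3) shows why a second, statistical step is needed: nodes 1 and 3 are equidistant from the single anchor, so hop counts alone cannot rule them out. More anchors tighten the constraint.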
-
The Internet is composed of many interconnected, interoperating networks. With recent advances in Future Internet design, multiple new network architectures have emerged, especially Information-Centric Networks (ICN). Given the ubiquity of networks based on the Internet Protocol (IP), we will likely have a number of interconnected network domains with different architectures, including ICNs. Their interoperability is important but difficult to prove, and a formal tool can help with such analysis. ICNs have unique characteristics that warrant formal analysis: because ICN operates at the level of content names rather than node addresses, the properties to establish go beyond, and differ from, those used in the state of the art. We need to focus on node-to-content reachability rather than node-to-node reachability. In this paper, we present a formal approach to model and analyze information-centric interoperability (ICI). We use the Alloy Analyzer’s model-finding approach to verify properties expressed as invariants for information-centric services (both pull- and push-based models), including content reachability and returnability. We extend our use of Alloy to model counting to quantitatively analyze failure and mobility properties. We present a formally verified ICI framework that allows seamless interoperation among a multitude of network architectures. We also report on the impact of domain types, routing policies, and binding techniques on the probability of content reachability and returnability under failures and mobility.
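The difference between node-to-content and node-to-node reachability, and the idea of counting models to get a probability, can be illustrated with brute-force enumeration in Python. This is a stand-in for Alloy model counting, not the paper's Alloy model: it assumes undirected links that fail independently with probability `p_fail`, counts content as reachable if any replica holder is reachable, and does not model returnability, push-based services, or mobility. The topology and node names are hypothetical.

```python
from collections import deque
from itertools import product


def reachable(links, up, src, dst):
    """BFS from src to dst over the undirected links whose `up` flag is True."""
    adj = {}
    for (a, b), ok in zip(links, up):
        if ok:
            adj.setdefault(a, []).append(b)
            adj.setdefault(b, []).append(a)
    seen, queue = {src}, deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            return True
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def content_reachability(links, p_fail, consumer, holders):
    """P(consumer reaches at least one holder of the content), assuming each
    link fails independently with probability p_fail; enumerates all 2^L states."""
    total = 0.0
    for up in product([True, False], repeat=len(links)):
        weight = 1.0
        for ok in up:
            weight *= (1 - p_fail) if ok else p_fail
        if any(reachable(links, up, consumer, h) for h in holders):
            total += weight
    return total


# Hypothetical topology: a consumer C in one domain, gateways G1 and G2 into
# two other domains, each with a replica of the same named content (P1, P2).
links = [("C", "G1"), ("G1", "P1"), ("C", "G2"), ("G2", "P2")]
print(content_reachability(links, 0.1, "C", ["P1", "P2"]))  # node-to-content
print(content_reachability(links, 0.1, "C", ["P1"]))        # node-to-node
```

With a 10\% link failure rate, reaching one specific holder succeeds with probability about 0.81, while reaching the content at either replica succeeds with probability about 0.964. Exhaustive enumeration grows as $2^L$ in the number of links, which is one reason a solver-based counting approach is attractive for larger models.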