skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Spatial Temporal Analysis of 40,000,000,000,000 Internet Darkspace Packets
The Internet has never been more important to our society, and understanding the behavior of the Internet is essential. The Center for Applied Internet Data Analysis (CAIDA) Telescope observes a continuous stream of packets from an unsolicited darkspace representing 1/256 of the Internet. During 2019 and 2020 over 40,000,000,000,000 unique packets were collected representing the largest ever assembled public corpus of Internet traffic. Using the combined resources of the Supercomputing Centers at UC San Diego, Lawrence Berkeley National Laboratory, and MIT, the spatial temporal structure of anonymized source-destination pairs from the CAIDA Telescope data has been analyzed with GraphBLAS hierarchical hyper-sparse matrices. These analyses provide unique insight on this unsolicited Internet darkspace traffic with the discovery of many previously unseen scaling relations. The data show a significant sustained increase in unsolicited traffic corresponding to the start of the COVID19 pandemic, but relatively little change in the underlying scaling relations associated with unique sources, source fan-outs, unique links, destination fan-ins, and unique destinations. This work provides a demonstration of the practical feasibility and benefit of the safe collection and analysis of significant quantities of anonymized Internet traffic.  more » « less
Award ID(s):
1730661 1724853
PAR ID:
10318574
Author(s) / Creator(s):
; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; more » ; ; ; ; ; ; ; « less
Date Published:
Journal Name:
2021 IEEE High Performance Extreme Computing Conference (HPEC)
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Internet has become a critical component of modern civilization requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic are essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither of these locations actively emit Internet traffic and provide distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hyperspace matrices and D4M associative array technologies enable the efficient analysis of these data on significant scales. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period 70% of the brightest (highest frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (reduce frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source is proportional to the logarithm of the brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high frequency beam of sources that drifts on a time scale of a month. 
    more » « less
  2. The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic data containing 50 billion packets. Utilizing a novel hypersparse neural network analysis of “video” streams of this traffic using 10,000 processors in the MIT SuperCloud reveals a new phenomena: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our neural network approach further shows that a two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide variety of source/destination statistics on moving sample windows ranging from 100,000 to 100,000,000 packets over collections that span years and continents. The inferred model parameters distinguish different network streams and the model leaf parameter strongly correlates with the fraction of the traffic in different underlying network topologies. The hypersparse neural network pipeline is highly adaptable and different network statistics and training models can be incorporated with simple changes to the image filter functions. 
    more » « less
  3. We develop statistical methodology for the quantification of risk of source-destination pairs in an internet network. The methodology is developed within the framework of functional data analysis and copula modeling. It is summarized in the form of computational algorithms that use bidirectional source-destination packet counts as input. The usefulness of our approach is evaluated by an application to real internet traffic flows and via a simulation study. 
    more » « less
  4. Network Telescopes, often referred to as darknets, capture unsolicited traffic directed toward advertised but unused IP spaces, enabling researchers and operators to monitor malicious, Internet-wide network phenomena such as vulnerability scanning, botnet propagation, and DoS backscatter. Detecting these events, however,has become increasingly challenging due to the growing traffic volumes that telescopes receive. To address this, we introduce DarkSim,a novel analytic framework that utilizes Dynamic Time Warping to measure similarities within the high-dimensional time series of network traffic. DarkSim combines traditional raw packet processing with statistical approaches, identifying traffic anomalies and enabling rapid time-to-insight. We evaluate our framework against DarkGLASSO, an existing method based on the GraphicalLASSO algorithm, using data from the UCSD Network Telescope.Based on our manually classified detections, DarkSim showcased perfect precision and an overlap of up to 91% of DarkGLASSO’s detections in contrast to DarkGLASSO’s maximum of 73.3% precision and detection overlap of 37.5% with the former. We further demonstrate DarkSim’s capability to detect two real-world events in our case studies: (1) an increase in scanning activities surrounding CVE public disclosures, and (2) shifts in country and network-level scanning patterns that indicate aggressive scanning. DarkSim provides a detailed and interpretable analysis framework for time-series anomalies, representing a new contribution to network security analytics. 
    more » « less
  5. A Mobile Ad-hoc Network (MANET) is a collection of nodes that communicate with each other wirelessly without any central support or conventional structure. The transmission of data packets over wireless channels in MANETs helps to maintain communication. Ad-hoc On-Demand Distance Vector Routing is a reactive routing protocol associated with MANET which creates a route to destination by broadcasting route request packets through the entire network. A link failure in this type of protocol causes the source to flood the network with these Route Request packets that leads to congestion in the network and performance degradation. This paper proposes an Efficient Multipath AODV routing algorithm that determines if a node in a network is relaying or is silent in the process of route discovery to send data packets from the source to destination. Simulation results show the proposed routing algorithm controls congestion and enhances performance in the network as not all network nodes have to participate in the route discovery for a particular source-destination pair. 
    more » « less