


This content will become publicly available on March 6, 2025

Title: Statistical risk quantification of two-directional internet traffic flows

We develop statistical methodology for the quantification of risk of source-destination pairs in an internet network. The methodology is developed within the framework of functional data analysis and copula modeling. It is summarized in the form of computational algorithms that use bidirectional source-destination packet counts as input. The usefulness of our approach is evaluated by an application to real internet traffic flows and via a simulation study.
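The abstract describes quantifying dependence between the two directions of a source-destination flow via copula modeling. As a rough illustration (not the paper's actual algorithm — the function names, the Gaussian-copula choice, and the risk threshold are all assumptions), one might measure the rank dependence of bidirectional packet counts with Kendall's tau, map it to a Gaussian-copula correlation, and flag strongly dependent pairs:

```python
import math

def kendall_tau(x, y):
    """Kendall's tau rank correlation (naive O(n^2) form)."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def gaussian_copula_param(tau):
    """Map Kendall's tau to the Gaussian-copula correlation: rho = sin(pi*tau/2)."""
    return math.sin(math.pi * tau / 2)

def flag_risky_pairs(flows, threshold=0.8):
    """flows: (src, dst) -> (forward_counts, reverse_counts) per time window.
    Flag pairs whose bidirectional counts are strongly dependent (hypothetical
    risk criterion for illustration only)."""
    risky = []
    for pair, (fwd, rev) in flows.items():
        rho = gaussian_copula_param(kendall_tau(fwd, rev))
        if abs(rho) >= threshold:
            risky.append(pair)
    return risky
```

The tau-to-rho mapping used here is the standard relation for elliptical copulas; the paper's functional-data-analysis machinery operates on packet-count curves rather than raw counts, so this sketch only conveys the flavor of the approach.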

 
Award ID(s):
2123761
PAR ID:
10521333
Author(s) / Creator(s):
Publisher / Repository:
Polish Statistical Association
Date Published:
Journal Name:
Statistics in Transition New Series
Volume:
25
Issue:
1
ISSN:
1234-7655
Page Range / eLocation ID:
1 to 22
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The Internet has never been more important to our society, and understanding its behavior is essential. The Center for Applied Internet Data Analysis (CAIDA) Telescope observes a continuous stream of packets from an unsolicited darkspace representing 1/256 of the Internet. During 2019 and 2020, over 40,000,000,000,000 unique packets were collected, representing the largest public corpus of Internet traffic ever assembled. Using the combined resources of the Supercomputing Centers at UC San Diego, Lawrence Berkeley National Laboratory, and MIT, the spatial-temporal structure of anonymized source-destination pairs from the CAIDA Telescope data has been analyzed with GraphBLAS hierarchical hyper-sparse matrices. These analyses provide unique insight into this unsolicited Internet darkspace traffic, including the discovery of many previously unseen scaling relations. The data show a significant, sustained increase in unsolicited traffic corresponding to the start of the COVID-19 pandemic, but relatively little change in the underlying scaling relations associated with unique sources, source fan-outs, unique links, destination fan-ins, and unique destinations. This work demonstrates the practical feasibility and benefit of safely collecting and analyzing significant quantities of anonymized Internet traffic.
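The quantities this abstract analyzes — unique links, source fan-outs, destination fan-ins — fall out of a sparse source-destination count matrix. A minimal sketch of that bookkeeping (a plain dict standing in for the GraphBLAS hyper-sparse matrices the work actually uses; function names are hypothetical):

```python
from collections import defaultdict

def traffic_matrix(packets):
    """Accumulate (src, dst) packet pairs into a dict-backed sparse count matrix."""
    mat = defaultdict(int)
    for src, dst in packets:
        mat[(src, dst)] += 1
    return dict(mat)

def fan_stats(mat):
    """Unique-link fan-out per source and fan-in per destination."""
    fan_out, fan_in = defaultdict(set), defaultdict(set)
    for src, dst in mat:
        fan_out[src].add(dst)
        fan_in[dst].add(src)
    return ({s: len(ds) for s, ds in fan_out.items()},
            {d: len(ss) for d, ss in fan_in.items()})
```

At telescope scale (tens of trillions of packets) the hierarchical hyper-sparse representation is essential; the dict version above only illustrates what is being counted.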
  2.
    Despite the prevalence of Internet of Things (IoT) devices, there is little information about the purpose and risks of the Internet traffic these devices generate, and consumers have limited options for controlling those risks. A key open question is whether one can mitigate these risks by automatically blocking some of the Internet connections from IoT devices without rendering the devices inoperable. In this paper, we address this question by developing a rigorous methodology that relies on automated IoT-device experimentation to reveal which network connections (and the information they expose) are essential, and which are not. We further develop strategies to automatically classify network traffic destinations as either required (i.e., their traffic is essential for devices to work properly) or not, allowing firewall rules to block traffic sent to non-required destinations without breaking device functionality. We find that 16 of the 31 devices we tested have at least one blockable non-required destination, with a maximum of 11 blockable destinations for a single device. We further analyze the destinations of network traffic and find that all third parties observed in our experiments are blockable, while first and support parties are neither uniformly required nor uniformly non-required. Finally, we demonstrate the limitations of existing blocklists on IoT traffic, propose a set of guidelines for automatically limiting non-essential IoT traffic, and develop a prototype system that implements these guidelines.
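The core classification step — a destination is blockable if the device keeps working whenever traffic to it is blocked — can be sketched as follows. The input format, function names, and rule syntax are hypothetical stand-ins; the paper derives the per-trial outcomes from automated device experiments:

```python
def classify_destinations(experiments):
    """experiments maps each traffic destination to a list of booleans, one per
    trial: True if the device still functioned with that destination blocked."""
    required, blockable = set(), set()
    for dest, device_worked in experiments.items():
        # Blockable only if the device survived every blocking trial.
        (blockable if all(device_worked) else required).add(dest)
    return required, blockable

def firewall_rules(blockable):
    """Emit one outbound-block rule per non-required destination (toy syntax)."""
    return [f"block out to {dest}" for dest in sorted(blockable)]
```

Requiring success in every trial is a conservative choice; a real pipeline would also need to handle flaky trials and destinations observed only intermittently.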
  3. The Internet is transforming our society, necessitating a quantitative understanding of Internet traffic. Our team collects and curates the largest publicly available Internet traffic dataset, containing 50 billion packets. A novel hypersparse neural network analysis of "video" streams of this traffic, run on 10,000 processors in the MIT SuperCloud, reveals a new phenomenon: the importance of otherwise unseen leaf nodes and isolated links in Internet traffic. Our neural network approach further shows that a two-parameter modified Zipf-Mandelbrot distribution accurately describes a wide variety of source/destination statistics on moving sample windows ranging from 100,000 to 100,000,000 packets over collections that span years and continents. The inferred model parameters distinguish different network streams, and the model's leaf parameter correlates strongly with the fraction of traffic in different underlying network topologies. The hypersparse neural network pipeline is highly adaptable: different network statistics and training models can be incorporated with simple changes to the image filter functions.
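The two-parameter Zipf-Mandelbrot model mentioned here assigns rank k a probability proportional to 1/(k+q)^s. A minimal sketch of the distribution and a crude grid-search fit (the paper fits its parameters via the neural network pipeline, not this brute-force approach; function names are illustrative):

```python
def zipf_mandelbrot(n, s, q):
    """Normalized Zipf-Mandelbrot pmf p(k) proportional to 1/(k+q)^s, ranks 1..n."""
    w = [1.0 / (k + q) ** s for k in range(1, n + 1)]
    total = sum(w)
    return [x / total for x in w]

def fit_zipf_mandelbrot(freqs, s_grid, q_grid):
    """Grid-search (s, q) minimizing squared error to empirical rank frequencies."""
    n = len(freqs)
    total = sum(freqs)
    emp = [f / total for f in freqs]
    return min(((s, q) for s in s_grid for q in q_grid),
               key=lambda p: sum((e - m) ** 2
                                 for e, m in zip(emp, zipf_mandelbrot(n, *p))))
```

With q = 0 this reduces to the plain Zipf distribution; the offset q flattens the head of the distribution, which is what lets the model capture the heavy leaf-node contribution the abstract highlights.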
  4. In this paper, we study the problem of acquiring fresh data and optimizing age-related performance when strategic data sources hold private market information. We consider an information update system in which a destination acquires, and pays for, fresh data updates from a source. The destination incurs an age-related cost, modeled as a general increasing function of the age-of-information (AoI). The source is strategic and incurs a sampling cost, which is its private information and may not be truthfully reported to the destination. We design an optimal (economic) mechanism for timely information acquisition by generalizing Myerson's seminal work. The goal is to minimize the sum of the destination's age-related cost and its payment to the source, while ensuring that the source truthfully reports its private information and voluntarily participates in the mechanism. Our results show that, under some distributions of the source's cost, our proposed optimal mechanism can yield an unbounded benefit compared with a benchmark that naively trusts the source's report and thus incentivizes maximal over-reporting.
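The trade-off underlying this setup — paying a sampling cost per update versus accumulating age-of-information cost between updates — can be made concrete with a simple periodic-update sketch. This is not the paper's mechanism (which handles strategic, privately informed sources); it only illustrates the age-cost accounting, and the function names are hypothetical:

```python
def avg_cost_rate(interval, sampling_cost, age_cost, steps=1000):
    """Time-average cost of periodic updates every `interval` seconds:
    (per-update sampling cost + age cost accumulated over one cycle) / interval.
    The age resets to 0 at each update, so age at time t into a cycle is t."""
    dt = interval / steps
    age_area = sum(age_cost(i * dt) * dt for i in range(steps))  # Riemann sum
    return (sampling_cost + age_area) / interval

def best_interval(sampling_cost, age_cost, candidates):
    """Pick the candidate update interval with the lowest average cost rate."""
    return min(candidates, key=lambda T: avg_cost_rate(T, sampling_cost, age_cost))
```

For a linear age cost f(a) = a, the rate is c/T + T/2, minimized at T = sqrt(2c) — the classic square-root scaling between sampling cost and update frequency.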
  5. A Mobile Ad-hoc Network (MANET) is a collection of nodes that communicate with each other wirelessly without any central support or conventional infrastructure. The transmission of data packets over wireless channels in MANETs helps maintain communication. Ad-hoc On-Demand Distance Vector (AODV) routing is a reactive routing protocol for MANETs that creates a route to a destination by broadcasting Route Request (RREQ) packets through the entire network. A link failure in this type of protocol causes the source to flood the network with these Route Request packets, which leads to congestion and performance degradation. This paper proposes an Efficient Multipath AODV routing algorithm that determines whether a node in the network relays or stays silent during route discovery when sending data packets from the source to the destination. Simulation results show that the proposed algorithm controls congestion and enhances network performance, as not all nodes have to participate in route discovery for a particular source-destination pair.
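One simple, centralized way to illustrate the relay-vs-silent idea is to let only nodes lying on some shortest path between the source and destination rebroadcast the route request. This is not the paper's distributed algorithm (AODV nodes have no global topology view, and the actual relay criterion is not given in the abstract); it is a toy criterion for intuition, with hypothetical function names:

```python
from collections import deque

def bfs_dist(adj, start):
    """Hop distances from `start` via breadth-first search."""
    dist, q = {start: 0}, deque([start])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def relay_nodes(adj, src, dst):
    """Nodes on at least one shortest src->dst path; all other nodes stay
    silent and drop the route request instead of rebroadcasting it."""
    d_src, d_dst = bfs_dist(adj, src), bfs_dist(adj, dst)
    if dst not in d_src:
        return set()
    total = d_src[dst]
    return {v for v in adj
            if v in d_src and v in d_dst and d_src[v] + d_dst[v] == total}
```

Any node off every shortest path (e.g., a leaf hanging off an intermediate node) contributes nothing to this route discovery, which is exactly the flooding overhead the proposed algorithm avoids.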