Random Access in Nondelimited Variable-length Record Collections for Parallel Reading with Hadoop
The industry standard Packet CAPture (PCAP) format for storing network packet traces is normally only readable in serial due to its lack of delimiters, indexing, or blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections like PCAP by identifying a record boundary within a small number of bytes of the access point. Unlike related heuristic methods that can limit scalability with a nonzero probability of error, the new method offers a correctness guarantee with a well formed file and does not rely on prior knowledge of the contents. We include a practical implementation of the algorithm with an extension to the Hadoop framework, and a performance comparison to serial ingestion. Finally, we present a number of similar storage types that could utilize a modified version of RAPCAP for random access.
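The core idea, finding a valid record boundary near an arbitrary byte offset by checking that candidate PCAP record headers chain consistently, can be illustrated with a simplified sketch. This is a hedged heuristic illustration only, not the RAPCAP algorithm itself (RAPCAP additionally provides a correctness guarantee for well-formed files that a plain plausibility scan lacks). The only assumption is the standard 16-byte PCAP per-record header layout: `ts_sec`, `ts_usec`, `incl_len`, `orig_len`, each a little-endian `uint32`.

```python
import struct

PCAP_REC_HDR = 16  # per-record header: ts_sec, ts_usec, incl_len, orig_len (uint32 each)

def plausible_header(buf, off, snaplen=65535):
    """Sanity-check whether the 16 bytes at `off` could be a PCAP record header."""
    if off + PCAP_REC_HDR > len(buf):
        return False
    ts_sec, ts_usec, incl_len, orig_len = struct.unpack_from("<IIII", buf, off)
    return (ts_usec < 1_000_000            # microseconds field must be < 10^6
            and 0 < incl_len <= snaplen    # captured length bounded by snap length
            and incl_len <= orig_len)      # cannot capture more than was on the wire

def find_boundary(buf, start, chain=5):
    """Scan forward from `start` for the first offset where `chain` consecutive
    record headers parse consistently. Heuristic sketch only: with enough chained
    headers false positives become unlikely, but unlike RAPCAP this version
    offers no guarantee."""
    for off in range(start, len(buf) - PCAP_REC_HDR):
        cur, ok = off, 0
        while ok < chain and plausible_header(buf, cur):
            incl_len = struct.unpack_from("<I", buf, cur + 8)[0]
            cur += PCAP_REC_HDR + incl_len  # jump to where the next header must start
            ok += 1
        if ok == chain:
            return off
    return None
```

In a parallel reader, each worker would seek to its assigned byte range, run a boundary search of this kind near the range start, and hand records from that boundary onward to the per-record parser; RAPCAP's contribution is bounding how far such a search must look and eliminating the heuristic's nonzero error probability.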
- Award ID(s): 1642542
- PAR ID: 10028361
- Date Published:
- Journal Name: 2nd IFIP/IEEE International Workshop on Analytics for Network and Service Management (AnNet'17)
- Issue: 2
- Page Range / eLocation ID: 1-6
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Understanding end-user video Quality of Experience (QoE) is important for Internet Service Providers (ISPs). Existing work presents mechanisms that use network measurement data to estimate video QoE. Most of these mechanisms assume access to packet-level traces, the most detailed data available from the network. However, collecting packet-level traces can be challenging at a network-wide scale. Therefore, we ask: "Is it feasible to estimate video QoE with lightweight, readily available, but coarse-grained network data?" We specifically consider data in the form of Transport Layer Security (TLS) transactions that can be collected using a standard proxy and present a machine learning-based methodology to estimate QoE. Our evaluation with three popular streaming services shows that the estimation accuracy using TLS transactions is high (up to 72%) with up to 85% recall in detecting low QoE (low video quality or high re-buffering) instances. Compared to packet traces, the estimation accuracy (recall) is 7% (9%) lower but has up to 60 times lower computation overhead.
Polyanskiy [1] proposed a framework for the MAC problem with a large number of users, where users employ a common codebook in the finite blocklength regime. In this work, we extend [1] to the case when the number of active users is random and there is also a delay constraint. We first define a random-access channel and derive the general converse bound. Our bound captures the basic tradeoff between the required energy and the delay constraint. Then we propose an achievable bound for block transmission. In this case, all packets are transmitted in the second half of the block to avoid interference. We then study treating interference as noise (TIN) with both single user and multiple users. Last, we derive an achievable bound for the packet splitting model, which allows users to split each packet into two parts with different blocklengths. Our numerical results indicate that, when the delay is large, TIN is effective; on the other hand, packet splitting outperforms TIN as the delay decreases.
Data from Internet telescopes that monitor routed but unused IP address space has been the basis for myriad insights on malicious, unwanted, and unexpected behavior. However, service migration to cloud infrastructure and the increasing scarcity of IPv4 address space present serious challenges to traditional Internet telescopes. This paper describes DSCOPE, a cloud-based Internet telescope designed to be scalable and interactive. We describe the design and implementation of DSCOPE, which includes two major components. Collectors are deployed on cloud VMs, interact with incoming connection requests, and capture pcap traces. The data processing pipeline organizes, transforms, and archives the pcaps from deployed collectors for post-facto analysis. In comparing a sampling of DSCOPE's collected traffic with that of a traditional telescope, we see a striking difference in both the quantity and phenomena of behavior targeting cloud systems, with up to 450x as much cloud-targeting traffic as expected under random scanning. We also show that DSCOPE's adaptive approach achieves impressive price-performance: optimal yield of scanners on a given IP address is achieved in under 8 minutes of observation. Our results demonstrate that cloud-based telescopes achieve a significantly broader and more comprehensive perspective than traditional techniques.
The Tor anonymity system is vulnerable to website fingerprinting attacks that can reveal users' Internet browsing behavior. The state-of-the-art website fingerprinting attacks use convolutional neural networks to automatically extract features from packet traces. One such attack undermines an efficient fingerprinting defense previously considered a candidate for implementation in Tor. In this work, we study the use of neural network attribution techniques to visualize activity in the attack's model. These visualizations, essentially heatmaps of the network, can be used to identify regions of particular sensitivity and provide insight into the features that the model has learned. We then examine how these heatmaps may be used to create a new website fingerprinting defense that applies random padding to the website trace with an emphasis on highly fingerprintable regions. This defense reduces the attacker's accuracy from 98% to below 70% with a packet overhead of approximately 80%.