Random Access in Nondelimited Variable-length Record Collections for Parallel Reading with Hadoop
The industry-standard Packet CAPture (PCAP) format for storing network packet traces is normally readable only serially because it lacks delimiters, indexing, and blocking. This presents a challenge for parallel analysis of large networks, where packet traces can be many gigabytes in size. In this work we present RAPCAP, a novel method for random access into variable-length record collections such as PCAP that identifies a record boundary within a small number of bytes of the access point. Unlike related heuristic methods, whose nonzero probability of error can limit scalability, the new method guarantees correctness for a well-formed file and does not rely on prior knowledge of the contents. We include a practical implementation of the algorithm as an extension to the Hadoop framework, and a performance comparison to serial ingestion. Finally, we present a number of similar storage types that could use a modified version of RAPCAP for random access.
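To illustrate why such a trace is normally read serially, the following is a minimal sketch, not the RAPCAP algorithm itself. It assumes the classic little-endian pcap layout (a 24-byte global header followed by records that each begin with a 16-byte header carrying the captured length). The helper names `build_pcap` and `record_offsets` are hypothetical and introduced only for this example.

```python
import struct

# Classic pcap layout (assumed little-endian for this sketch).
# Global header: magic, version major/minor, thiszone, sigfigs, snaplen, linktype.
GLOBAL_HEADER = struct.Struct("<IHHiIII")  # 24 bytes
# Per-record header: ts_sec, ts_usec, incl_len (bytes stored), orig_len.
RECORD_HEADER = struct.Struct("<IIII")     # 16 bytes


def build_pcap(payloads):
    """Build a tiny in-memory pcap trace from raw payloads (hypothetical helper)."""
    out = GLOBAL_HEADER.pack(0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for i, payload in enumerate(payloads):
        out += RECORD_HEADER.pack(i, 0, len(payload), len(payload)) + payload
    return out


def record_offsets(data):
    """Find every record boundary by walking the trace from the start.

    There is no delimiter or index, so the position of record k is only
    known after reading the length fields of records 0..k-1.
    """
    offsets = []
    pos = GLOBAL_HEADER.size
    while pos + RECORD_HEADER.size <= len(data):
        _, _, incl_len, _ = RECORD_HEADER.unpack_from(data, pos)
        offsets.append(pos)
        pos += RECORD_HEADER.size + incl_len  # skip header plus payload
    return offsets


trace = build_pcap([b"a" * 60, b"b" * 1500, b"c" * 7])
offsets = record_offsets(trace)
print(offsets)
```

A reader that starts at an arbitrary byte offset, as a Hadoop input split would, cannot use this walk, because it has no known boundary to start from. That is the gap a random-access method has to close.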
- Award ID(s): 1642542
- NSF-PAR ID: 10028362
- Date Published:
- Journal Name: The 2nd IFIP/IEEE International Workshop on Analytics for Network and Service Management (AnNet 2017)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Understanding end-user video Quality of Experience (QoE) is important for Internet Service Providers (ISPs). Existing work presents mechanisms that use network measurement data to estimate video QoE. Most of these mechanisms assume access to packet-level traces, the most detailed data available from the network. However, collecting packet-level traces can be challenging at a network-wide scale. Therefore, we ask: "Is it feasible to estimate video QoE with lightweight, readily available, but coarse-grained network data?" We specifically consider data in the form of Transport Layer Security (TLS) transactions that can be collected using a standard proxy and present a machine learning-based methodology to estimate QoE. Our evaluation with three popular streaming services shows that the estimation accuracy using TLS transactions is high (up to 72%) with up to 85% recall in detecting low QoE (low video quality or high re-buffering) instances. Compared to packet traces, the estimation accuracy (recall) is 7% (9%) lower but has up to 60 times lower computation overhead.
- Polyanskiy [1] proposed a framework for the MAC problem with a large number of users, where users employ a common codebook in the finite blocklength regime. In this work, we extend [1] to the case when the number of active users is random and there is also a delay constraint. We first define a random-access channel and derive the general converse bound. Our bound captures the basic tradeoff between the required energy and the delay constraint. Then we propose an achievable bound for block transmission. In this case, all packets are transmitted in the second half of the block to avoid interference. We then study treating interference as noise (TIN) with both single user and multiple users. Last, we derive an achievable bound for the packet splitting model, which allows users to split each packet into two parts with different blocklengths. Our numerical results indicate that TIN is effective when the delay is large, whereas packet splitting outperforms TIN as the delay decreases.
- The Tor anonymity system is vulnerable to website fingerprinting attacks that can reveal users' Internet browsing behavior. The state-of-the-art website fingerprinting attacks use convolutional neural networks to automatically extract features from packet traces. One such attack undermines an efficient fingerprinting defense previously considered a candidate for implementation in Tor. In this work, we study the use of neural network attribution techniques to visualize activity in the attack's model. These visualizations, essentially heatmaps of the network, can be used to identify regions of particular sensitivity and provide insight into the features that the model has learned. We then examine how these heatmaps may be used to create a new website fingerprinting defense that applies random padding to the website trace with an emphasis on highly fingerprintable regions. This defense reduces the attacker's accuracy from 98% to below 70% with a packet overhead of approximately 80%.
- In this paper, we propose a multi-band medium access control (MAC) protocol for an infrastructure-based network with an access point (AP) that supports in-band full-duplex (IBFD) and multiuser transmission to multi-band-enabled stations. The Multi-Band Full Duplex MAC (MB-FDMAC) protocol mainly uses the sub-6 GHz band for control-frame exchange, transmitted at the lowest rate per IEEE 802.11 standards, and uses the 60 GHz band, which has significantly higher instantaneous bandwidth, exclusively for data-frame exchange. We also propose a selection method that ensures fairness among uplink and downlink stations. Our results show that MB-FDMAC effectively improves the spectral efficiency in the mmWave band by 324%, 234%, and 189% compared with state-of-the-art MAC protocols. In addition, MB-FDMAC outperforms the combined throughput of sub-6 GHz and 60 GHz IBFD multiuser MIMO networks that operate independently by more than 85%. We also study the effect of multiple network variables on MB-FDMAC, such as the number of stations in the network, the percentage of mmWave band stations, the size of the contention stage, and the selection method, by evaluating the change in throughput, packet delay, and fairness among stations. Finally, we propose a method to improve the utilization of the high bandwidth of the mmWave band by incorporating time duplexing into MB-FDMAC, which we show can enhance fairness by 12.5% and reduce packet delay by 80%.