skip to main content

Title: A Performance Evaluation of TCP BBRv2 Alpha
The alpha version of Bottleneck Bandwidth and Round-trip Time version 2 (BBRv2) has been recently presented, which aims to mitigate the shortcomings of its predecessor, BBR version 1 (BBRv1). Previous studies show that BBRv1 provides a high link utilization and low queuing delay by estimating the available bottleneck bandwidth. However, its aggressiveness induces unfairness when flows i) use different congestion control algorithms, such as CUBIC, and ii) have distinct round-trip times (RTTs). This paper presents an experimental evaluation of BBRv2, using Mininet. Results show that the coexistence between BBRv2-CUBIC is enhanced with respect to that of BBRv1-CUBIC, as measured by the fairness index. They also show that BBRv2 mitigates the RTT unfairness problem observed in BBRv1. Additionally, BBRv2 achieves a better fair share of the bandwidth than its predecessor when network conditions such as bandwidth and latency dynamically change. Results also indicate that the average flow completion time of concurrent flows is reduced when BBRv2 is used.
Authors:
; ; ; ;
Award ID(s):
1925484
Publication Date:
NSF-PAR ID:
10252944
Journal Name:
2020 43rd International Conference on Telecommunications and Signal Processing (TSP)
Page Range or eLocation-ID:
309 to 312
Sponsoring Org:
National Science Foundation
More Like this
  1. Google published the first release of the Bottleneck Bandwidth and Round-trip Time (BBR) congestion control algorithm in 2016. Since then, BBR has gained a widespread attention due to its ability to operate efficiently in the presence of packet loss and in scenarios where routers are equipped with small buffers. These characteristics were not attainable with traditional loss-based congestion control algorithms such as CUBIC and Reno. BBRv2 is a recent congestion control algorithm proposed as an improvement to its predecessor, BBRv1. Preliminary work suggests that BBRv2 maintains the high throughput and the bounded queueing delay properties of BBRv1. However, the literature has been missing an evaluation of BBRv2 under different network conditions. This paper presents an experimental evaluation of BBRv2 Alpha (v2alpha-2019-07-28) on Mininet, considering alternative active queue management (AQM) algorithms, routers with different buffer sizes, variable packet loss rates and round-trip times (RTTs), and small and large numbers of TCP flows. Emulation results show that BBRv2 tolerates much higher random packet loss rates than loss-based algorithms but slightly lower than BBRv1. The results also confirm that BBRv2 has better coexistence with loss-based algorithms and lower retransmission rates than BBRv1, and that it produces low queuing delay even with large buffers.more »When a Tail Drop policy is used with large buffers, an unfair bandwidth allocation is observed among BBRv2 and CUBIC flows. Such unfairness can be reduced by using advanced AQM schemes such as FQ-CoDel and CAKE. Regarding fairness among BBRv2 flows, results show that using small buffers produces better fairness, without compromising high throughput and link utilization. This observation applies to BBRv1 flows as well, which suggests that rate-based model-based algorithms work better with small buffers. BBRv2 also enhances the coexistence of flows with different RTTs, mitigating the RTT unfairness problem noted in BBRv1. Lastly, the paper presents the advantages of using TCP pacing with a loss-based algorithm, when the rate is manually configured a priori. Future algorithms could set the pacing rate using explicit feedback generated by modern programmable switches.« less
  2. Cloud services are deployed in datacenters connected though high-bandwidth Wide Area Networks (WANs). We find that WAN traffic negatively impacts the performance of datacenter traffic, increasing tail latency by 2.5x, despite its small bandwidth demand. This behavior is caused by the long round-trip time (RTT) for WAN traffic, combined with limited buffering in datacenter switches. The long WAN RTT forces datacenter traffic to take the full burden of reacting to congestion. Furthermore, datacenter traffic changes on a faster time-scale than the WAN RTT, making it difficult for WAN congestion control to estimate available bandwidth accurately. We present Annulus, a congestion control scheme that relies on two control loops to address these challenges. One control loop leverages existing congestion control algorithms for bottlenecks where there is only one type of traffic (i.e., WAN or datacenter). The other loop handles bottlenecks shared between WAN and datacenter traffic near the traffic source, using direct feedback from the bottleneck. We implement Annulus on a testbed and in simulation. Compared to baselines using BBR for WAN congestion control and DCTCP or DCQCN for datacenter congestion control, Annulus increases bottleneck utilization by 10% and lowers datacenter flow completion time by 1.3-3.5x.
  3. Wi-Fi is one of the key wireless technologies for the Internet of things (IoT) owing to its ubiquity. Low-power operation of commercial Wi-Fi enabled IoT modules (typically powered by replaceable batteries) is critical in order to achieve a long battery life, while maintaining connectivity, and thereby reduce the cost and frequency of maintenance. In this work, we focus on commonly used sparse periodic uplink traffic scenario in IoT. Through extensive experiments with a state-of-the-art Wi-Fi enabled IoT module (Texas Instruments SimpleLink CC3235SF), we study the performance of the power save mechanism (PSM) in the IEEE 802.11 standard and show that the battery life of the module is limited, while running thin uplink traffic, to ~30% of its battery life on an idle connection, even when utilizing IEEE 802.11 PSM. Focusing on sparse uplink traffic, a prominent traffic scenario for IoT (e.g., periodic measurements, keep-alive mechanisms, etc.), we design a simulation framework for single-user sparse uplink traffic on ns-3, and develop a detailed and platform-agnostic accurate power consumption model within the framework and calibrate it to CC3235SF. Subsequently, we present five potential power optimization strategies (including standard IEEE 802.11 PSM) and analyze, with simulation results, the sensitivity of power consumption tomore »specific network characteristics (e.g., round-trip time (RTT) and relative timing between TCP segment transmissions and beacon receptions) to present key insights. Finally, we propose a standard-compliant client-side cross-layer power saving optimization algorithm that can be implemented on client IoT modules. We show that the proposed optimization algorithm extends battery life by 24%, 26%, and 31% on average for sparse TCP uplink traffic with 5 TCP segments per second for networks with constant RTT values of 25 ms, 10 ms, and 5 ms, respectively.« less
  4. Motivated by the rise of quantum computers, existing public-key cryptosystems are expected to be replaced by post-quantum schemes in the next decade in billions of devices. To facilitate the transition, NIST is running a standardization process which is currently in its final Round. Only three digital signature schemes are left in the competition, among which Dilithium and Falcon are the ones based on lattices. Besides security and performance, significant attention has been given to resistance against implementation attacks that target side-channel leakage or fault injection response. Classical fault attacks on signature schemes make use of pairs of faulty and correct signatures to recover the secret key which only works on deterministic schemes. To counter such attacks, Dilithium offers a randomized version which makes each signature unique, even when signing identical messages. In this work, we introduce a novel Signature Correction Attack which not only applies to the deterministic version but also to the randomized version of Dilithium and is effective even on constant-time implementations using AVX2 instructions. The Signature Correction Attack exploits the mathematical structure of Dilithium to recover the secret key bits by using faulty signatures and the public-key. It can work for any fault mechanism which can inducemore »single bit-flips. For demonstration, we are using Rowhammer induced faults. Thus, our attack does not require any physical access or special privileges, and hence could be also implemented on shared cloud servers. Using Rowhammer attack, we inject bit flips into the secret key s1 of Dilithium, which results in incorrect signatures being generated by the signing algorithm. Since we can find the correct signature using our Signature Correction algorithm, we can use the difference between the correct and incorrect signatures to infer the location and value of the flipped bit without needing a correct and faulty pair. To quantify the reduction in the security level, we perform a thorough classical and quantum security analysis of Dilithium and successfully recover 1,851 bits out of 3,072 bits of secret key $s_{1}$ for security level 2. Fully recovered bits are used to reduce the dimension of the lattice whereas partially recovered coefficients are used to to reduce the norm of the secret key coefficients. Further analysis for both primal and dual attacks shows that the lattice strength against quantum attackers is reduced from 2128 to 281 while the strength against classical attackers is reduced from 2141 to 289. Hence, the Signature Correction Attack may be employed to achieve a practical attack on Dilithium (security level 2) as proposed in Round 3 of the NIST post-quantum standardization process.« less
  5. xisting network switches implement scheduling disciplines such as FIFO or deficit round robin that provide good utilization or fairness across flows, but do so at the expense of leaking a variety of information via timing side channels. To address this privacy breach, we propose a new scheduling mechanism for switches called indifferent-first scheduling (IFS). A salient aspect of IFS is that it provides privacy (a notion of strong isolation) to clients that opt-in, while preserving the (good) performance and utilization of FIFO or round robin for clients that are satisfied with the status quo. Such a hybrid scheduling mechanism addresses the main drawback of prior proposals such as time-division multiple access (TDMA) that provide strong isolation at the cost of low utilization and increased packet latency for all clients. We identify limitations of modern programmable switches which inhibit an implementation of IFS without compromising its privacy guarantees, and show that a version of IFS with full security can be implemented at line rate in the recently proposed push-in-first-out (PIFO) queuing architecture.