

Title: Revisiting TCP congestion control throughput models & fairness properties at scale
Much of our understanding of congestion control algorithm (CCA) throughput and fairness is derived from models and measurements that (implicitly) assume congestion occurs in the last mile. That is, these studies evaluated CCAs in “small scale” edge settings at the scale of tens of flows and up to a few hundred Mbps of bandwidth. However, recent measurements show that congestion can also occur at the core of the Internet on inter-provider links, where thousands of flows share high-bandwidth links. Hence, a natural question is: Does our understanding of CCA throughput and fairness continue to hold at the scale found in the core of the Internet, with thousands of flows and Gbps bandwidths? Our preliminary experimental study finds that some expectations derived in the edge setting do not hold at scale. For example, using loss rate as a parameter to the Mathis model to estimate TCP NewReno throughput works well in edge settings, but does not provide accurate throughput estimates when thousands of flows compete at high bandwidths. In addition, BBR – which achieves good fairness at the edge when competing solely with other BBR flows – can become very unfair to other BBR flows at the scale of the core of the Internet. In this paper, we discuss these results and others, as well as key implications for future CCA analysis and evaluation.
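For reference, the Mathis model mentioned above estimates steady-state NewReno throughput from the packet loss rate p as roughly (MSS/RTT) * sqrt(3/2) / sqrt(p). The minimal Python sketch below shows that estimate; the MSS, RTT, and loss-rate values in the example are placeholders for illustration, not measurements from the paper.

    import math

    def mathis_throughput(mss_bytes, rtt_seconds, loss_rate):
        # Mathis et al. steady-state estimate: (MSS / RTT) * sqrt(3/2) / sqrt(p),
        # derived assuming periodic loss and no retransmission timeouts.
        if loss_rate <= 0:
            raise ValueError("loss rate must be positive")
        return (mss_bytes / rtt_seconds) * math.sqrt(1.5) / math.sqrt(loss_rate)

    # Placeholder example: 1460-byte MSS, 50 ms RTT, 0.1% loss
    # gives roughly 1.1 MB/s (~9 Mbps) for a single flow.
    print(mathis_throughput(1460, 0.050, 0.001))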
Award ID(s):
1850384
NSF-PAR ID:
10311806
Author(s) / Creator(s):
Date Published:
Journal Name:
ACM Internet Measurement Conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. BBR is a new congestion control algorithm (CCA) deployed for Chromium QUIC and the Linux kernel. As the default CCA for YouTube (which commands 11+% of Internet traffic), BBR has rapidly become a major player in Internet congestion control. BBR’s fairness or friendliness to other connections has recently come under scrutiny as measurements from multiple research groups have shown undesirable outcomes when BBR competes with traditional CCAs. One such outcome is a fixed, 40% proportion of link capacity consumed by a single BBR flow when competing with as many as 16 loss-based algorithms like Cubic or Reno. In this short paper, we provide the first model capturing BBR’s behavior in competition with loss-based CCAs. Our model is coupled with practical experiments to validate its implications. The key lesson is this: under competition, BBR becomes window-limited by its ‘in-flight cap’, which then determines BBR’s bandwidth consumption. By modeling the value of BBR’s in-flight cap under varying network conditions, we can predict BBR’s throughput when competing against Cubic flows with a median error of 5%, and against Reno with a median error of 8%. A simplified sketch of this window-limited relationship appears after this list.
  2. Cellular networks are becoming ever more sophisticated and over-crowded, imposing the most delay, jitter, and throughput damage to end-to-end network flows in today’s internet. We therefore argue for fine-grained mobile endpoint-based wireless measurements to inform a precise congestion control algorithm through a well-defined API to the mobile’s cellular physical layer. Our proposed congestion control algorithm is based on Physical-Layer Bandwidth measurements taken at the Endpoint (PBE-CC), and captures the latest 5G New Radio innovations that increase wireless capacity, yet create abrupt rises and falls in available wireless capacity that the PBE-CC sender can react to precisely and rapidly. We implement a proof-of-concept prototype of the PBE measurement module on software-defined radios, and the PBE sender and receiver in C. An extensive performance evaluation compares PBE-CC head to head against the cellular-aware and wireless-oblivious congestion control protocols proposed in the research community and in deployment, in mobile and static scenarios, and over busy and idle networks. Results show 6.3% higher average throughput than BBR, while simultaneously reducing 95th percentile delay by 1.8×.
  3. Congestion Control Algorithms (CCAs) impact numerous desirable Internet properties such as performance, stability, and fairness. Hence, the networking community invests substantial effort into studying whether new algorithms are safe for wide-scale deployment. However, operators today are continuously innovating and some deployed CCAs are unpublished - either because the CCA is in beta or because it is considered proprietary. How can the networking community evaluate these new CCAs when their inner workings are unknown? In this paper, we propose 'counterfeit congestion control algorithms' - reverse-engineered implementations derived using program synthesis based on observations of the original implementation. Using the counterfeit (synthesized) CCA implementation, researchers can then evaluate the CCA using controlled empirical testbeds or mathematical analysis, even without access to the original implementation. Our initial prototype, 'Mister 880,' can synthesize several basic CCAs including a simplified Reno using only a few traces. 
  4. BBR is a newer TCP congestion control algorithm with promising features, but it can often be unfair to existing loss-based congestion-control algorithms. This is because BBR's sending rate is dictated by static parameters that do not adapt well to dynamic and diverse network conditions. In this work, we introduce BBR-ML, an in-kernel ML-based tuning system for BBR, designed to improve fairness when in competition with loss-based congestion control. To build BBR-ML, we discretized the network condition search space and trained a model on 2,500 different network conditions. We then modified BBR to run an in-kernel model that predicts network buffer sizes and uses this prediction to choose its parameter settings. Our preliminary evaluation results show that BBR-ML can improve fairness when in competition with Cubic by up to 30% in some cases.
  5. Modern end-host network stacks have to handle traffic from tens of thousands of flows and hundreds of virtual machines per single host, to keep up with the scale of modern clouds. This can cause congestion for traffic egressing from the end host. The effects of this congestion have received little attention. Currently, an overflowing queue, like a kernel queuing discipline, will drop incoming packets. Packet drops lead to worse network and CPU performance by inflating the time to transmit the packet as well as spending extra effort on retransmissions. In this paper, we show that current end-host mechanisms can lead to high CPU utilization, high tail latency, and low throughput in cases of congestion of egress traffic within the end host. We present zD, a framework for applying backpressure from a congested queue to traffic sources at end hosts that can scale to thousands of flows. We implement zD to apply backpressure in two settings: i) between TCP sources and the kernel queuing discipline, and ii) between VMs as traffic sources and the kernel queuing discipline in the hypervisor. zD improves throughput by up to 60%, and improves tail RTT by at least 10x at high loads, compared to the standard kernel implementation.
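To illustrate the window-limited behavior described in entry 1 above: BBR's in-flight cap is cwnd_gain (2 by default) times its estimated bandwidth-delay product, and when that cap binds, BBR's throughput is roughly the cap divided by the actual round-trip time. The sketch below is only a simplified reading of that relationship using hypothetical numbers; the full model in that paper also captures how competition distorts BBR's bandwidth and RTT estimates, which is not reproduced here.

    def bbr_inflight_cap_bytes(btlbw_estimate_bps, rtprop_estimate_s, cwnd_gain=2.0):
        # In-flight cap: cwnd_gain times BBR's estimated bandwidth-delay product.
        return cwnd_gain * (btlbw_estimate_bps / 8.0) * rtprop_estimate_s

    def window_limited_throughput_bps(inflight_cap_bytes, actual_rtt_s):
        # When the cap binds, BBR delivers about one cap's worth of data per actual RTT.
        return inflight_cap_bytes * 8.0 / actual_rtt_s

    # Hypothetical scenario: BBR estimates 40 Mbps of bandwidth and a 40 ms RTprop,
    # but queueing from competing loss-based flows inflates the actual RTT to 100 ms.
    cap = bbr_inflight_cap_bytes(40e6, 0.040)                # 400,000 bytes
    print(window_limited_throughput_bps(cap, 0.100) / 1e6)   # ~32 Mbps consumed by BBR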