Title: Scaling beyond packet switch limits with multiple dataplanes
Scale-out datacenter network fabrics enable network operators to translate improved link and switch speeds directly into end-host throughput. Unfortunately, limits in the underlying CMOS packet switch chip manufacturing roadmap mean that NICs, links, and switches are not getting faster fast enough to meet demand. As a result, operators have introduced alternative, parallel fabric designs in the core of the network that deliver N-times the bandwidth by simply forwarding traffic over any of N parallel network fabrics. In this work, we consider extending this parallel network idea all the way to the end host. Our initial impressions found that direct application of existing path selection and forwarding techniques resulted in poor performance. Instead, we show that appropriate path selection and forwarding protocols can not only improve the performance of existing, homogeneous parallel fabrics, but enable the development of heterogeneous parallel network fabrics that can deliver even higher bandwidth, lower latency, and improved resiliency than traditional designs constructed from the same constituent components.
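For orientation, the idea of forwarding traffic over any of N parallel fabrics can be sketched with plain per-flow hashing, in the spirit of ECMP-style schemes. This is only an illustrative baseline, not the protocol proposed in the paper; the function names and the flow-tuple layout are assumptions made for the example.

```python
import hashlib
from collections import Counter


def pick_plane(flow, num_planes):
    """Map a flow tuple to one of num_planes parallel fabrics via a stable hash.

    `flow` is a hypothetical (src, dst, sport, dport, proto) tuple.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_planes


def plane_loads(flows, num_planes):
    """Sum the bytes each plane carries when every flow is pinned by hash."""
    loads = Counter({plane: 0 for plane in range(num_planes)})
    for flow, size_bytes in flows:
        loads[pick_plane(flow, num_planes)] += size_bytes
    return loads


if __name__ == "__main__":
    demo = [(("10.0.0.1", "10.0.1.1", 40000 + i, 80, "tcp"), 1_000_000)
            for i in range(16)]
    print(dict(plane_loads(demo, num_planes=4)))
```

Because each flow is pinned to one plane regardless of its size, a few large flows can land on the same plane while others sit idle, which is one plausible reason that naive reuse of existing techniques performs poorly.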
Alzaid, Zaid; Bhowmik, Saptarshi; Yuan, Xin
(IPDPS workshop on Scalable Networks for Advanced Computer Systems)
The Jellyfish network has recently been proposed as an alternative to the fat-tree network for data centers and high-performance computing clusters. Jellyfish uses a random regular graph as its switch-level topology and has been shown to be more cost-effective than fat-trees. Effective routing on Jellyfish is challenging. It is known that shortest path routing and equal cost multi-path routing (ECMP) do not work well on Jellyfish. Existing schemes use variations of k-shortest path routing (KSP). In this work, we study two routing components for Jellyfish: path selection, which decides the paths used to route traffic, and routing mechanisms, which decide which path is used for each packet. We show that the performance of the existing KSP can be significantly improved by incorporating two heuristics, randomization and edge-disjointness. We evaluate a range of routing mechanisms, including traffic oblivious and traffic adaptive schemes, and identify an adaptive routing scheme with noticeably higher performance than others.
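The two heuristics named above, randomization and edge-disjointness, can be illustrated with a small greedy sketch: repeatedly find a shortest path with random tie-breaking, then remove its edges before searching again. This is a simplified stand-in and not the authors' path selection algorithm; the function names and the adjacency-dictionary representation are assumptions.

```python
import random
from collections import deque


def bfs_path(adj, src, dst, rng):
    """Return one shortest path from src to dst, breaking ties randomly, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        neighbors = sorted(adj[node])
        rng.shuffle(neighbors)  # randomization: vary which shortest path is found
        for nxt in neighbors:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def edge_disjoint_paths(adj, src, dst, k, seed=0):
    """Greedily collect up to k paths that share no edge, shortest first."""
    rng = random.Random(seed)
    remaining = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    paths = []
    for _ in range(k):
        path = bfs_path(remaining, src, dst, rng)
        if path is None:
            break
        paths.append(path)
        # Edge-disjointness: drop every edge of the chosen path (both directions).
        for u, v in zip(path, path[1:]):
            remaining[u].discard(v)
            remaining[v].discard(u)
    return paths


if __name__ == "__main__":
    # Tiny undirected example graph: a 4-cycle 0-1-3-2-0 plus the chord 0-3.
    graph = {0: {1, 2, 3}, 1: {0, 3}, 2: {0, 3}, 3: {0, 1, 2}}
    print(edge_disjoint_paths(graph, 0, 3, k=3))
```

A greedy removal like this does not always find the maximum number of edge-disjoint paths (a max-flow formulation would), but it keeps the example short and shows how the two heuristics interact.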
Subramanian, Kausik; D'Antoni, Loris; Akella, Aditya
(POPL 2017 Proceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages)
Operators in multi-tenant cloud datacenters require support for diverse and complex end-to-end policies, such as reachability, middlebox traversals, isolation, traffic engineering, and network resource management. We present Genesis, a datacenter network management system which allows policies to be specified in a declarative manner without explicitly programming the network data plane. Genesis tackles the problem of enforcing policies by synthesizing switch forwarding tables. It uses the formal foundations of constraint solving in combination with fast off-the-shelf SMT solvers. To improve synthesis performance, Genesis incorporates a novel search strategy that uses regular expressions to specify properties that leverage the structure of datacenter networks, and a divide-and-conquer synthesis procedure which exploits the structure of policy relationships. We have prototyped Genesis, and conducted experiments with a variety of workloads on real-world topologies to demonstrate its performance.
Panda, Sourav; Ramakrishnan, K. K.; Bhuyan, Laxmi N.
(2022 IEEE 30th International Conference on Network Protocols (ICNP))
The 5G user plane function (UPF) is a critical inter-connection point between the data network and cellular network infrastructure. It governs the packet processing performance of the 5G core network. UPFs also need to be flexible to support several key control plane operations. Existing UPFs typically run on general-purpose CPUs, but have limited performance because of the overheads of host-based forwarding. We design Synergy, a novel 5G UPF running on SmartNICs that provides high throughput and low latency. It also supports monitoring functionality to gather critical data on user sessions for the prediction and optimization of handovers during user mobility. The SmartNIC UPF efficiently buffers data packets during handover and paging events by using a two-level flow-state access mechanism. This enables maintaining flow-state for a very large number of flows, thus providing very low latency for control and data planes and high throughput packet forwarding. Mobility prediction can reduce the handover delay by pre-populating state in the UPF and other core NFs. Synergy performs handover predictions based on an existing recurrent neural network model. Synergy's mobility predictor helps us achieve 2.32× lower average handover latency. Buffering in the SmartNIC, rather than the host, during paging and handover events reduces packet loss rate by at least 2.04×. Compared to previous approaches to building programmable switch-based UPFs, Synergy speeds up control plane operations such as handovers because of the low P4-programming latency leveraging tight coupling between SmartNIC and host.
Bezerra, J.; Arcanjo, V.; Ibarra, J.; Kantor, J.; Lambert, R.; Kollross, M.; Astudillo, A.; Sobhani, S.; Jaque, S.; Petravick, D.; et al
(Astronomical Data Analysis Software and Systems (ADASS XXVII) conference)
New international academic collaborations are being created at a fast pace, generating data sets each day, on the order of terabytes in size. Often these data sets need to be moved in real-time to a central location to be processed and then shared. In the field of astronomy, building data processing facilities in remote locations is not always feasible, creating the need for a high bandwidth network infrastructure to transport these data sets over very long distances. This network infrastructure normally relies on multiple networks operated by multiple organizations or projects. Creating an end-to-end path involving multiple network operators, technologies and interconnections often adds conditions that make the real-time movement of big data sets challenging. The Large Synoptic Survey Telescope (LSST) is an example of astronomical applications imposing new challenges on multi-domain network provisioning activities. The network for LSST is challenging for a number of reasons: (1) with the telescope in Chile and the archiving facility in the USA, the network has a high propagation delay, which affects the performance of traditional transport protocols; (2) the path is composed of multiple network operators, which means that the different network operating teams involved must coordinate technologies and protocols to support all parallel data transfers in an efficient way; (3) the large amount of data produced (12.7 GB/image) and the small interval available to transfer this data (5 seconds) to the archiving facility require special Quality of Service (QoS) policies; (4) because network events happen, the network needs to be prepared to be adjusted for rainy days, where some data types will be prioritized over others. To guarantee data transfers will happen within the required interval, each network operator in the path needs to apply QoS policies to each of its network links.
These policies need to be coordinated end-to-end and, in the case where the network is affected by parallel events, all policies might need to be dynamically reconfigured in real-time to accommodate specific QoS policies for rainy days. Reconfiguring QoS policies is a very complex activity for current network protocols and technologies, sometimes requiring human intervention. This presentation aims to share the efforts to guarantee an efficient network configuration capable of handling LSST data transfers on sunny and rainy days across multiple network operators from South to North America.
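The figures quoted in this abstract (12.7 GB per image, a 5-second transfer window) imply a sustained rate that is easy to work out. The sketch below does that arithmetic and, as a second step, estimates the bandwidth-delay product for a given round-trip time. The function names are hypothetical, 1 GB is taken as 10^9 bytes, and no round-trip time is given in the abstract, so any value passed in is an assumption.

```python
def required_gbps(image_gigabytes, window_seconds):
    """Sustained rate in Gbit/s needed to move one image within the window."""
    bits = image_gigabytes * 1e9 * 8  # assumes 1 GB = 10**9 bytes
    return bits / window_seconds / 1e9


def bdp_megabytes(rate_gbps, rtt_seconds):
    """Bandwidth-delay product in MB: data in flight needed to keep the path full."""
    bits_in_flight = rate_gbps * 1e9 * rtt_seconds
    return bits_in_flight / 8 / 1e6


if __name__ == "__main__":
    rate = required_gbps(12.7, 5)
    print(f"required rate: {rate:.2f} Gbit/s")
    # The 0.1 s round-trip time is a placeholder, not a measured value.
    print(f"in-flight data at 0.1 s RTT: {bdp_megabytes(rate, 0.1):.0f} MB")
```

With the abstract's numbers the required rate comes to roughly 20.3 Gbit/s before any protocol overhead, which helps explain why a long, high-delay path puts pressure on traditional transport protocols.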
Lei, Yunsen; Lanson, Julian P.; Kaldawy, Remy M.; Estrada, Jeffrey; Shue, Craig A.
(IEEE Network of the Future (NoF))
The software-defined networking (SDN) paradigm offers significant flexibility for network operators. However, the SDN community has focused on switch-based implementations, which pose several challenges. First, some may require significant hardware costs to upgrade a network. Further, fine-grained flow control in a switch-based SDN results in well-known, fundamental scalability limitations. These challenges may limit the reach of SDN technologies. In this work, we explore the extent to which host-based SDN agents can achieve feature parity with switch-based SDNs. Prior work has shown the potential of host-based SDNs for security and access control. Our study finds that with appropriate preparation, a host-based agent offers the same capabilities as switch-based SDNs in the remaining key area of traffic engineering, even in a legacy managed-switch network. We find the approach offers comparable performance to switch-based SDNs while eliminating the flow table scalability and cost concerns of switch-based SDN deployments.
Guo, Yibo, Mellette, William M., Snoeren, Alex C., and Porter, George. Scaling beyond packet switch limits with multiple dataplanes. Retrieved from https://par.nsf.gov/biblio/10458427. CoNEXT '22: Proceedings of the 18th International Conference on emerging Networking EXperiments and Technologies. Web. doi:10.1145/3555050.3569141.