NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

EDM: An Ultra-Low Latency Ethernet Fabric for Memory Disaggregation

https://doi.org/10.1145/3669940.3707221

Su, Weigao; Shrivastav, Vishal (February 2025, ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS))

Achieving low remote memory access latency remains the primary challenge in realizing memory disaggregation over Ethernet within the datacenters. We present EDM that attempts to overcome this challenge using two key ideas. First, while existing network protocols for remote memory access over the Ethernet, such as TCP/IP and RDMA, are implemented on top of the Ethernet MAC layer, EDM takes a radical approach by implementing the entire network protocol stack for remote memory access within the Physical layer (PHY) of the Ethernet. This overcomes fundamental latency and bandwidth overheads imposed by the MAC layer, especially for small memory messages. Second, EDM implements a centralized, fast, in-network scheduler for memory traffic within the PHY of the Ethernet switch. Inspired by the classic Parallel Iterative Matching (PIM) algorithm, the scheduler dynamically reserves bandwidth between compute and memory nodes by creating virtual circuits in the PHY, thus eliminating queuing delay and layer 2 packet processing delay at the switch for memory traffic, while maintaining high bandwidth utilization. Our FPGA testbed demonstrates that EDM's network fabric incurs a latency of only ~300 ns for remote memory access in an unloaded network, which is an order of magnitude lower than state-of-the-art Ethernet-based solutions such as RoCEv2 and comparable to emerging PCIe-based solutions such as CXL. Larger-scale network simulations indicate that even at high network loads, EDM's average latency remains within 1.3x its unloaded latency.
more » « less
Free, publicly-accessible full text available February 3, 2026
Semi-Oblivious Reconfigurable Datacenter Networks

https://doi.org/10.1145/3696348.3696860

Saran, Nitika; Amir, Daniel; Wilson, Tegan; Kleinberg, Robert; Shrivastav, Vishal; Weatherspoon, Hakim (November 2024, ACM)

Reconfigurable datacenter networks use fast optical circuit switches to provide high bandwidths at low cost, therefore emerging as a compelling alternative to packet switching. These switches offer micro- and nano-second reconfiguration, and reacting to demand at this time scale is infeasible. Proposed designs have therefore largely been oblivious, supporting arbitrary traffic patterns. However, this imposes a fundamental latency-throughput tradeoff that significantly limits the benefits of these switches. In this paper, we illustrate the feasibility of semi-oblivious reconfigurable datacenter networks that periodically adapt to large-scale structural patterns in traffic. We argue that such patterns are predictable in modern datacenters, that optimizing for them can provide latency-throughput scaling superior to oblivious designs, and that existing fast circuit-switched technologies support coarse-grained flexibility to adapt to these patterns.
more » « less
Free, publicly-accessible full text available November 18, 2025
Shale: A Practical, Scalable Oblivious Reconfigurable Network

Amir, Daniel; Saran, Nitika; Wilson, Tegan; Kleinberg, Robert; Shrivastav, Vishal; Weatherspoon, Hakim (August 2024, ACM Special Interest Group on Data Communication (SIGCOMM))

Circuit-switched technologies have long been proposed for handling high-throughput traffic in datacenter networks, but recent developments in nanosecond-scale reconfiguration have created the enticing possibility of handling low-latency traffic as well. The novel Oblivious Reconfigurable Network (ORN) design paradigm promises to deliver on this possibility. Prior work in ORN designs achieved latencies that scale linearly with system size, making them unsuitable for large-scale deployments. Recent theoretical work showed that ORNs can achieve far better latency scaling, proposing theoretical ORN designs that are Pareto optimal in latency and throughput. In this work, we bridge multiple gaps between theory and practice to develop Shale, the first ORN capable of providing low-latency networking at datacenter scale while still guaranteeing high throughput. By interleaving multiple Pareto optimal schedules in parallel, both latency- and throughput-sensitive flows can achieve optimal performance. To achieve the theoretical low latencies in practice, we design a new congestion control mechanism which is best suited to the characteristics of Shale. In datacenter-scale packet simulations, our design compares favorably with both an in-network congestion mitigation strategy, modern receiver-driven protocols such as NDP, and an idealized analog for sender-driven protocols. We implement an FPGA-based prototype of Shale, achieving orders of magnitude better resource scaling than existing ORN proposals. Finally, we extend our congestion control solution to handle node and link failures.
more » « less
Shale: A Practical, Scalable Oblivious Reconfigurable Network

https://doi.org/10.1145/3651890.3672248

Amir, Daniel; Saran, Nitika; Wilson, Tegan; Kleinberg, Robert; Shrivastav, Vishal; Weatherspoon, Hakim (August 2024, ACM)

Full Text Available
Seer: Enabling Future-Aware Online Caching in Networked Systems

Lei, Jason; Shrivastav, Vishal (April 2024, USENIX Symposium on Networked Systems Design and Implementation (NSDI))

State-intensive network and distributed applications rely heavily on online caching heuristics for high performance. However, there remains a fundamental performance gap between online caching heuristics and the optimal offline caching algorithm due to the lack of visibility into future state access requests in an online setting. Driven by the observation that state access requests in network and distributed applications are often carried in incoming network packets, we present Seer, an online caching solution for networked systems, that exploits the delays experienced by a packet inside a network - most prominently, transmission and queuing delays - to notify in advance of future packet arrivals to the target network nodes (switches/routers/middleboxes/end-hosts) implementing caching. Using this as a building block, Seer presents the design of an online cache manager that leverages visibility into (partial) set of future state access requests to make smarter prefetching and cache eviction decisions. Our evaluations show that Seer achieves up to 65% lower cache miss ratio and up to 78% lower flow completion time compared to LRU for key network applications over realistic workloads.
more » « less
Full Text Available
Breaking the VLB Barrier for Oblivious Reconfigurable Networks

https://doi.org/10.1145/3618260.3649608

Wilson, Tegan; Amir, Daniel; Saran, Nitika; Kleinberg, Robert; Shrivastav, Vishal; Weatherspoon, Hakim (June 2024, Conference proceedings of the annual ACM Symposium on Theory of Computing)

In a landmark 1981 paper, Valiant and Brebner gave birth to the study of oblivious routing and, simultaneously, introduced its most powerful and ubiquitous method: Valiant load balancing (VLB). By routing messages through a randomly sampled intermediate node, VLB lengthens routing paths by a factor of two but gains the crucial property of obliviousness: it balances load in a completely decentralized manner, with no global knowledge of the communication pattern. Forty years later, with datacenters handling workloads whose communication pattern varies too rapidly to allow centralized coordination, oblivious routing is as relevant as ever, and VLB continues to take center stage as a widely used — and in some settings, provably optimal — way to balance load in the network obliviously to the traffic demands. However, the ability of the network to rapidly reconfigure its interconnection topology gives rise to new possibilities. In this work we revisit the question of whether VLB remains optimal in the novel setting of reconfigurable networks. Prior work showed that VLB achieves the optimal tradeoff between latency and guaranteed throughput. In this work we show that a strictly superior latency-throughput tradeoff is achievable when the throughput bound is relaxed to hold with high probability. The same improved tradeoff is also achievable with guaranteed throughput under time-stationary demands, provided the latency bound is relaxed to hold with high probability and that the network is allowed to be semi-oblivious, using an oblivious (randomized) connection schedule but demand-aware routing. We prove that the latter result is not achievable by any fully-oblivious reconfigurable network design, marking a rare case in which semi-oblivious routing has a provable asymptotic advantage over oblivious routing. Our results are enabled by a novel oblivious routing scheme that improves VLB by stretching routing paths the minimum possible amount — an additive stretch of 1 rather than a multiplicative stretch of 2 — yet still manages to balance load with high probability when either the traffic demand matrix or the network’s interconnection schedule are shuffled by a uniformly random permutation. To analyze our routing scheme we prove an exponential tail bound which may be of independent interest, concerning the distribution of values of a bilinear form on an orbit of a permutation group action.
more » « less
Full Text Available
Leo: Online ML-based Traffic Classification at Multi-Terabit Line Rate

Jafri, Syed Usman; Rao, Sanjay; Shrivastav, Vishal; Tawarmalani, Mohit (April 2024, USENIX Symposium on Networked Systems Design and Implementation (NSDI))

Online traffic classification enables critical applications such as network intrusion detection and prevention, providing Quality-of-Service, and real-time IoT analytics. However, with increasing network speeds, it has become extremely challenging to analyze and classify traffic online. In this paper, we present Leo, a system for online traffic classification at multi-terabit line rates. At its core, Leo implements an online machine learning (ML) model for traffic classification, namely the decision tree, in the network switch's data plane. Leo's design is fast (can classify packets at switch's line rate), scalable (can automatically select a resource-efficient design for the class of decision tree models a user wants to support), and runtime programmable (the model can be updated on-the-fly without switch downtime), while achieving high model accuracy. We implement Leo on top of Intel Tofino switches. Our evaluations show that Leo is able to classify traffic at line rate with nominal latency overhead, can scale to model sizes more than twice as large as state-of-the-art data plane ML classification systems, while achieving classification accuracy on-par with an offline traffic classifier.
more » « less
Full Text Available
Leo: online ML-based traffic classification at multi-terabit line rate

Jafri, Syed Usman; Rao, Sanjay; Shrivastav, Vishal; Tawarmalani, Mohit (April 2024, Usenix NSDI)

Full Text Available
Leo: online ML-based traffic classification at multi-terabit line rate

Jafri, Syed Usman; Rao, Sanjay G; Shrivastav, Vishal; Tawarmalani, Mohit (April 2024, Usenix NSDI)

Full Text Available
Leo: Online ML-based Traffic Classification at Multi-Terabit Line Rate

Jafri, Syed Usman; Rao, Sanjay G; Shrivastav, Vishal; and Tawarmalani, Mohit (April 2024, Usenix NSDI)

Full Text Available

« Prev Next »

Search for: All records