User-associated content plays an increasingly important role in modern network applications. With growing deployments of edge servers, the content storage capacity of edge clusters increases significantly, which offers great potential to satisfy content requests with much shorter latency. However, the large volume of content also makes it difficult to locate content on edge servers in different locations, because indexing that content consumes a large amount of DRAM on each edge server. In this work, we explore the opportunity of efficiently indexing user-associated content and propose a scalable content-sharing mechanism for edge servers, called EdgeCut, that significantly reduces content access latency by allowing many edge servers to share their cached content. We design a compact and dynamic data structure called Ludo Locator that returns the IP address of the edge server that stores the requested user-associated content. We have implemented a prototype of EdgeCut in a real network environment running in a public geo-distributed cloud. Experimental results show that EdgeCut reduces content access latency by up to 50% and reduces cloud traffic by up to 50% compared to existing solutions. The memory cost is less than 50MB for 10 million mobile users. Simulations using real network latency data show EdgeCut’s advantages over existing solutions at large scale.
Retina: Analyzing 100 GbE Traffic on Commodity Hardware
As network speeds have increased to over 100 Gbps, operators and researchers have lost the ability to easily ask complex questions of reassembled and parsed network traffic. In this paper, we introduce Retina, a software framework that lets users analyze over 100 Gbps of real-world traffic on a single server with no specialized hardware. Retina supports running arbitrary user-defined analysis functions on a wide variety of extensible data representations ranging from raw packets to parsed application-layer handshakes. We introduce a novel filtering mechanism and subscription interface to safely and efficiently process high-speed traffic. Under the hood, Retina implements an efficient data pipeline that strategically discards unneeded traffic and defers expensive processing operations to preserve computation for complex analyses. We present the framework architecture, evaluate its performance on production traffic, and explore several applications. Our experiments show that Retina is capable of running sophisticated analyses at over 100 Gbps on a single commodity server and can support 5–100x higher traffic rates than existing solutions, dramatically reducing the effort to complete investigations on real-world networks.
- PAR ID: 10346556
- Date Published:
- Journal Name: ACM SIGCOMM 2022 Conference (SIGCOMM ’22)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This paper presents a data-driven approach for predicting the propagation of traffic congestion at road segments as a function of the congestion in their neighboring segments. In the past, this problem has mostly been addressed by modelling traffic congestion as a standard physical phenomenon, an approach that makes it difficult to capture all the modalities of such a dynamic and complex system. While other recent works have focused on applying a generalized data-driven technique to the whole network at once, they often ignore intersection characteristics. In contrast, we propose a city-wide ensemble of intersection-level connected LSTM models and propose mechanisms for identifying congestion events using the predictions from the networks. To reduce the search space of likely congestion sinks, we use the likelihood of congestion propagation in the neighboring road segments of a congestion source, which we learn from historical data. We validated our congestion forecasting framework on real-world traffic data from Nashville, USA, and identified the onset of congestion in each of the neighboring segments of any congestion source with an average precision of 0.9269 and an average recall of 0.9118, tested over ten congestion events.
-
The increasing complexity of AI workloads, especially distributed Large Language Model (LLM) training, places significant strain on the networking infrastructure of parallel data centers and supercomputing systems. While Equal-Cost Multi-Path (ECMP) routing distributes traffic over parallel paths, hash collisions often lead to imbalanced network resource utilization and performance bottlenecks. This paper presents FlowTracer, a tool designed to analyze network path utilization and evaluate different routing strategies. Unlike tools that introduce additional traffic, FlowTracer aids in debugging network inefficiencies by passively monitoring and correlating user workload flows. As a result, FlowTracer does not interfere with ongoing data transfers, enabling analysis with minimal overhead, which is an important factor when debugging and fine-tuning routing schemes in production systems. FlowTracer can provide detailed insights into traffic distribution and can help identify the root causes of performance degradation, such as hash collisions. With FlowTracer’s flow-level insights, system operators can optimize routing, reduce congestion, and improve the performance of distributed AI workloads. We use a RoCEv2-enabled cluster with a leaf-spine network and 16 400-Gbps nodes to demonstrate how FlowTracer can be used to compare the flow imbalances of ECMP routing against a statically configured network. The example showcases a 30% reduction in imbalance, as measured by a new metric we introduce.
-
The growing interest in autonomous driving calls for realistic simulation platforms capable of accurately simulating the cooperative perception process in realistic traffic scenarios. Existing studies of cooperative perception often have not accounted for transmission latency and errors in real-world environments. To address this gap, we introduce EI-Drive (Edge Intelligent Drive), an Edge-AI based autonomous driving simulation platform that integrates advanced cooperative perception with more realistic communication models. Built on the CARLA framework, EI-Drive features new modules for cooperative perception while taking into account transmission latency and errors, providing a more realistic platform for evaluating cooperative perception algorithms. In particular, the platform enables vehicles to fuse data from multiple sources, improving situational awareness and safety in complex environments. With its modular design, EI-Drive allows for detailed exploration of sensing, perception, planning, and control in various cooperative driving scenarios. Experiments using EI-Drive demonstrate significant improvements in vehicle safety and performance, particularly in scenarios with complex traffic flow and network conditions. All code and documents are accessible on our GitHub page: https://ucd-dare.github.io/eidrive.github.io/.
-
With the ever-increasing size of training models and datasets, network communication has emerged as a major bottleneck in distributed deep learning training. To address this challenge, we propose an optical distributed deep learning (ODDL) architecture. ODDL utilizes a fast yet scalable all-optical network architecture to accelerate distributed training. One of the key features of the architecture is its flow-based transmit scheduling with fast reconfiguration. This allows ODDL to allocate dedicated optical paths for each traffic stream dynamically, resulting in low network latency and high network utilization. Additionally, ODDL provides physically isolated and tailored network resources for training tasks by reconfiguring the optical switch using LCoS-WSS technology. The ODDL topology also uses tunable transceivers to adapt to time-varying traffic patterns. To achieve accurate and fine-grained scheduling of optical circuits, we propose an efficient distributed control scheme that incurs minimal delay overhead. Our evaluation on real-world traces showcases ODDL’s remarkable performance. When implemented with 1024 nodes and 100 Gbps bandwidth, ODDL accelerates VGG19 training by 1.6× and 1.7× compared to conventional fat-tree electrical networks and photonic SiP-Ring architectures, respectively. We further build a four-node testbed, and our experiments show that ODDL can achieve a training time comparable to that of an ideal electrical switching network.

