This paper presents the motivation and design of MTP, a new offload-friendly message transport protocol. Existing transport protocols like TCP, MPTCP, and UDP/QUIC all have key limitations when used in a network that may offload computation from end-servers into NICs, switches, and other network devices. To enable important new in-network computing use cases and correct congestion control in the face of ever-changing network paths and application replicas, MTP introduces a new message transport protocol design and pathlet congestion control, a new approach where end-hosts explicitly communicate messaging information to network devices and network devices explicitly communicate network path and congestion information back to end-hosts.
Shape-shifting Elephants: Multi-modal Transport for Integrated Research Infrastructure
Data Acquisition (DAQ) workloads form an important class of scientific network traffic that by its nature (1) flows across different research infrastructure, including remote instruments and supercomputer clusters, (2) has ever-increasing throughput demands, and (3) has ever-increasing integration demands; for example, observations at one instrument could trigger a reconfiguration of another instrument. Today’s DAQ transfers rely on UDP and (heavily tuned) TCP, but this is driven by convenience rather than suitability. The mismatch between Internet transport protocols and scientific workloads becomes starker with the steady increase in link capacities, data generation, and integration across research infrastructure. This position paper argues the importance of developing specialized transport protocols for DAQ workloads. It proposes a new transport feature for this kind of elephant flow: multi-modality, in which the network actively configures the transport protocol to change how DAQ flows are processed across the different underlying networks that connect scientific research infrastructure. Multi-modality is a layering violation that is proposed as a pragmatic technique for DAQ transport protocol design. It takes advantage of programmable network hardware that is increasingly being deployed in scientific research infrastructure. The paper presents an initial evaluation through a pilot study that includes a Tofino2 switch and Alveo FPGA cards and uses data from a particle detector.
- Award ID(s): 2346499
- PAR ID: 10601200
- Publisher / Repository: ACM
- Date Published:
- ISBN: 9798400712722
- Page Range / eLocation ID: 308 to 317
- Format(s): Medium: X
- Location: Irvine CA USA
- Sponsoring Org: National Science Foundation
More Like this
In the past decade, GPUs have become an important resource for compute-intensive, general-purpose GPU applications such as machine learning, big data analysis, and large-scale simulations. In the future, with the explosion of machine learning and big data, application demands will keep increasing, resulting in more data and computation being pushed to GPUs. However, due to the slowing of Moore’s Law and rising manufacturing costs, it is becoming more and more challenging to add compute resources into a single GPU device to improve its throughput. As a result, spreading work across multiple GPUs is popular in data-centric and scientific applications. For example, Facebook uses 8 GPUs per server in their recent machine learning platform. However, research infrastructure has not kept pace with this trend: most GPU hardware simulators, including gem5, only support a single GPU. Thus, it is hard to study interference between GPUs, communication between GPUs, or work scheduling across GPUs. Our research group has been working to address this shortcoming by adding multi-GPU support to gem5. Here, we discuss the changes that were needed, which included updating the emulated driver, GPU components, and coherence protocol.
Science is being conducted in an era of information abundance. The rate at which science data is generated is increasing, both in volume and variety. This phenomenon is transforming how science is thought of and practiced. This transformation is being shaped by new scientific instruments that are being designed and deployed that will dramatically increase the need for large, real-time data transfers among scientists throughout the world. One such instrument is the Square Kilometer Array (SKA) being built in South Africa that will transmit approximately 160 Gbps of data from each radio dish to a central processor. This paper describes a collaborative effort to respond to the demands of big data scientific instruments through the development of an international software defined exchange point (SDX) that will meet the network provisioning needs for science applications. This paper discusses the challenges of end-to-end path provisioning across multiple research and education networks using OpenFlow/SDN technologies. Furthermore, it refers to the AtlanticWave-SDX, a project at Florida International University and the Georgia Institute of Technology, funded by the US National Science Foundation (NSF), along with support from Brazil’s NREN, Rede Nacional de Ensino e Pesquisa (RNP), and the Academic Network of Sao Paulo (ANSP). Future work explores the feasibility of establishing an SDX in West Africa, in collaboration with regional African RENs, based on the planned availability of submarine cable spectrum for use by research and education communities.
Distributed experimental networks have emerged as a powerful approach in field ecology, enabling experimental replication across global gradients. These networks use standardized treatments at dispersed sites to identify factors like climate or soil that shape biotic responses. Reserving space for future “add‐on” work fosters discovery by transforming distributed networks into distributed experimental infrastructure. However, challenges include balancing feasibility, plot impacts, and demands on site scientists. Using the Disturbance and Recovery Across Grasslands Network (DRAGNet) as a case study informed by lessons learned in the Nutrient Network (NutNet), we outline effective practices for designing add‐on work to retain the original experiment’s integrity while effectively using the resources of the network participants. By following guidelines for hypothesis‐driven, inclusive research that engages contributors intellectually, minimizes plot impacts using field‐tested protocols, and maximizes scientific impact and inclusion, distributed networks can become valuable infrastructure for advancing ecological understanding.
The increasing complexity of AI workloads, especially distributed Large Language Model (LLM) training, places significant strain on the networking infrastructure of parallel data centers and supercomputing systems. While Equal-Cost Multi-Path (ECMP) routing distributes traffic over parallel paths, hash collisions often lead to imbalanced network resource utilization and performance bottlenecks. This paper presents FlowTracer, a tool designed to analyze network path utilization and evaluate different routing strategies. Unlike tools that introduce additional traffic, FlowTracer aids in debugging network inefficiencies by passively monitoring and correlating user workload flows. As a result, FlowTracer does not interfere with ongoing data transfers, enabling analysis with minimal overhead, which is an important factor when debugging and fine-tuning routing schemes in production systems. FlowTracer can provide detailed insights into traffic distribution and can help identify the root causes of performance degradation, such as hash collisions. With FlowTracer’s flow-level insights, system operators can optimize routing, reduce congestion, and improve the performance of distributed AI workloads. We use a RoCEv2-enabled cluster with a leaf-spine network and 16 400-Gbps nodes to demonstrate how FlowTracer can be used to compare the flow imbalances of ECMP routing against a statically configured network. The example showcases a 30% reduction in imbalance, as measured by a new metric we introduce.
