The lack of a readily accessible, tightly integrated data fabric connecting high-speed networking, storage, and computing services remains a critical barrier to the democratization of scientific discovery. To address this challenge, we are building the National Science Data Fabric (NSDF), a holistic ecosystem that supports domain scientists in their daily research. NSDF comprises networking, storage, and computing services, as well as outreach initiatives. In this paper, we present a testbed that integrates three of these services (networking, storage, and computing) and evaluate their performance. Specifically, we study the networking services and their throughput and latency, with a focus on academic cloud providers; the storage services and their performance, with a focus on data movement using file system mappers for both academic and commercial clouds; and the computing orchestration services, with a focus on commercial cloud providers. We discuss NSDF's potential to increase scalability and usability while decreasing time-to-discovery across scientific domains.
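To make the storage-side evaluation concrete, the following is a minimal sketch of one such data-movement measurement: timing a single object download from an S3-compatible endpoint and reporting the effective throughput. The endpoint URL, bucket, and object names are hypothetical placeholders, not the actual NSDF storage services, and the sketch assumes credentials (or anonymous access) are already configured for boto3.

```python
import time
import boto3  # S3-compatible client; endpoint/bucket/key below are placeholders

def time_object_download(endpoint_url, bucket, key, out_path):
    """Download one object and report elapsed time and effective throughput."""
    s3 = boto3.client("s3", endpoint_url=endpoint_url)
    size_bytes = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]
    start = time.perf_counter()
    s3.download_file(bucket, key, out_path)
    elapsed = time.perf_counter() - start
    return elapsed, size_bytes / elapsed / 1e6  # seconds, MB/s

if __name__ == "__main__":
    secs, mbps = time_object_download(
        "https://storage.example.org",           # placeholder endpoint
        "demo-bucket", "dataset/file.bin", "/tmp/file.bin")
    print(f"download took {secs:.2f} s at {mbps:.1f} MB/s")
```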
Characterizing network paths in and out of the clouds
Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns as well. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. While the performance and cost of VMs are relatively well understood, network performance and cost are not. This paper provides a characterization of networking in various regions of Amazon Web Services, Microsoft Azure, and Google Cloud Platform, both between Cloud resources and major data transfer nodes (DTNs) in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the clouds themselves. The paper contains both a qualitative analysis of the results and latency and peak throughput measurements. It also includes an analysis of the costs involved in Cloud-based networking.
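As context for the kind of measurements reported here, the sketch below shows one common way to collect latency and peak-throughput numbers from a VM by wrapping the standard ping and iperf3 tools. The target hostname is a placeholder, an iperf3 server is assumed to be running on the remote end, and this is illustrative only, not the paper's actual measurement harness.

```python
import json
import re
import subprocess

def measure_latency(host, count=10):
    """Return the average RTT in ms as reported by ping."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True, check=True).stdout
    # Linux ping summary line: "rtt min/avg/max/mdev = a/b/c/d ms"
    match = re.search(r"= [\d.]+/([\d.]+)/", out)
    return float(match.group(1)) if match else None

def measure_throughput(host, seconds=10):
    """Return achieved TCP throughput in Gbit/s against an iperf3 server."""
    out = subprocess.run(["iperf3", "-c", host, "-t", str(seconds), "-J"],
                         capture_output=True, text=True, check=True).stdout
    report = json.loads(out)
    return report["end"]["sum_received"]["bits_per_second"] / 1e9

if __name__ == "__main__":
    target = "dtn.example.edu"  # placeholder; a real DTN must run an iperf3 server
    print("avg RTT (ms):", measure_latency(target))
    print("throughput (Gbit/s):", measure_throughput(target))
```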
- PAR ID: 10299943
- Editor(s): Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W.
- Date Published:
- Journal Name: EPJ Web of Conferences
- Volume: 245
- ISSN: 2100-014X
- Page Range / eLocation ID: 07059
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
This paper describes a cloud infrastructure and virtual laboratories on P4 programmable data plane switches. P4 programmable data planes have emerged as a technology that enables innovation in networking; P4 is a programming language used to describe how network packets are processed. This paper presents an entry-level training library for P4. The virtual laboratories introduce the learner to P4 and data plane concepts through step-by-step guides and exercises. The virtual laboratories are hosted in the Academic Cloud, a distributed platform that manages and orchestrates computing resources. Additionally, the paper describes work in progress on P4 virtual laboratories that use Intel Tofino switches. Lastly, the paper discusses the use of the Academic Cloud as a network testbed.
-
In recent years, Field Programmable Gate Arrays (FPGAs) have gained prominence in cloud computing data centers, driven by their capacity to offload compute-intensive tasks, their role in the ongoing trend of data center disaggregation, and their ability to be connected directly to the network. While FPGAs offer numerous advantages, they also pose challenges in terms of configuration, programmability, and monitoring, particularly in the absence of an operating system with essential features like a TCP/IP networking stack. This paper introduces an In-band Network Telemetry (INT) approach based on the P4 language for FPGA data plane programming. The goal is to facilitate monitoring and network performance analysis by providing one-way packet delay information (a sketch of such a delay computation appears after this list). The approach is demonstrated in the Open Cloud Testbed (OCT) and FABRIC testbeds, both of which offer open access to the research community with greater FPGA availability than commercial clouds. The workflow enables researchers to create custom P4 programs and bitstreams for installation on FPGAs. The paper presents a multi-step approach allowing experimentation within the New England Research Cloud (NERC), testing in OCT, and final deployment in FABRIC, which is well suited for one-way delay measurements thanks to clocks synchronized via GPS time signals. Contributions include the provision of a P4 workflow for FPGAs in a research cloud, a novel FPGA clock-based INT approach, and a comprehensive evaluation through simulation and experiments in the Open Cloud and FABRIC testbeds.
-
Cloud virtualization and multi-tenant networking provide Infrastructure as a Service (IaaS) providers a new and innovative way to offer on-demand services to their customers, such as easy provisioning of new applications and better resource efficiency and scalability. However, existing data-intensive applications require more powerful processors and computing power, as well as a high-bandwidth, low-latency, and consistent networking service. In order to boost the performance of computing and networking services, as well as to reduce the overhead of software virtualization, we propose a new data center network design based on OpenStack, a promising cloud operating system solution. Specifically, we map the OpenStack networking services to the hardware switch and perform hardware-accelerated L2 switching and L3 routing to overcome the limitations of the software implementation while achieving software-like scalability and flexibility. We built our prototype system on an Arista Software-Defined Networking (SDN) switch and evaluated the performance improvement in terms of bandwidth and delay using various tools (a sketch of such a measurement appears after this list). Our experimental results demonstrate that our data center networking solution achieves higher bandwidth, lower latency, and lower CPU utilization on the host server.
-
Scientific computing needs are growing dramatically with time and are expanding into science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resource, capacity should be temporarily provisioned from elsewhere both to meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. The available capacity of cost-effective instances, however, is not well understood. This paper presents the expansion of IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers: Amazon Web Services, Microsoft Azure, and Google Cloud Platform. Using this setup, we sustained about 15k GPUs for a whole workday, corresponding to around 170 PFLOP32s, and integrated over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind the Cloud instance selection, a description of the setup, and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise (a sketch of a GPU job submission to such a pool appears after this list).
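For the FPGA-based INT item above, the following is a minimal sketch of how one-way delay can be derived from telemetry timestamps carried in a packet. The 48-bit nanosecond timestamp format and its offset in the payload are illustrative assumptions, not the actual metadata layout used on the OCT/FABRIC FPGAs, and the computation presumes the two FPGA clocks are synchronized, e.g. via GPS, as described above.

```python
import struct

# Illustrative INT metadata layout (assumption): two 48-bit nanosecond
# timestamps, ingress at the source device and egress at the sink device,
# appended back-to-back at a known offset in the captured payload.
INT_OFFSET = 0  # assumed offset of the telemetry metadata within the payload

def one_way_delay_ns(payload: bytes) -> int:
    """Compute one-way delay from two embedded 48-bit timestamps (ns).

    Only meaningful if the two devices' clocks are synchronized (e.g. GPS).
    """
    blob = payload[INT_OFFSET:INT_OFFSET + 12]
    src_ts = int.from_bytes(blob[0:6], "big")   # source ingress timestamp
    dst_ts = int.from_bytes(blob[6:12], "big")  # sink egress timestamp
    return dst_ts - src_ts

if __name__ == "__main__":
    # Two fabricated timestamps 1500 ns apart, for demonstration only.
    demo = struct.pack(">Q", 1_000_000_000)[2:] + struct.pack(">Q", 1_000_001_500)[2:]
    print("one-way delay:", one_way_delay_ns(demo), "ns")
```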
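For the OpenStack-based design above, the snippet below sketches one way to evaluate bandwidth together with host CPU cost: it runs an iperf3 client while sampling system-wide CPU utilization with psutil. The server address is a placeholder, and the snippet illustrates the general methodology rather than the authors' actual tooling.

```python
import subprocess
import threading
import psutil  # third-party: pip install psutil

def sample_cpu(samples, stop, interval=0.5):
    """Record system-wide CPU utilization until asked to stop."""
    while not stop.is_set():
        samples.append(psutil.cpu_percent(interval=interval))

def throughput_with_cpu(server, seconds=10):
    """Run an iperf3 test and return the average host CPU utilization."""
    samples, stop = [], threading.Event()
    sampler = threading.Thread(target=sample_cpu, args=(samples, stop))
    sampler.start()
    try:
        subprocess.run(["iperf3", "-c", server, "-t", str(seconds)], check=True)
    finally:
        stop.set()
        sampler.join()
    return sum(samples) / len(samples) if samples else 0.0

if __name__ == "__main__":
    avg_cpu = throughput_with_cpu("10.0.0.2")  # placeholder iperf3 server
    print(f"average host CPU utilization during the test: {avg_cpu:.1f}%")
```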
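For the IceCube item above, here is a minimal sketch of queuing GPU jobs into an HTCondor pool using the HTCondor Python bindings, assuming a recent bindings version that provides Schedd.submit. The executable, resource requests, and job count are placeholders and do not reflect IceCube's actual production configuration.

```python
import htcondor  # HTCondor Python bindings (recent versions)

# Placeholder job description: one GPU per job, preemption-tolerant payload.
submit = htcondor.Submit({
    "executable": "run_simulation.sh",     # placeholder payload script
    "arguments": "$(ProcId)",
    "request_gpus": "1",
    "request_cpus": "1",
    "request_memory": "4GB",
    "output": "sim_$(ProcId).out",
    "error": "sim_$(ProcId).err",
    "log": "sim.log",
})

schedd = htcondor.Schedd()                  # local schedd of the pool
result = schedd.submit(submit, count=100)   # queue 100 GPU jobs
print("submitted cluster", result.cluster())
```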