skip to main content


Title: Characterizing network paths in and out of the clouds
Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns, too. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. And while the performance and cost of VMs is relatively well understood, the network performance and cost is not. This paper provides a characterization of networking in various regions of Amazon Web Services, Microsoft Azure and Google Cloud Platform, both between Cloud resources and major DTNs in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the clouds themselves. The paper contains both a qualitative analysis of the results as well as latency and peak throughput measurements. It also includes an analysis of the costs involved with Cloud-based networking.  more » « less
Award ID(s):
1841530 1826967
NSF-PAR ID:
10299943
Author(s) / Creator(s):
; ;
Editor(s):
Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W.
Date Published:
Journal Name:
EPJ Web of Conferences
Volume:
245
ISSN:
2100-014X
Page Range / eLocation ID:
07059
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Advanced imaging and DNA sequencing technologies now enable the diverse biology community to routinely generate and analyze terabytes of high resolution biological data. The community is rapidly heading toward the petascale in single investigator laboratory settings. As evidence, the single NCBI SRA central DNA sequence repository contains over 45 petabytes of biological data. Given the geometric growth of this and other genomics repositories, an exabyte of mineable biological data is imminent. The challenges of effectively utilizing these datasets are enormous as they are not only large in the size but also stored in geographically distributed repositories in various repositories such as National Center for Biotechnology Information (NCBI), DNA Data Bank of Japan (DDBJ), European Bioinformatics Institute (EBI), and NASA’s GeneLab. In this work, we first systematically point out the data-management challenges of the genomics community. We then introduce Named Data Networking (NDN), a novel but well-researched Internet architecture, is capable of solving these challenges at the network layer. NDN performs all operations such as forwarding requests to data sources, content discovery, access, and retrieval using content names (that are similar to traditional filenames or filepaths) and eliminates the need for a location layer (the IP address) for data management. Utilizing NDN for genomics workflows simplifies data discovery, speeds up data retrieval using in-network caching of popular datasets, and allows the community to create infrastructure that supports operations such as creating federation of content repositories, retrieval from multiple sources, remote data subsetting, and others. Named based operations also streamlines deployment and integration of workflows with various cloud platforms. Our contributions in this work are as follows 1) we enumerate the cyberinfrastructure challenges of the genomics community that NDN can alleviate, and 2) we describe our efforts in applying NDN for a contemporary genomics workflow (GEMmaker) and quantify the improvements. The preliminary evaluation shows a sixfold speed up in data insertion into the workflow. 3) As a pilot, we have used an NDN naming scheme (agreed upon by the community and discussed in Section 4 ) to publish data from broadly used data repositories including the NCBI SRA. We have loaded the NDN testbed with these pre-processed genomes that can be accessed over NDN and used by anyone interested in those datasets. Finally, we discuss our continued effort in integrating NDN with cloud computing platforms, such as the Pacific Research Platform (PRP). The reader should note that the goal of this paper is to introduce NDN to the genomics community and discuss NDN’s properties that can benefit the genomics community. We do not present an extensive performance evaluation of NDN—we are working on extending and evaluating our pilot deployment and will present systematic results in a future work. 
    more » « less
  2. null (Ed.)
    Scientific computing needs are growing dramatically with time and are expanding in science domains that were previously not compute intensive. When compute workflows spike well in excess of the capacity of their local compute resource, capacity should be temporarily provisioned from somewhere else to both meet deadlines and to increase scientific output. Public Clouds have become an attractive option due to their ability to be provisioned with minimal advance notice. The available capacity of cost-effective instances is not well understood. This paper presents expanding the IceCube's production HTCondor pool using cost-effective GPU instances in preemptible mode gathered from the three major Cloud providers, namely Amazon Web Services, Microsoft Azure and the Google Cloud Platform. Using this setup, we sustained for a whole workday about 15k GPUs, corresponding to around 170 PFLOP32s, integrating over one EFLOP32 hour worth of science output for a price tag of about $60k. In this paper, we provide the reasoning behind Cloud instance selection, a description of the setup and an analysis of the provisioned resources, as well as a short description of the actual science output of the exercise. 
    more » « less
  3. The Cloud-Enhanced Open Software Defined Mobile Wireless Testbed for City-Scale Deployment (COSMOS) platform is a programmable city-scale shared multi-user advanced wireless testbed that is being deployed in West Harlem of New York City [1]. To keep pace with the significantly increased wireless link bandwidth and to effectively integrate the emerging C-RANs, COSMOS is designed to incorporate a fast programmable core network for providing connections across different computing layers. A key feature of COSMOS is its dark fiber based optical x-haul network that enables both highly flexible, user defined network topologies and experimentation directly in the optical physical layer. The optical architecture of COSMOS was presented in [2]. In this abstract, we present the tools and services designed to configure and monitor the performance of optical paths and topologies of the COSMOS testbed. In particular, we present the SDN framework that allows testbed users to implement experiments with application-driven control of optical and data networking functionalities. 
    more » « less
  4. Abstract—The Cloud-Enhanced Open Software Defined Mobile Wireless Testbed for City-Scale Deployment (COSMOS) platform is a programmable city-scale shared multi-user advanced wireless testbed that is being deployed in West Harlem of New York City [1]. To keep pace with the significantly increased wireless link bandwidth and to effectively integrate the emerging C-RANs, COSMOS is designed to incorporate a fast programmable core network for providing connections across different computing layers. A key feature of COSMOS is its dark fiber based optical x-haul network that enables both highly flexible, user defined network topologies and experimentation directly in the optical physical layer. The optical architecture of COSMOS was presented in [2]. In this abstract, we present the tools and services designed to configure and monitor the performance of optical paths and topologies of the COSMOS testbed. In particular, we present the SDN framework that allows testbed users to implement experiments with application-driven control of optical and data networking functionalities. 
    more » « less
  5. Cloud visualization and multi-tenant networking provide Infrastructure as a Service (IaaS) provider a new and innovative way to offer the on-demand services to their customers, such as the easy provisioning of new applications and the better resource efficiency and scalability. However, existing data-intensive applications require more powerful processor and computing power, as well as a high bandwidth, low latency and consistent networking service. In order to boost the performance of computing and networking services, as well as reduce the overhead of the software virtualization, we propose a new data center network design based on OpenStack, which is a promising cloud operating system solution. Specifically, we map the OpenStack networking services to the hardware switch, and perform hardware-accelerated L2 switch and L3 routing to solve the software limitations, as well as achieve the software-like scalability and flexibility. We designed our prototype system via the Arista Software-Defined-Networking (SDN) switch, and evaluated the performance improvement in terms of the bandwidth and delay using various tools. Our experimental results demonstrate that our datacenter networking solution achieves higher bandwidth, lower latency, and lower CPU utilization of the host server. 
    more » « less