Title: NSDF-Services: Integrating Networking, Storage, and Computing Services into a Testbed for Democratization of Data Delivery
The lack of a readily accessible, tightly integrated data fabric connecting high-speed networking, storage, and computing services remains a critical barrier to the democratization of scientific discovery. To address this challenge, we are building the National Science Data Fabric (NSDF), a holistic ecosystem that supports domain scientists in their daily research. NSDF comprises networking, storage, and computing services, as well as outreach initiatives. In this paper, we present a testbed that integrates the three services (networking, storage, and computing) and evaluate their performance. Specifically, we study the networking services, measuring throughput and latency with a focus on academic cloud providers; the storage services, measuring data-movement performance using file system mappers for both academic and commercial clouds; and the computing orchestration services, with a focus on commercial cloud providers. We discuss NSDF's potential to increase scalability and usability while decreasing time-to-discovery across scientific domains.
Award ID(s):
2138811
PAR ID:
10549335
Author(s) / Creator(s):
; ; ; ; ; ; ; ;
Publisher / Repository:
ACM
Date Published:
ISBN:
9798400702341
Page Range / eLocation ID:
1 to 10
Format(s):
Medium: X
Location:
Taormina (Messina) Italy
Sponsoring Org:
National Science Foundation
More Like this
  1. Doglioni, C.; Kim, D.; Stewart, G.A.; Silvestris, L.; Jackson, P.; Kamleh, W. (Ed.)
    Commercial Cloud computing is becoming mainstream, with funding agencies moving beyond prototyping and starting to fund production campaigns as well. An important aspect of any scientific computing production campaign is data movement, both incoming and outgoing. While the performance and cost of VMs are relatively well understood, network performance and cost are not. This paper characterizes networking in various regions of Amazon Web Services, Microsoft Azure, and Google Cloud Platform, both between Cloud resources and major DTNs in the Pacific Research Platform, including OSG data federation caches in the network backbone, and inside the clouds themselves. The paper contains a qualitative analysis of the results as well as latency and peak throughput measurements. It also includes an analysis of the costs involved with Cloud-based networking.
  2. Reed, Daniel A.; Lifka, David; Swanson, David; Amaro, Rommie; Wilkins-Diehr, Nancy (Ed.)
    This report summarizes the discussions from a workshop convened at NSF on May 30-31, 2018 in Alexandria, VA. The overarching objective of the workshop was to rethink the nature and composition of the NSF-supported computational ecosystem given changing application requirements and resources and technology landscapes. The workshop included roughly 50 participants, drawn from high-performance computing (HPC) centers, campus computing facilities, cloud service providers (academic and commercial), and distributed resource providers. Participants spanned both large research institutions and smaller universities. Organized by Daniel Reed (University of Utah, chair), David Lifka (Cornell University), David Swanson (University of Nebraska), Rommie Amaro (UCSD), and Nancy Wilkins-Diehr (UCSD/SDSC), the workshop was motivated by the following observations. First, there have been dramatic changes in the number and nature of applications using NSF-funded resources, as well as their resource needs. As a result, there are new demands on the type (e.g., data centric) and location (e.g., close to the data or the users) of the resources as well as new usage modes (e.g., on-demand and elastic). Second, there have been dramatic changes in the landscape of technologies, resources, and delivery mechanisms, spanning large scientific instruments, ubiquitous sensors, and cloud services, among others. 
  3. Cloud virtualization and multi-tenant networking give Infrastructure as a Service (IaaS) providers a new and innovative way to offer on-demand services to their customers, such as easy provisioning of new applications and better resource efficiency and scalability. However, existing data-intensive intelligent applications require more powerful processors and higher-bandwidth, lower-latency networking services. To boost the performance of computing and networking services and to reduce the overhead of software virtualization, we propose a new data center network design based on OpenStack. Specifically, we map the OpenStack networking services to the hardware switch and use hardware-accelerated L2 switching and L3 routing to overcome the software limitations while achieving software-like scalability and flexibility. We build our prototype system on an Arista Software-Defined-Networking (SDN) switch and provide an automation script that abstracts the service layer and decouples OpenStack from the physical network infrastructure, thereby providing vendor independence. We have evaluated the performance improvement in terms of bandwidth, delay, and system resource utilization using various tools and under various Quality-of-Service (QoS) constraints. Our solution demonstrates improved cloud scaling and network efficiency via only one touch point to control all vendors' devices in the data center.
  4. A key dimension of reproducibility in testbeds is stable performance that scales in regular and predictable ways in accordance with declarative specifications for virtual resources. We contend that reproducibility is crucial for elastic performance control in live experiments, in which testbed tenants (slices) provide services for real user traffic that varies over time. This paper gives an overview of ExoPlex, a framework for deploying network service providers (NSPs) as a basis for live inter-domain networking experiments on the ExoGENI testbed. As a motivating example, we show how to use ExoPlex to implement a virtual software-defined exchange (vSDX) as a tenant NSP. The vSDX implements security-managed interconnection of customer IP networks that peer with it via direct L2 links stitched dynamically into its slice. An elastic controller outside of the vSDX slice provisions network links and computing capacity for a scalable monitoring fabric within the tenant vSDX slice. The vSDX checks compliance of traffic flows with customer-specified interconnection policies, and blocks traffic from senders that trigger configured rules for intrusion detection in Bro security monitors. We present initial results showing the effect of resource provisioning on Bro performance within the vSDX. 
  5. The landscape of research in science and engineering is heavily reliant on computation and data processing. There is continued and expanded usage by disciplines that have historically used advanced computing resources, new usage by disciplines that have not traditionally used HPC, and new modalities of usage in Data Science, Machine Learning, and other areas of AI. Along with these new patterns have come new advanced computing resource methods and approaches, including the availability of commercial cloud resources. The Coalition for Academic Scientific Computation (CASC) has long been an advocate representing the needs of academic researchers using computational resources, sharing best practices and offering advice to create a national cyberinfrastructure to meet US science, engineering, and other academic computing needs. CASC has completed the first of what we intend to be an annual survey of academic cloud and data center usage and practices in analyzing return on investment in cyberinfrastructure. Critically important findings from this first survey include the following: many of the respondents are engaged in some form of analysis of return on research computing investments, but only a minority currently report the results of such analyses to their upper-level administration. Most respondents are experimenting with the use of commercial cloud resources, but no respondent indicated that they have found commercial cloud services to create financial benefits compared to their current methods. There is a clear correlation between levels of investment in research cyberinfrastructure and the scale of both CPU core-hours delivered and the financial level of supported research grants. Also interesting is that almost every respondent indicated that they participate in some sort of national cooperative or nationally provided research computing infrastructure project, and most were involved in academic computing-related organizations, indicating a high degree of engagement by institutions of higher education in building and maintaining national research computing ecosystems. Institutions continue to evaluate cloud-based HPC service models, despite having generally concluded that so far cloud HPC is too expensive to use compared to their current methods.