Title: International Networking in support of Extremely Large Astronomical Data-centric Operations
New international academic collaborations are being created at a fast pace, generating data sets on the order of terabytes each day. Often these data sets need to be moved in real time to a central location to be processed and then shared. In the field of astronomy, building data processing facilities in remote locations is not always feasible, creating the need for a high-bandwidth network infrastructure to transport these data sets over very long distances. This network infrastructure normally relies on multiple networks operated by multiple organizations or projects. Creating an end-to-end path involving multiple network operators, technologies, and interconnections often adds conditions that make the real-time movement of big data sets challenging. The Large Synoptic Survey Telescope (LSST) is an example of an astronomical application imposing new challenges on multi-domain network provisioning activities. The network for LSST is challenging for a number of reasons: (1) with the telescope in Chile and the archiving facility in the USA, the network has a high propagation delay, which degrades the performance of traditional transport protocols; (2) the path is composed of multiple network operators, which means that the different network operating teams involved must coordinate technologies and protocols to support all parallel data transfers in an efficient way; (3) the large amount of data produced (12.7 GB per image) and the small interval available to transfer this data to the archiving facility (5 seconds) require special Quality of Service (QoS) policies; (4) because network events happen, the network needs to be prepared to be adjusted for rainy days, when some data types will be prioritized over others. To guarantee that data transfers happen within the required interval, each network operator in the path needs to apply QoS policies to each of its network links. These policies need to be coordinated end to end and, in the case where the network is affected by parallel events, all policies might need to be dynamically reconfigured in real time to accommodate specific QoS policies for rainy days. Reconfiguring QoS policies is a very complex activity with current network protocols and technologies, sometimes requiring human intervention. This presentation shares the efforts to guarantee an efficient network configuration capable of handling LSST data transfers on sunny and rainy days across multiple network operators from South America to North America.
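As a rough illustration of the coordinated, multi-operator reconfiguration described above, the sketch below models every operator on the Chile-to-USA path applying the same class-to-DSCP marking, with a separate mapping for rainy days. The operator names, traffic classes, and DSCP values are illustrative assumptions, not the configuration actually deployed for LSST.

```python
# Hypothetical sketch: coordinating per-domain QoS policies for LSST-style
# transfers across multiple network operators. Traffic classes, DSCP values,
# and the rainy-day reprioritization are assumptions for illustration only.
from dataclasses import dataclass, field


@dataclass
class QosPolicy:
    # Maps a traffic class to a DSCP codepoint (46 = EF, 26 = AF31, 0 = best effort).
    dscp_by_class: dict = field(default_factory=dict)


@dataclass
class NetworkDomain:
    name: str                      # one operator on the end-to-end path
    policy: QosPolicy = field(default_factory=QosPolicy)

    def apply(self, policy: QosPolicy) -> None:
        # A real deployment would push router/switch configuration here;
        # this sketch only records the intended marking.
        self.policy = policy


SUNNY = QosPolicy({"image-data": 46, "catalog-data": 26, "bulk-reprocessing": 0})
RAINY = QosPolicy({"image-data": 46, "catalog-data": 0,  "bulk-reprocessing": 0})


def reconfigure_end_to_end(domains: list, rainy_day: bool) -> None:
    """Push the same class-to-DSCP mapping to every operator on the path,
    so the 12.7 GB images keep priority when capacity is degraded."""
    policy = RAINY if rainy_day else SUNNY
    for domain in domains:
        domain.apply(policy)


path = [NetworkDomain("operator-a"), NetworkDomain("operator-b"), NetworkDomain("operator-c")]
reconfigure_end_to_end(path, rainy_day=True)
```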
Award ID(s):
1451024
PAR ID:
10056971
Journal Name:
Astronomical Data Analysis Software and Systems (ADASS XXVII) conference
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Modern data center storage systems are invariably networked to allow for consolidation and flexible management of storage. They also include high-performance storage devices based on flash or other emerging technologies, generally accessed through low-latency and high-throughput protocols such as Non-Volatile Memory Express (NVMe) (or its derivatives) carried over the network. With the increasing complexity and data-centric nature of applications, properly configuring the quality of service (QoS) for the storage path has become crucial for ensuring the desired application performance. Such QoS is substantially influenced by the QoS in the network path, in the access protocol, and in the storage device. In this article, we define a new transport-level QoS mechanism for the network segment and demonstrate how it can augment and coordinate with the access-level QoS mechanism defined for NVMe, and with a similar QoS mechanism configured in the device. We show that the transport QoS mechanism not only provides the desired QoS to different classes of storage accesses but is also able to protect access to shared persistent memory devices that are located alongside the storage yet require much lower latency. We demonstrate that a properly coordinated configuration of the three QoS mechanisms along the path is crucial to achieving the desired differentiation, depending on where the bottlenecks appear. 
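A minimal sketch of the coordination idea, assuming made-up service classes and per-segment settings: each storage request carries one class, and the network transport, the NVMe access protocol, and the device each map that class to their own local QoS knob. None of this reflects the specific mechanism defined in the article.

```python
# Illustrative only: one service class, three coordinated QoS points.
from enum import Enum


class ServiceClass(Enum):
    PMEM_LOW_LATENCY = 0       # shared persistent-memory accesses
    STORAGE_HIGH = 1
    STORAGE_BEST_EFFORT = 2


# Each segment exposes its own per-class configuration (example values).
TRANSPORT_WEIGHTS = {ServiceClass.PMEM_LOW_LATENCY: 8,
                     ServiceClass.STORAGE_HIGH: 4,
                     ServiceClass.STORAGE_BEST_EFFORT: 1}
NVME_QUEUE_PRIORITY = {ServiceClass.PMEM_LOW_LATENCY: "urgent",
                       ServiceClass.STORAGE_HIGH: "high",
                       ServiceClass.STORAGE_BEST_EFFORT: "low"}
DEVICE_SHARE_PCT = {ServiceClass.PMEM_LOW_LATENCY: 50,
                    ServiceClass.STORAGE_HIGH: 35,
                    ServiceClass.STORAGE_BEST_EFFORT: 15}


def configure_path(cls: ServiceClass) -> dict:
    """Return the coordinated settings a request of this class would receive
    at each of the three QoS points; misaligning any one of them can move the
    bottleneck and defeat the intended differentiation."""
    return {"transport_weight": TRANSPORT_WEIGHTS[cls],
            "nvme_priority": NVME_QUEUE_PRIORITY[cls],
            "device_share_pct": DEVICE_SHARE_PCT[cls]}


print(configure_path(ServiceClass.PMEM_LOW_LATENCY))
```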
  2. De_Vita, R; Espinal, X; Laycock, P; Shadura, O (Ed.)
    The Large Hadron Collider (LHC) experiments distribute data by leveraging a diverse array of National Research and Education Networks (NRENs), where experiment data management systems treat networks as a “blackbox” resource. After the High Luminosity upgrade, the Compact Muon Solenoid (CMS) experiment alone will produce roughly 0.5 exabytes of data per year. NREN networks are a critical part of the success of CMS and the other LHC experiments. However, during data movement, NRENs are unaware of data priorities, importance, or the need for quality of service, and this poses a challenge for operators to coordinate the movement of data and provide predictable data flows across multi-domain networks. The overarching goal of SENSE (The Software-defined network for End-to-end Networked Science at Exascale) is to enable National Labs and universities to request and provision end-to-end intelligent network services for their application workflows, leveraging SDN (Software-Defined Networking) capabilities. This work aims to allow the LHC experiments and Rucio, the data management software used by the CMS experiment, to allocate and prioritize certain data transfers over the wide area network. In this paper, we present the current progress of integrating SENSE, a multi-domain end-to-end SDN orchestration framework with QoS (Quality of Service) capabilities, with Rucio. 
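The sketch below illustrates the general shape of such an integration: translating a Rucio-level transfer into a bandwidth "intent" that an orchestrator like SENSE could provision. The TransferRequest fields and the to_sense_intent() helper are hypothetical; they are not the real Rucio or SENSE APIs.

```python
# Hypothetical shim between a data management system and an SDN orchestrator.
from dataclasses import dataclass


@dataclass
class TransferRequest:
    rule_id: str          # replication rule this transfer belongs to (example field)
    src_site: str
    dst_site: str
    bytes_total: int
    priority: int         # higher means more urgent


def to_sense_intent(req: TransferRequest, deadline_hours: float) -> dict:
    """Translate a transfer into a network 'intent': how much guaranteed
    bandwidth is needed to move bytes_total before the deadline."""
    gbps_needed = req.bytes_total * 8 / (deadline_hours * 3600) / 1e9
    return {"endpoints": [req.src_site, req.dst_site],
            "guaranteed_gbps": round(gbps_needed, 2),
            "priority": req.priority,
            "tag": f"rucio-rule-{req.rule_id}"}


req = TransferRequest("abc123", "T1_US_FNAL", "T2_CH_CERN",
                      bytes_total=500 * 10**12, priority=9)
# ~46 Gb/s of guaranteed bandwidth to move 500 TB in one day.
print(to_sense_intent(req, deadline_hours=24))
```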
  3. In this paper we propose a novel approach to deliver better delay-jitter performance in dynamic networks. Dynamic networks experience rapid and unpredictable fluctuations and hence, a certain amount of uncertainty about the delay-performance of various network elements is unavoidable. This uncertainty makes it difficult for network operators to guarantee a certain quality of service (in terms of delay and jitter) to users. The uncertainty about the state of the network is often overlooked to simplify problem formulation, but we capture it by modeling the delay on various links as general and potentially correlated random processes. Within this framework, a user will request a certain delay-jitter performance guarantee from the network. After verifying the feasibility of the request, the network will respond to the user by specifying a set of routes as well as the proportion of traffic which should be sent through each one to achieve the desired QoS. We propose to use mean-variance analysis as the basis for traffic distribution and route selection, and show that this technique can significantly reduce the end-to-end jitter because it accounts for the correlated nature of delay across different paths. The resulting traffic distribution is often non-uniform and the fractional flow on each path is the solution to a simple convex optimization problem. We conclude the paper by commenting on the potential application of this method to general transportation networks. 
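A small numerical sketch of the mean-variance idea, assuming example mean delays, an example covariance matrix, and a made-up risk-aversion weight; the paper's exact formulation may differ. The traffic split that minimizes the combined objective on the probability simplex is typically non-uniform when path delays are correlated.

```python
# Mean-variance traffic split over three candidate paths (illustrative values).
import numpy as np
from scipy.optimize import minimize

mu = np.array([10.0, 12.0, 11.0])          # mean delay per path (ms)
sigma = np.array([[4.0, 1.5, 0.0],         # covariance of path delays (ms^2);
                  [1.5, 3.0, 0.5],         # off-diagonal terms capture correlation
                  [0.0, 0.5, 6.0]])
risk_aversion = 0.5                        # trade-off between mean delay and jitter


def objective(x):
    # Weighted sum of expected delay and variance of the traffic-weighted delay.
    return mu @ x + risk_aversion * (x @ sigma @ x)


x0 = np.full(3, 1.0 / 3.0)                 # start from a uniform split
result = minimize(objective, x0, method="SLSQP",
                  bounds=[(0.0, 1.0)] * 3,
                  constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}])
print("traffic split:", np.round(result.x, 3))   # typically non-uniform
```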
  4. Science and engineering applications are now generating data at an unprecedented rate. From large facilities such as the Large Hadron Collider to portable DNA sequencing devices, these instruments can produce hundreds of terabytes in short periods of time. Researchers and other professionals rely on networks to transfer data between sensing locations, instruments, data storage devices, and computing systems. While general-purpose networks, also referred to as enterprise networks, are capable of transporting basic data, such as e-mails and Web content, they face numerous challenges when transferring terabyte- and petabyte-scale data. At best, transfers of science data on these networks may last days or even weeks. In response to this challenge, the Science Demilitarized Zone (Science DMZ) has been proposed. The Science DMZ is a network or a portion of a network designed to facilitate the transfer of big science data. The main elements of the Science DMZ include: 1) specialized end devices, referred to as data transfer nodes (DTNs), built for sending/receiving data at a high speed over wide area networks; 2) high-throughput, friction-free paths connecting DTNs, instruments, storage devices, and computing systems; 3) performance measurement devices to monitor end-to-end paths over multiple domains; and 4) security policies and enforcement mechanisms tailored for high-performance environments. Despite the increasingly important role of Science DMZs, the literature is still missing a guideline to provide researchers and other professionals with the knowledge to broaden the understanding and development of Science DMZs. This paper addresses this gap by presenting a comprehensive tutorial on Science DMZs. The tutorial reviews fundamental network concepts that have a large impact on Science DMZs, such as router architecture, TCP attributes, and operational security. Then, the tutorial delves into protocols and devices at different layers, from the physical cyberinfrastructure to application-layer tools and security appliances, that must be carefully considered for the optimal operation of Science DMZs. This paper also contrasts Science DMZs with general-purpose networks, and presents empirical results and use cases applicable to current and future Science DMZs. 
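As a back-of-the-envelope companion to the TCP discussion above, the snippet below computes the bandwidth-delay product that sizes DTN buffers on a long path, and a rough transfer-time estimate; the link speeds, round-trip time, and data volumes are example numbers, not figures from the paper.

```python
# Two quick calculations behind Science DMZ design: buffer sizing and transfer time.

def bandwidth_delay_product_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bytes in flight needed to keep the path full: bandwidth * round-trip time."""
    return (link_gbps * 1e9 / 8) * (rtt_ms / 1000)


def transfer_time_hours(data_tb: float, achievable_gbps: float) -> float:
    """Hours needed to move data_tb terabytes at a sustained rate of achievable_gbps."""
    return data_tb * 1e12 * 8 / (achievable_gbps * 1e9) / 3600


# A 100 Gb/s DTN-to-DTN path with a 100 ms round-trip time needs roughly
# 1.25 GB of TCP buffer per flow to run at line rate.
print(f"{bandwidth_delay_product_bytes(100, 100) / 1e9:.2f} GB of buffer per flow")

# Moving 1 PB takes about a day at a clean 100 Gb/s, versus roughly three months
# at the 1 Gb/s an enterprise network might actually sustain.
print(f"{transfer_time_hours(1000, 100):.1f} h at 100 Gb/s, "
      f"{transfer_time_hours(1000, 1) / 24:.0f} days at 1 Gb/s")
```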
  5. Scale-out datacenter network fabrics enable network operators to translate improved link and switch speeds directly into end-host throughput. Unfortunately, limits in the underlying CMOS packet switch chip manufacturing roadmap mean that NICs, links, and switches are not getting faster fast enough to meet demand. As a result, operators have introduced alternative, parallel fabric designs in the core of the network that deliver N-times the bandwidth by simply forwarding traffic over any of N parallel network fabrics. In this work, we consider extending this parallel network idea all the way to the end host. Our initial impressions found that direct application of existing path selection and forwarding techniques resulted in poor performance. Instead, we show that appropriate path selection and forwarding protocols can not only improve the performance of existing, homogeneous parallel fabrics, but enable the development of heterogeneous parallel network fabrics that can deliver even higher bandwidth, lower latency, and improved resiliency than traditional designs constructed from the same constituent components. 
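For illustration only, the sketch below shows one naive end-host policy for steering flows across N parallel, heterogeneous fabrics; the fabric parameters and the flow-size threshold are assumptions, and the paper's actual path selection and forwarding protocols are more elaborate than this.

```python
# Naive per-flow steering across parallel fabrics from the end host.
from dataclasses import dataclass
import random


@dataclass
class Fabric:
    name: str
    bandwidth_gbps: float
    base_latency_us: float


FABRICS = [Fabric("low-latency", 25, 2.0),
           Fabric("bulk-a", 100, 8.0),
           Fabric("bulk-b", 100, 8.0)]


def pick_fabric(flow_bytes: int, latency_sensitive: bool) -> Fabric:
    """Send small or latency-sensitive flows over the low-latency fabric and
    spread large flows across the high-bandwidth fabrics; if one fabric fails,
    the remaining ones still provide connectivity."""
    if latency_sensitive or flow_bytes < 64 * 1024:
        return FABRICS[0]
    bulk = FABRICS[1:]
    # Weight the choice by bandwidth so heterogeneous fabrics are not overloaded.
    weights = [f.bandwidth_gbps for f in bulk]
    return random.choices(bulk, weights=weights, k=1)[0]


print(pick_fabric(8 * 1024, latency_sensitive=False).name)   # low-latency fabric
print(pick_fabric(10**9, latency_sensitive=False).name)      # one of the bulk fabrics
```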