Title: Scientific Community Transfer Protocols, Tools, and Their Performance Based on Network Capabilities
The efficiency of high energy physics workflows relies on the ability to rapidly transfer data among the sites where the data is processed and analyzed. The best data transfer tools should provide a simple and reliable solution for local, regional, national and, in some cases, intercontinental data transfers. This work outlines the results of data transfer tool tests under internal and external conditions (simulated latency and packet loss) in 100 Gbps testbeds, compares the results among the existing solutions, and addresses the tuning parameters and methods that help optimize transfer rates. Many tools have been developed to facilitate data transfers over wide area networks. However, few studies have examined the tools' requirements, use cases, and reliability through comparative measurements. Here, we evaluate a variety of high-performance data transfer tools used today in the LHC and other scientific communities, such as FDT, WDT, and NDN, in different environments. The tests were designed to reproduce real-world data transfer scenarios in order to analyze each tool's strengths and weaknesses, including its fault tolerance under packet loss. By comparing the tools in a controlled environment, we shed light on their relative reliability and usability for academia and industry. This work also highlights, in several cases, the best tuning parameters for WAN and LAN transfers for maximum performance.
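To make the latency and packet-loss sensitivity discussed above concrete, the sketch below applies the classic Mathis et al. steady-state TCP throughput model to estimate how many parallel streams a single lossy path would need to approach 100 Gbps. The formula is standard; the MSS, RTT, and loss figures are illustrative assumptions, not measurements from this work.

```python
import math

def mathis_throughput_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Classic Mathis et al. estimate of steady-state TCP throughput:
    rate ~= (MSS / RTT) * (C / sqrt(p)), with C ~ 1.22 for Reno-like congestion control."""
    C = math.sqrt(3.0 / 2.0)
    return (mss_bytes * 8.0 / rtt_s) * (C / math.sqrt(loss_rate))

# Illustrative WAN scenario (hypothetical numbers, not from the paper):
# 9000-byte jumbo frames, 50 ms RTT, 0.001% packet loss.
single_stream = mathis_throughput_bps(mss_bytes=8948, rtt_s=0.050, loss_rate=1e-5)
link_capacity = 100e9  # 100 Gbps testbed

streams_needed = math.ceil(link_capacity / single_stream)
print(f"Single-stream estimate: {single_stream / 1e9:.2f} Gbps")
print(f"Parallel streams to approach 100 Gbps: {streams_needed}")
```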
Award ID(s):
2019012
PAR ID:
10548852
Author(s) / Creator(s):
; ; ; ;
Editor(s):
De_Vita, R; Espinal, X; Laycock, P; Shadura, O
Publisher / Repository:
EPJ Web of Conferences
Date Published:
Journal Name:
EPJ Web of Conferences
Volume:
295
ISSN:
2100-014X
Page Range / eLocation ID:
04036
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Application-layer transfer configurations play a crucial role in achieving desirable performance in high-speed networks. However, finding the optimal configuration for a given transfer task is a difficult problem, as it depends on various factors including dataset characteristics, network settings, and background traffic. The state-of-the-art transfer tuning solutions rely on real-time sample transfers to evaluate various configurations and estimate the optimal one. However, existing approaches to running sample transfers incur high delay and measurement errors, thus significantly limiting the efficiency of the transfer tuning algorithms. In this paper, we introduce an adaptive feed-forward deep neural network (DNN) to minimize the error rate of sample transfers without increasing their execution time. We ran 115K file transfers in four different high-speed networks and used their logs to train an adaptive DNN that can quickly and accurately predict the throughput of sample transfers by analyzing instantaneous throughput values. The results gathered in various networks with a rich set of transfer configurations indicate that the proposed model reduces the error rate by up to 50% compared to the state-of-the-art solutions while keeping the execution time low. We also show that one can further reduce delay or error rate by tuning the model's hyperparameters to meet the specific needs of a user or application. Finally, transfer learning analysis reveals that a model developed in one network yields accurate results in other networks with similar transfer convergence characteristics, alleviating the need to run extensive data collection and model derivation efforts for each network.
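As a rough illustration of the prediction task described above, the sketch below trains a small feed-forward network (scikit-learn's MLPRegressor standing in for the paper's adaptive DNN) to estimate a transfer's converged throughput from its first few instantaneous throughput samples. The synthetic logs, feature layout, and layer sizes are assumptions for demonstration only.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for real transfer logs: each row holds the first five
# instantaneous throughput samples (Gbps) of a sample transfer, and the target
# is the converged throughput of the full transfer.
rng = np.random.default_rng(0)
n_transfers = 2000
early_samples = rng.uniform(0.5, 9.5, size=(n_transfers, 5))
converged = early_samples.mean(axis=1) * rng.normal(1.0, 0.05, n_transfers)

X_train, X_test, y_train, y_test = train_test_split(
    early_samples, converged, test_size=0.2, random_state=0)

# A small feed-forward network; layer sizes and iteration budget are
# illustrative hyperparameters, not the ones tuned in the paper.
model = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

pred = model.predict(X_test)
error_rate = np.mean(np.abs(pred - y_test) / y_test)
print(f"Mean relative prediction error: {error_rate:.1%}")
```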
  2. Modern scientific workflows are data-driven and are often executed on distributed, heterogeneous, high-performance computing infrastructures. Anomalies and failures in the workflow execution cause loss of scientific productivity and inefficient use of the infrastructure. Hence, detecting, diagnosing, and mitigating these anomalies is immensely important for reliable and performant scientific workflows. Since these workflows rely heavily on high-performance network transfers that require strict QoS constraints, accurately detecting anomalous network performance is crucial to ensure reliable and efficient workflow execution. To address this challenge, we have developed X-FLASH, a network anomaly detection tool for faulty TCP workflow transfers. X-FLASH incorporates novel hyperparameter tuning and data mining approaches to improve the performance of the machine learning algorithms that classify anomalous TCP packets. X-FLASH leverages XGBoost as an ensemble model and couples it with FLASH, a sequential optimizer borrowed from search-based software engineering, to learn the optimal model parameters. X-FLASH found configurations that outperformed the existing approach by up to 28%, 29%, and 40% (relative) for F-measure, G-score, and recall, respectively, in fewer than 30 evaluations. Given the large improvement and the simplicity of the tuning, we recommend that future research adopt such tuning studies as a new standard, at least in the area of scientific workflow anomaly detection.
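A minimal sketch of the classification-plus-tuning idea behind X-FLASH: an XGBoost classifier over labelled transfer records, with scikit-learn's RandomizedSearchCV standing in for the FLASH sequential optimizer and capped at 30 evaluations. The synthetic dataset, feature set, and parameter ranges are assumptions, not the paper's.

```python
import numpy as np
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

# Synthetic stand-in for labelled TCP transfer records (normal vs. anomalous);
# the real features would come from workflow transfer logs.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

# RandomizedSearchCV stands in for the FLASH sequential optimizer and is
# limited to 30 model evaluations, mirroring the budget quoted above.
search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss"),
    param_distributions={
        "n_estimators": randint(50, 400),
        "max_depth": randint(3, 10),
        "learning_rate": uniform(0.01, 0.3),
        "subsample": uniform(0.6, 0.4),
    },
    n_iter=30, scoring="f1", cv=3, random_state=0)
search.fit(X, y)
print("Best F1:", round(search.best_score_, 3), "with", search.best_params_)
```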
  3. Scientific data volume is growing, and the need for faster transfers is increasing. The community has used parallel transfer methods with multi-threaded and multi-source downloads to reduce transfer times. In multi-source transfers, a client downloads data from several replicated servers in parallel. Tools such as Aria2 and BitTorrent support this approach and show improved performance. This work introduces the Multi-Source Data Transfer Protocol, MDTP, which improves multi-source transfer performance further. MDTP divides a file request into smaller chunk requests and assigns the chunks across multiple servers. The system adapts chunk sizes based on each server's performance and selects them so each round of requests finishes at roughly the same time. The chunk-size allocation problem is formulated as a variant of bin packing, where adaptive chunking fills the capacity "bins" of each server efficiently. Evaluation shows that MDTP reduces transfer time by 10–22% compared to Aria2. Comparisons with static chunking and BitTorrent show even larger gains. MDTP also distributes load proportionally across all replicas instead of relying only on the fastest one, which increases throughput. MDTP maintains high throughput even when latency increases or bandwidth to the fastest server drops.
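The core of MDTP's adaptive chunking can be illustrated with a simple proportional allocation: size each server's chunk by its observed throughput so that every chunk in a round finishes at roughly the same time. The sketch below is a simplification under assumed replica names and rates, not the protocol itself.

```python
def allocate_chunks(round_bytes: int, throughputs_bps: dict[str, float]) -> dict[str, int]:
    """Split one round of requests across replica servers so that every server's
    chunk takes roughly the same time: chunk_i proportional to throughput_i.
    A simplified sketch of MDTP-style adaptive chunking, not the protocol itself."""
    total_rate = sum(throughputs_bps.values())
    return {server: int(round_bytes * rate / total_rate)
            for server, rate in throughputs_bps.items()}

# Hypothetical replicas with throughputs measured in the previous round.
rates = {"replica-a": 8e9, "replica-b": 4e9, "replica-c": 1e9}
chunks = allocate_chunks(round_bytes=1_000_000_000, throughputs_bps=rates)
for server, size in chunks.items():
    # Each chunk should finish in roughly the same time: size / rate is constant.
    print(f"{server}: {size / 1e6:.0f} MB  (~{size * 8 / rates[server]:.2f} s)")
```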
  4. Science and engineering applications are now generating data at an unprecedented rate. From large facilities such as the Large Hadron Collider to portable DNA sequencing devices, these instruments can produce hundreds of terabytes in short periods of time. Researchers and other professionals rely on networks to transfer data between sensing locations, instruments, data storage devices, and computing systems. While general-purpose networks, also referred to as enterprise networks, are capable of transporting basic data, such as e-mails and Web content, they face numerous challenges when transferring terabyte- and petabyte-scale data. At best, transfers of science data on these networks may last days or even weeks. In response to this challenge, the Science Demilitarized Zone (Science DMZ) has been proposed. The Science DMZ is a network or a portion of a network designed to facilitate the transfer of big science data. The main elements of the Science DMZ include: 1) specialized end devices, referred to as data transfer nodes (DTNs), built for sending/receiving data at a high speed over wide area networks; 2) high-throughput, friction-free paths connecting DTNs, instruments, storage devices, and computing systems; 3) performance measurement devices to monitor end-to-end paths over multiple domains; and 4) security policies and enforcement mechanisms tailored for high-performance environments. Despite the increasingly important role of Science DMZs, the literature is still missing a guideline to provide researchers and other professionals with the knowledge to broaden the understanding and development of Science DMZs. This paper addresses this gap by presenting a comprehensive tutorial on Science DMZs. The tutorial reviews fundamental network concepts that have a large impact on Science DMZs, such as router architecture, TCP attributes, and operational security. Then, the tutorial delves into protocols and devices at different layers, from the physical cyberinfrastructure to application-layer tools and security appliances, that must be carefully considered for the optimal operation of Science DMZs. This paper also contrasts Science DMZs with general-purpose networks, and presents empirical results and use cases applicable to current and future Science DMZs. 
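One of the TCP attributes most relevant to DTN tuning is the bandwidth-delay product, which sets the buffer space needed to keep a long, fast path full. The sketch below computes the BDP for a hypothetical 100 Gbps, 80 ms path and prints the standard Linux tunables that are typically sized from it; the values are illustrative, not a prescribed Science DMZ configuration.

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the pipe full."""
    return int(bandwidth_bps * rtt_s / 8)

# Hypothetical DTN on a 100 Gbps path with an 80 ms cross-country RTT.
bdp = bdp_bytes(100e9, 0.080)
print(f"BDP: {bdp / 2**20:.0f} MiB")

# Standard Linux TCP tunables sized from the BDP (illustrative values only;
# production DTN tuning also involves NIC, NUMA, and congestion-control choices).
print(f"net.core.rmem_max = {bdp}")
print(f"net.core.wmem_max = {bdp}")
print(f"net.ipv4.tcp_rmem = 4096 87380 {bdp}")
print(f"net.ipv4.tcp_wmem = 4096 65536 {bdp}")
```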
  5. The increase and rapid growth of data produced by scientific instruments, the Internet of Things (IoT), and social media is causing data transfer performance and resource consumption to garner much attention in the research community. The network infrastructure and end systems that enable this extensive data movement use a substantial amount of electricity, measured in terawatt-hours per year. Managing energy consumption within the core networking infrastructure is an active research area, but there is a limited amount of work on reducing power consumption at the end systems during active data transfers. This paper presents a novel two-phase dynamic throughput and energy optimization model that utilizes an offline decision-search-tree based clustering technique to encapsulate and categorize historical data transfer log information and an online search optimization algorithm to find the best application and kernel layer parameter combination to maximize the achieved data transfer throughput while minimizing the energy consumption. Our model also incorporates an ensemble method to reduce aleatoric uncertainty in finding optimal application and kernel layer parameters during the offline analysis phase. The experimental evaluation results show that our decision-tree based model outperforms the state-of-the-art solutions in this area by achieving 117% higher throughput on average and also consuming 19% less energy at the end systems during active data transfers. 
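A highly simplified sketch of the two-phase idea: an offline model (here a decision tree regressor over synthetic logs) learns how parameter combinations map to throughput and energy, and an online search then picks the combination with the best predicted throughput per unit of energy. Every dataset, feature, and scoring choice below is an assumption for illustration, not the paper's method.

```python
import itertools
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Offline phase (hypothetical): fit trees on historical transfer logs that map
# application/kernel parameters to observed throughput and energy.
rng = np.random.default_rng(1)
logs = rng.integers(1, 17, size=(3000, 2))                          # [concurrency, parallelism]
throughput = logs.prod(axis=1) ** 0.5 * rng.normal(1, 0.1, 3000)    # Gbps (synthetic)
energy = logs.sum(axis=1) * rng.normal(1, 0.1, 3000)                # joules/GB (synthetic)

tput_model = DecisionTreeRegressor(max_depth=6).fit(logs, throughput)
energy_model = DecisionTreeRegressor(max_depth=6).fit(logs, energy)

# Online phase (simplified): search the parameter grid for the setting that
# maximizes predicted throughput per unit of predicted energy.
candidates = np.array(list(itertools.product(range(1, 17), range(1, 17))))
score = tput_model.predict(candidates) / energy_model.predict(candidates)
best = candidates[int(np.argmax(score))]
print(f"Chosen setting: concurrency={best[0]}, parallelism={best[1]}")
```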