Science DMZs are specialized networks that enable large-scale distributed scientific research, providing efficient and guaranteed performance while transferring large amounts of data at high rates. The high-speed performance of a Science DMZ depends on data transfer nodes (DTNs), which makes them a critical point of failure. DTNs are usually monitored with network intrusion detection systems (NIDS). However, NIDS do not consider system performance data, such as network I/O interrupts and context switches, which can also reveal anomalous system behavior arising from external network-based attacks or insider attacks. In this paper, we demonstrate how system performance metrics can be applied towards securing a DTN in a Science DMZ network. Specifically, we evaluate the effectiveness of system performance data in detecting TCP-SYN flood attacks on a DTN using DBSCAN (a density-based clustering algorithm) for anomaly detection. Our results demonstrate that system interrupts and context switches can be used to successfully detect TCP-SYN floods, suggesting that system performance data could be effective in detecting a variety of attacks not easily detected through network monitoring alone.
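The idea in the abstract above can be illustrated with a minimal sketch: a small pure-Python DBSCAN applied to (interrupts, context switches) samples, where points left unclustered are treated as anomalies. This is not the authors' implementation. The sample values, the scaling to thousands of events per second, and the `eps` and `min_pts` settings are all assumptions chosen for illustration.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels


# Hypothetical samples: (interrupts/s, context switches/s).
# 40 baseline samples with small, deterministic variation.
baseline = [(20000 + 100 * (i % 7), 30000 + 150 * (i % 5)) for i in range(40)]
# 3 invented samples standing in for behavior during a TCP-SYN flood.
flood = [(60000, 85000), (75000, 100000), (90000, 120000)]

# Work in thousands of events per second so eps is easy to reason about.
samples = [(a / 1000, b / 1000) for a, b in baseline + flood]

labels = dbscan(samples, eps=1.0, min_pts=5)
anomalies = [i for i, label in enumerate(labels) if label == NOISE]
print("anomalous sample indices:", anomalies)
```

In this toy data the baseline samples form one dense cluster and the three flood samples are isolated, so they are reported as noise. On a real DTN the counters would have to be collected from the operating system, and `eps` and `min_pts` would need tuning against observed baseline behavior.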
A Comprehensive Tutorial on Science DMZ
Science and engineering applications are now generating data at an unprecedented rate. From large facilities such as the Large Hadron Collider to portable DNA sequencing devices, these instruments can produce hundreds of terabytes in short periods of time. Researchers and other professionals rely on networks to transfer data between sensing locations, instruments, data storage devices, and computing systems. While general-purpose networks, also referred to as enterprise networks, are capable of transporting basic data, such as e-mails and Web content, they face numerous challenges when transferring terabyte- and petabyte-scale data. At best, transfers of science data on these networks may last days or even weeks. In response to this challenge, the Science Demilitarized Zone (Science DMZ) has been proposed. The Science DMZ is a network or a portion of a network designed to facilitate the transfer of big science data. The main elements of the Science DMZ include: 1) specialized end devices, referred to as data transfer nodes (DTNs), built for sending/receiving data at a high speed over wide area networks; 2) high-throughput, friction-free paths connecting DTNs, instruments, storage devices, and computing systems; 3) performance measurement devices to monitor end-to-end paths over multiple domains; and 4) security policies and enforcement mechanisms tailored …
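To put the "days or even weeks" figure in the abstract above into perspective, the following back-of-the-envelope sketch computes idealized transfer times. The link rates (1 Gbps and 100 Gbps) and data sizes are assumptions chosen for illustration, and the helper `transfer_time_days` is hypothetical. It ignores protocol overhead, packet loss, and storage bottlenecks, so real transfers would take longer.

```python
SECONDS_PER_DAY = 86400
TB = 10**12  # bytes in a terabyte (decimal)
PB = 10**15  # bytes in a petabyte (decimal)


def transfer_time_days(size_bytes, rate_bits_per_s, efficiency=1.0):
    """Idealized transfer time in days.

    efficiency is the fraction of the nominal link rate actually achieved
    (1.0 means the full rate, which is an optimistic assumption).
    """
    seconds = size_bytes * 8 / (rate_bits_per_s * efficiency)
    return seconds / SECONDS_PER_DAY


# Illustrative scenarios (assumed values, not taken from the tutorial).
for label, size, rate in [
    ("100 TB at 1 Gbps", 100 * TB, 1e9),
    ("1 PB at 1 Gbps", 1 * PB, 1e9),
    ("1 PB at 100 Gbps", 1 * PB, 100e9),
]:
    print(f"{label}: {transfer_time_days(size, rate):.2f} days")
```

Even at the full nominal rate, 100 TB over a 1 Gbps path takes more than nine days, and a petabyte takes about three months. This is the kind of arithmetic behind the claim that general-purpose networks are a poor fit for science data.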
- Journal Name: IEEE Communications Surveys and Tutorials
- Sponsoring Org: National Science Foundation
More Like this
The Science DMZ is a specialized network model developed to guarantee secure and efficient transfer of data for large-scale distributed research. To enable a high level of performance, the Science DMZ includes dedicated data transfer nodes (DTNs). Protecting these DTNs is crucial to maintaining the overall security of the network and the data, and insider attacks are a major threat. Although some limited network intrusion detection systems (NIDS) are deployed to monitor DTNs, these alone are not sufficient to detect insider threats. Monitoring for abnormal system behavior, such as unusual sequences of system calls, is one way to detect insider threats. However, the relatively predictable behavior of the DTN suggests that we can also detect unusual activity by monitoring system performance, such as CPU and disk usage, along with network activity. In this paper, we introduce a potential insider attack scenario and show how readily available system performance metrics can be employed to detect data tampering within DTNs, using DBSCAN clustering to actively monitor for unexpected behavior.
Training and Teaching Students and IT Professionals on High-throughput Networking and Cybersecurity using a Private Cloud
This paper describes the deployment of a private cloud and the development of virtual laboratories and companion material to teach and train engineering students and Information Technology (IT) professionals in high-throughput networks and cybersecurity. The material and platform, deployed at the University of South Carolina, are also used by other institutions to support regular academic courses, self-paced training of professional IT staff, and workshops across the country. The private cloud is used to deploy scenarios consisting of high-speed networks (up to 50 Gbps), multi-domain environments emulating internetworks, and infrastructures under cyber-attacks using live traffic. For regular academic courses, the virtual laboratories have been adopted by institutions in different states to supplement theoretical material with hands-on activities in IT, electrical engineering, and computer science programs. Topics include Local Area Networks (LANs), congestion-control algorithms, performance tools used to emulate wide area networks (WANs) and their attributes (packet loss, reordering, corruption, latency, jitter, etc.), data transfer applications for high-speed networks, queueing delay and buffer size in routers and switches, active monitoring of multi-domain systems, high-performance cybersecurity tools such as Zeek’s intrusion detection systems, and others. The training platform has also been used by IT professionals from more than 30 states for self-paced training. …
HPC networks and campus networks are beginning to leverage various levels of network programmability ranging from programmable network configuration (e.g., NETCONF/YANG, SNMP, OF-CONFIG) to software-based controllers (e.g., OpenFlow Controllers) to dynamic function placement via network function virtualization (NFV). While programmable networks offer new capabilities, they also make the network more difficult to debug. When applications experience unexpected network behavior, there is no established method to investigate the cause in a programmable network, and many of the conventional troubleshooting and debugging tools (e.g., ping and traceroute) can turn out to be completely useless. This absence of troubleshooting tools that support programmability is a serious challenge for researchers trying to understand the root cause of their networking problems. This paper explores the challenges of debugging an all-campus Science DMZ network that leverages SDN-based network paths for high-performance flows. We propose Flow Tracer, a lightweight, data-plane-based debugging tool for SDN-enabled networks that allows end users to dynamically discover how the network is handling their packets. In particular, we focus on solving the problem of identifying an SDN path by using actual packets from the flow being analyzed as opposed to existing expensive approaches where either probe packets are injected into the network or actual …
Obeid, I.; Selesnik, I.; Picone, J. (Ed.)
The Neuronix high-performance computing cluster allows us to conduct extensive machine learning experiments on big data. This heterogeneous cluster uses innovative scheduling technology, Slurm, that manages a network of CPUs and graphics processing units (GPUs). The GPU farm consists of a variety of processors ranging from low-end consumer grade devices such as the Nvidia GTX 970 to higher-end devices such as the GeForce RTX 2080. These GPUs are essential to our research since they allow extremely compute-intensive deep learning tasks to be executed on massive data resources such as the TUH EEG Corpus. We use TensorFlow as the core machine learning library for our deep learning systems, and routinely employ multiple GPUs to accelerate the training process. Reproducible results are essential to machine learning research. Reproducibility in this context means the ability to replicate an existing experiment: performance metrics such as error rates should be identical and floating-point calculations should match closely. Three examples of ways we typically expect an experiment to be replicable are: (1) The same job run on the same processor should produce the same results each time it is run. (2) A job run on a CPU and GPU should produce …