Search for: All records

Award ID contains: 1730488


Total Resources: 20

Resource Type
Conference Paper: 17
Conference Proceeding: 0
Dataset: 0
Journal Article: 3
Workshop Report: 0

Availability
Full Text / Resource Available: 20
Citation Only: 0


Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

Apollo: An ML-assisted Real-Time Storage Resource Observer

https://doi.org/10.1145/3431379.3460640

Rajesh, Neeraj ; Devarajan, Hariharan ; Garcia, Jaime Cernuda ; Bateman, Keith ; Logan, Luke ; Ye, Jie ; Kougkas, Anthony ; Sun, Xian-He ( June 2021 , The 30th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '21))

Applications and middleware services, such as data placement engines, I/O schedulers, and prefetching engines, require low-latency access to telemetry data in order to make optimal decisions. However, typical monitoring services store their telemetry data in a database that applications must query, which incurs significant latency penalties. This work presents Apollo: a low-latency monitoring service that aims to provide applications and middleware libraries with direct access to relational telemetry data. Monitoring a system can create interference and overhead that slow down the raw performance of the resources available to a job. However, a current view of the system can help middleware services make better decisions, which can ultimately improve overall performance. Apollo has been designed from the ground up for low latency, using Publish–Subscribe (Pub-Sub) semantics, and for low overhead, using adaptive intervals that vary the time between polls of a resource for telemetry data, together with machine learning that predicts changes to the telemetry data between actual polls. This work also provides high-level abstractions called I/O curators, which can further help middleware libraries and applications make optimal decisions. Evaluations show that Apollo can achieve sub-millisecond latency for acquiring complex insights, with a memory overhead of ~57 MB and a CPU overhead only 7% higher than that of existing state-of-the-art systems.
Full Text Available
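The adaptive-interval idea described in the Apollo abstract can be illustrated with a minimal Python sketch. This is not Apollo's implementation: the class name `AdaptivePoller`, the double-on-stable/reset-on-change policy, the 5% tolerance, and the use of simple linear extrapolation in place of Apollo's machine-learning predictor are all hypothetical choices made for illustration. Each poll is delivered to subscriber callbacks, loosely mirroring pub-sub semantics.

```python
from typing import Callable, List, Optional


# Hypothetical sketch, not Apollo's implementation: the interval policy,
# tolerance, and linear extrapolation (standing in for Apollo's ML model)
# are illustrative assumptions.
class AdaptivePoller:
    def __init__(self, read: Callable[[], float], min_interval: float = 1.0,
                 max_interval: float = 16.0, tolerance: float = 0.05) -> None:
        self.read = read                  # callable that samples the resource
        self.min_interval = min_interval  # seconds between polls when values are volatile
        self.max_interval = max_interval  # upper bound when values are stable
        self.tolerance = tolerance        # relative change still treated as "stable"
        self.interval = min_interval
        self.subscribers: List[Callable[[float, float], None]] = []
        self.last_time: Optional[float] = None
        self.last_value: Optional[float] = None
        self.slope = 0.0                  # estimated change per second

    def subscribe(self, callback: Callable[[float, float], None]) -> None:
        """Register a callback invoked with (time, value) after every poll."""
        self.subscribers.append(callback)

    def poll(self, now: float) -> float:
        """Sample the resource, adapt the interval, publish; return the next interval."""
        value = self.read()
        if self.last_value is not None and self.last_time is not None:
            dt = now - self.last_time
            self.slope = (value - self.last_value) / dt if dt > 0 else 0.0
            change = abs(value - self.last_value) / max(abs(self.last_value), 1e-9)
            if change <= self.tolerance:
                # Stable reading: back off by doubling the interval, up to the cap.
                self.interval = min(self.interval * 2, self.max_interval)
            else:
                # Significant change: return to the shortest interval.
                self.interval = self.min_interval
        self.last_time, self.last_value = now, value
        for callback in self.subscribers:
            callback(now, value)
        return self.interval

    def predict(self, now: float) -> float:
        """Estimate the value between polls by linear extrapolation."""
        if self.last_value is None or self.last_time is None:
            raise RuntimeError("predict() called before the first poll")
        return self.last_value + self.slope * (now - self.last_time)
```

The trade-off the sketch exposes is the one the abstract describes: longer intervals lower polling overhead, and the predictor fills in estimates for readers who query between polls.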
DLIO: A Data-Centric Benchmark for Scientific Deep Learning Applications

https://doi.org/10.1109/CCGrid51090.2021.00018

Devarajan, Hariharan ; Zheng, Huihuo ; Kougkas, Anthony ; Sun, Xian-He ; Vishwanath, Venkatram ( May 2021 , IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet Computing (CCGrid))
Deep learning has proven successful for a wide range of tasks, and its popularity has produced numerous open-source deep learning software tools. Deep learning has been applied to a broad spectrum of scientific domains such as cosmology, particle physics, computer vision, fusion, and astrophysics. Scientists have performed a great deal of work to optimize the computational performance of deep learning frameworks. However, the same cannot be said for I/O performance. Because deep learning algorithms rely on large volumes and varieties of data to train neural networks accurately, I/O is a significant bottleneck in large-scale distributed deep learning training. This study aims to provide a detailed investigation of the I/O behavior of various scientific deep learning workloads running on the Theta supercomputer at the Argonne Leadership Computing Facility. In this paper, we present DLIO, a novel representative benchmark suite built from the I/O profiling of the selected workloads. DLIO can be used to accurately emulate the I/O behavior of modern scientific deep learning applications. Using DLIO, application developers and system software solution architects can identify potential I/O bottlenecks in their applications and guide optimizations that boost I/O performance, reducing training times by up to 6.7x.
Full Text Available
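To make the idea of emulating a deep learning application's I/O behavior concrete, here is a minimal Python sketch of a synthetic read-pattern replayer. It is not DLIO: the `Workload` fields (sample count and size, batch size, epochs, per-batch compute time, shuffle seed), the one-file-per-sample layout, and the per-epoch shuffle are hypothetical stand-ins for the profile-derived parameters a real benchmark would use.

```python
import os
import random
import tempfile
import time
from dataclasses import dataclass


@dataclass
class Workload:
    # Hypothetical parameters; a real benchmark would derive these from I/O profiling.
    num_samples: int
    sample_bytes: int
    batch_size: int
    epochs: int
    compute_seconds_per_batch: float = 0.0
    seed: int = 0


def run_emulation(w: Workload) -> dict:
    """Write one synthetic file per sample, then replay shuffled, batched reads."""
    rng = random.Random(w.seed)
    with tempfile.TemporaryDirectory() as root:
        paths = []
        for i in range(w.num_samples):
            path = os.path.join(root, f"sample_{i:06d}.bin")
            with open(path, "wb") as f:
                f.write(os.urandom(w.sample_bytes))
            paths.append(path)

        bytes_read, steps, io_seconds = 0, 0, 0.0
        for _ in range(w.epochs):
            order = paths[:]
            rng.shuffle(order)  # assumed: samples are reshuffled every epoch
            for start in range(0, len(order), w.batch_size):
                t0 = time.perf_counter()
                for path in order[start:start + w.batch_size]:
                    with open(path, "rb") as f:
                        bytes_read += len(f.read())
                io_seconds += time.perf_counter() - t0
                time.sleep(w.compute_seconds_per_batch)  # stand-in for the training step
                steps += 1
    return {"bytes_read": bytes_read, "steps": steps, "io_seconds": io_seconds}
```

Comparing `io_seconds` against the total simulated compute time gives a rough indication of whether a given access pattern is I/O-bound on the machine where it runs.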
Distributed Virtual Time-Based Synchronization for Simulation of Cyber-Physical Systems

https://doi.org/10.1145/3446237

Hannon, Christopher ; Yan, Jiaqi ; Jin, Dong ( April 2021 , ACM Transactions on Modeling and Computer Simulation)
Our world today increasingly relies on the orchestration of digital and physical systems to ensure the successful operation of many complex and critical infrastructures. Simulation-based testbeds are useful tools for engineering those cyber-physical systems and evaluating their efficiency, security, and resilience. In this article, we present a cyber-physical system testing platform combining distributed physical computing and networking hardware and simulation models. A core component is the distributed virtual time system that enables the efficient synchronization of virtual clocks among distributed embedded Linux devices. Virtual clocks also enable high-fidelity experimentation by interrupting real and emulated cyber-physical applications to inject offline simulation data. We design and implement two modes of the distributed virtual time: periodic mode for scheduling repetitive events like sensor device measurements, and dynamic mode for on-demand interrupt-based synchronization. We also analyze the performance of both approaches to synchronization, including the overhead, accuracy, and error introduced by each approach. By interconnecting their general-purpose I/O (GPIO) pins, the embedded devices can coordinate and synchronize with low overhead: under 50 microseconds for eight processes across four embedded Linux devices. Finally, we demonstrate the usability of our testbed and the differences between both approaches in a power grid control application.
Full Text Available
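The core idea of a virtual clock, which keeps simulated time consistent by excluding the wall-clock time spent paused while offline simulation data is injected, can be sketched in Python as follows. This is a hypothetical, single-process simplification, not the authors' kernel-level, GPIO-coordinated implementation: `VirtualClock` and `periodic_sync` are illustrative names, and wall time is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class VirtualClock:
    """Hypothetical virtual clock: wall time minus the total time spent paused."""
    start_wall: float
    paused_total: float = 0.0
    paused_since: Optional[float] = None

    def pause(self, wall: float) -> None:
        if self.paused_since is None:
            self.paused_since = wall

    def resume(self, wall: float) -> None:
        if self.paused_since is not None:
            self.paused_total += wall - self.paused_since
            self.paused_since = None

    def now(self, wall: float) -> float:
        # While paused, virtual time stays frozen at the moment of the pause.
        effective = self.paused_since if self.paused_since is not None else wall
        return effective - self.start_wall - self.paused_total


def periodic_sync(clocks: List[VirtualClock], wall: float, sim_cost: float) -> float:
    """Pause every clock, spend sim_cost wall-seconds on a (hypothetical) simulation
    step, then resume all clocks. Returns the wall time after resuming."""
    for clock in clocks:
        clock.pause(wall)
    wall += sim_cost
    for clock in clocks:
        clock.resume(wall)
    return wall
```

Because every clock is paused and resumed at the same wall instants, all of them report the same virtual time afterward, and the applications never observe the time spent computing the injected simulation data.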
HCL: Distributing Parallel Data Structures in Extreme Scales

https://doi.org/10.1109/CLUSTER49012.2020.00035

Devarajan, Hariharan ; Kougkas, Anthony ; Bateman, Keith ; Sun, Xian-He ( September 2020 , IEEE International Conference on Cluster Computing (CLUSTER))

Full Text Available
Parallel Simulation of Quantum Key Distribution Networks

https://doi.org/10.1145/3384441.3395988

Wu, Xiaoliang ; Zhang, Bo ; Jin, Dong ( June 2020 , 2020 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation (PADS))

Full Text Available
HFetch: Hierarchical Data Prefetching for Scientific Workflows in Multi-Tiered Storage Environments

https://doi.org/10.1109/IPDPS47924.2020.00017

Devarajan, Hariharan ; Kougkas, Anthony ( May 2020 , 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
HCompress: Hierarchical Data Compression for Multi-Tiered Storage Environments

https://doi.org/10.1109/IPDPS47924.2020.00064

Devarajan, Hariharan ; Kougkas, Anthony ( May 2020 , 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Full Text Available
I/O Acceleration via Multi-Tiered Data Buffering and Prefetching

https://doi.org/10.1007/s11390-020-9781-1

Kougkas, Anthony ; Devarajan, Hariharan ; Sun, Xian-He ( January 2020 , Journal of Computer Science and Technology)

Full Text Available
HReplica: A Dynamic Data Replication Engine with Adaptive Compression for Multi-Tiered Storage

https://doi.org/10.1109/BIGDATA50022.2020.9378167

Devarajan, Hariharan ; Kougkas, Anthony ; Sun, Xian-He ( January 2020 , 2020 IEEE International Conference on Big Data (Big Data))

Full Text Available
Cyber-Resilience Enhancement of PMU Networks Using Software-Defined Networking

https://doi.org/10.1109/SMARTGRIDCOMM47815.2020.9303004

Qu, Yanfeng ; Chen, Gong ; Liu, Xin ; Yan, Jiaqi ; Chen, Bo ; Jin, Dong ( January 2020 , 2020 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm))
Full Text Available
