skip to main content


Search for: All records

Award ID contains: 1730488

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Applications and middleware services, such as data placement engines, I/O scheduling, and prefetching engines, require low-latency access to telemetry data in order to make optimal decisions. However, typical monitoring services store their telemetry data in a database in order to allow applications to query them, resulting in significant latency penalties. This work presents Apollo: a low-latency monitoring service that aims to provide applications and middleware libraries with direct access to relational telemetry data. Monitoring the system can create interference and overhead, slowing down raw performance of the resources for the job. However, having a current view of the system can aid middleware services in making more optimal decisions which can ultimately improve the overall performance. Apollo has been designed from the ground up to provide low latency, using Publish–Subscribe (Pub-Sub) semantics, and low overhead, using adaptive intervals in order to change the length of time between polling the resource for telemetry data and machine learning in order to predict changes to the telemetry data between actual resource polling. This work also provides some high level abstractions called I/O curators, which can further aid middleware libraries and applications to make optimal decisions. Evaluations showcase that Apollo can achieve sub-millisecond latency for acquiring complex insights with a memory overhead of ~57MB and CPU overhead being only 7% more than existing state-of-the-art systems. 
    more » « less
  2. null (Ed.)
    Deep learning has been shown as a successful method for various tasks, and its popularity results in numerous open-source deep learning software tools. Deep learning has been applied to a broad spectrum of scientific domains such as cosmology, particle physics, computer vision, fusion, and astrophysics. Scientists have performed a great deal of work to optimize the computational performance of deep learning frameworks. However, the same cannot be said for I/O performance. As deep learning algorithms rely on big-data volume and variety to effectively train neural networks accurately, I/O is a significant bottleneck on large-scale distributed deep learning training. This study aims to provide a detailed investigation of the I/O behavior of various scientific deep learning workloads running on the Theta supercomputer at Argonne Leadership Computing Facility. In this paper, we present DLIO, a novel representative benchmark suite built based on the I/O profiling of the selected workloads. DLIO can be utilized to accurately emulate the I/O behavior of modern scientific deep learning applications. Using DLIO, application developers and system software solution architects can identify potential I/O bottlenecks in their applications and guide optimizations to boost the I/O performance leading to lower training times by up to 6.7x. 
    more » « less
  3. null (Ed.)
    Our world today increasingly relies on the orchestration of digital and physical systems to ensure the successful operations of many complex and critical infrastructures. Simulation-based testbeds are useful tools for engineering those cyber-physical systems and evaluating their efficiency, security, and resilience. In this article, we present a cyber-physical system testing platform combining distributed physical computing and networking hardware and simulation models. A core component is the distributed virtual time system that enables the efficient synchronization of virtual clocks among distributed embedded Linux devices. Virtual clocks also enable high-fidelity experimentation by interrupting real and emulated cyber-physical applications to inject offline simulation data. We design and implement two modes of the distributed virtual time: periodic mode for scheduling repetitive events like sensor device measurements, and dynamic mode for on-demand interrupt-based synchronization. We also analyze the performance of both approaches to synchronization including overhead, accuracy, and error introduced from each approach. By interconnecting the embedded devices’ general purpose IO pins, they can coordinate and synchronize with low overhead, under 50 microseconds for eight processes across four embedded Linux devices. Finally, we demonstrate the usability of our testbed and the differences between both approaches in a power grid control application. 
    more » « less
  4. null (Ed.)