NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Continuous HPC Performance Monitoring: Can It Run Without Affecting User Jobs?

https://doi.org/10.1145/3708035.3736036

Joseph, Jyothismaria; Simakov, Nikolay A (July 2025, ACM)

High-performance computing (HPC) resources are used in a wide range of scientic and engineering calculations. These resources have high initial and running costs. Thus, their optimal performance is crucial. There are a number of strategies to ensure the optimal state. One of them is continuous performance monitoring, where a set of applications and input parameters are executed regularly to identify performance issues proactively. Some sites hesitate to use such a strategy as it takes away the CPU cycles from actual users. The goal of this work is to identify node availability, both size- and time-wise, on busy HPC systems. Such availability spots can be used to tailor test jobs to minimize user impact. Two systems were analyzed: small - 118 nodes from the Center for Computational Research at the University at Bualo and large - 1,160 nodes from the Texas Advanced Computing Center. It was found that for days with 90% utilization and above, there are plenty of opportunities for test jobs. For example, on a small cluster, 8 nodes for 30 minutes are available for an average of 2.3 hours throughout the day. That is 9.6% of the day the scheduler has the opportunity to schedule such a job. On a large system, 32 nodes for 30 minutes were available on average 9.2 hours a day (or 38% of day). Thus, there is a space for test jobs, but it is not evident that the scheduler can benet from it, and a proper strategy must be used, for example, by lowering test job priorities.
more » « less
Free, publicly-accessible full text available July 18, 2026
Predictive Modeling of HPC Job Queue Times: Improving User Decision-Making and Resource Utilization

https://doi.org/10.1145/3708035.3736067

Gaikwad, Bipin; Simakov, Nikolay A; Furlani, Thomas; White, Joseph Patrick; Patra, Abani (July 2025, ACM)

This work presents a framework for estimating job wait times in High-Performance Computing (HPC) scheduling queues, leverag- ing historical job scheduling data and real-time system metrics. Using machine learning techniques, specifically Random Forest and Multi-Layer Perceptron (MLP) models, we demonstrate high accuracy in predicting wait times, achieving 94.2% reliability within a 10-minute error margin. The framework incorporates key fea- tures such as requested resources, queue occupancy, and system utilization, with ablation studies revealing the significance of these features. Additionally, the framework offers users wait time esti- mates for different resource configurations, enabling them to select optimal resources, reduce delays, and accelerate computational workloads. Our approach provides valuable insights for both users and administrators to optimize job scheduling, contributing to more efficient resource management and faster time to scientific results.
more » « less
Free, publicly-accessible full text available July 18, 2026
HPCMod: Agent-Based Modeling Framework for Modeling Users on High-Performance Computing Resources

https://doi.org/10.54941/ahfe1005658

Simakov, Nikolay A (December 2024, Accelerating Open Access Science in Human Factors Engineering and Human-Centered Computing)

High-performance computing (HPC) resources are used for compute-demanding calculations in various fields of science and engineering. They are large computational facilities utilized by many users simultaneously. High utilization often leads to high waiting times. Simulating users' behavior on such a system can help with future system design, develop user interventions, and ultimately improve the user’s experience and resource utilization. Here, we present HPCMod, an Agent-Based Modeling Framework for Modeling Users on HPC Resources. The key concept of the framework is the representation of the user's computational needs: the user project is represented as a collection of possibly dependent compute tasks. Each task can be executed as a single compute job or a series of jobs, depending on the task size. Some tasks can be too big to be executed in one chunk; such a situation often occurs during molecular dynamics simulation. There are multiple ways in which tasks can be split into jobs, and users will make their decisions based on previous experience, application parallel scalability, and available resources. For example, a user's compute task requires 32 node hours; it can be executed in multiple ways: a single 32-hour job on one node, two sequential 16-hour jobs on one node, one 16-hour job on two nodes, and so on. In the HPCMod, we implemented three models: 1) historical replay of compute jobs, 2) simulation of reconstituted compute tasks using historical job sizes, and 3) adaptive compute tasks splitting where users can modify jobs parameters given available resources till the execution of the next job in line. The framework was tested on a ten-node test system and a larger 1,736-node system modeled after a portion of TACC Stampede-2. The HPC resource model implements a first in first out (FIFO) scheduler with backfill scheduling. The initial results showed that on a tiny system, adaptive task-splitting is beneficial for the user but leads to a larger number of jobs. On a large system, the adaptive task-splitting was also very beneficial, decreasing waiting times for users using this strategy almost two times; however, other users got a 5% increase in their wait time. Further investigation is needed as the current task reconstitution algorithm is deterministic and does not allow quantification of job recombination uncertainties. The Julia-based implementation is fast: five years of historic workflow consisting of a million jobs and a one-hour stepping took around three minutes.
more » « less
Full Text Available
The AmpereOne A192-32X in Perspective: Benchmarking a New Standard

https://doi.org/10.1145/3703001.3724384

Carlson, David; Simakov, Nikolay; Ristow_Hadlich, Rodrigo; Curtis, Anthony; Martin, Joshua; Verma, Gaurav; Chheda, Smeet; Coskun, Firat; Gonzalez, Raul; Wood, Daniel; et al (February 2025, ACM)

This study presents a comprehensive benchmarking analysis of the Arm-based AmpereOne A192-32X CPU, a high-performance but low power processor designed for cloud-native workloads characterized by high core occupancy, imperfectly-vectorized or even pure scalar software, limited need for high floating-point performance, and, increasingly, AI inference. These traits also characterize much of academic research computing. Hence a thorough investigation of this novel CPU seeking to characterize its strengths and weaknesses on academic workloads, including traditional HPC codes for which it was not designed, will shed light on its relevance in a research setting. We report comparative analyses with contemporary CPUs (Intel Sapphire Rapids, AMD EPYC, NVIDIA Grace-Grace) and illustrate AmpereOne’s architectural advantages in handling parallel workloads and optimizing power consumption. The CPUs are compared in terms of performance and power consumption using a wide range of applications covering different workloads and disciplines.
more » « less
Free, publicly-accessible full text available February 19, 2026
The Data Analytics Framework for XDMoD

https://doi.org/10.1007/s42979-024-02789-2

Weeden, Aaron; White, Joseph P; DeLeon, Robert L; Rathsam, Ryan; Simakov, Nikolay A; Saeli, Conner; Furlani, Thomas R (June 2024, SN Computer Science)

Full Text Available
First Impressions of the Sapphire Rapids Processor with HBM for Scientific Workloads

https://doi.org/10.1007/s42979-024-02958-3

Siegmann, Eva; Harrison, Robert J; Carlson, David; Chheda, Smeet; Curtis, Anthony; Coskun, Firat; Gonzalez, Raul; Wood, Daniel; Simakov, Nikolay A (June 2024, SN Computer Science)

Abstract The landscape of high performance computing (HPC) has witnessed exponential growth in processor diversity, architectural complexity, and performance scalability. With an ever-increasing demand for faster and more efficient computing solutions to address an array of scientific, engineering, and societal challenges, the selection of processors for specific applications becomes paramount. Achieving optimal performance requires a deep understanding of how diverse processors interact with diverse workloads, making benchmarking a fundamental practice in the field of HPC. Here, we present preliminary results observed over such benchmarks and applications and a comparison of Intel Sapphire Rapids and Skylake-X, AMD Milan, and Fujitsu A64FX processors in terms of runtime performance, memory bandwidth utilization, and energy consumption. The examples focus specifically on the Sapphire Rapids processor with and without high-bandwidth memory (HBM). An additional case study reports the performance gains from using Intel’s Advanced Matrix Extensions (AMX) instructions, and how they along with HBM can be leveraged to accelerate AI workloads. These initial results aim to give a rough comparison of the processors rather than a detailed analysis and should prove timely and relevant for researchers who may be interested in using Sapphire Rapids for their scientific workloads.
more » « less
Full Text Available
First Impressions of the NVIDIA Grace CPU Superchip and NVIDIA Grace Hopper Superchip for Scientific Workloads

https://doi.org/10.1145/3636480.3637097

Simakov, Nikolay A.; Jones, Matthew D.; Furlani, Thomas R.; Siegmann, Eva; Harrison, Robert J. (January 2024, ACM)

The engineering samples of the NVIDIA Grace CPU Superchip and NVIDIA Grace Hopper Superchips were tested using different benchmarks and scientific applications. The benchmarks include HPCC and HPCG. The real application-based benchmark includes AI-Benchmark-Alpha (a TensorFlow benchmark), Gromacs, OpenFOAM, and ROMS. The performance was compared to multiple Intel, AMD, ARM CPUs and several x86 with NVIDIA GPU systems. A brief energy efficiency estimate was performed based on TDP values. We found that in HPCC benchmark tests, the per-core performance of Grace is similar to or faster than AMD Milan cores, and the high core count often allows NVIDIA Grace CPU Superchip to have per-node performance similar to Intel Sapphire Rapids with High Bandwidth Memory: slower in matrix multiplication (by 17%) and FFT (by 6%), faster in Linpack (by 9%)). In scientific applications, the NVIDIA Grace CPU Superchip performance is slower by 6% to 18% in Gromacs, faster by 7% in OpenFOAM, and right between HBM and DDR modes of Intel Sapphire Rapids in ROMS. The combined CPU-GPU performance in Gromacs is significantly faster (by 20% to 117% faster) than any tested x86-NVIDIA GPU system. Overall, the new NVIDIA Grace Hopper Superchip and NVIDIA Grace CPU Superchip Superchip are high-performance and most likely energy-efficient solutions for HPC centers.
more » « less
Are we ready for broader adoption of ARM in the HPC community: Performance and Energy Efficiency Analysis of Benchmarks and Applications Executed on High-End ARM Systems

https://doi.org/10.1145/3581576.3581618

Simakov, Nikolay A.; Deleon, Robert L.; White, Joseph P.; Jones, Matthew D.; Furlani, Thomas R.; Siegmann, Eva; Harrison, Robert J. (February 2023, Proceedings of the HPC Asia 2023 Workshops (HPC Asia '23 Workshops))

Full Text Available
Developing Accurate Slurm Simulator

https://doi.org/10.1145/3491418.3535178

Simakov, Nikolay A.; Deleon, Robert L.; Lin, Yuqing; Hoffmann, Phillip S.; Mathias, William R. (July 2022, PEARC '22: Practice and Experience in Advanced Research Computing)

Full Text Available
A64FX performance: experience on Ookami

https://doi.org/10.1109/Cluster48925.2021.00106

Bari, Md Abdullah; Chapman, Barbara; Curtis, Anthony; Harrison, Robert J.; Siegmann, Eva; Simakov, Nikolay A.; Jones, Matthew D. (September 2021, 2021 IEEE International Conference on Cluster Computing (CLUSTER))

Full Text Available

« Prev Next »

Search for: All records