skip to main content


This content will become publicly available on January 31, 2025

Title: A Detailed Historical and Statistical Analysis of the Influence of Hardware Artifacts on SPEC Integer Benchmark Performance
The Standard Performance Evaluation Corporation (SPEC) CPU benchmark has been widely used as a measure of computing performance for decades. The SPEC is an industry-standardized, CPU-intensive benchmark suite and the collective data provide a proxy for the history of worldwide CPU and system performance. Past efforts have not provided or enabled answers to questions such as, how has the SPEC benchmark suite evolved empirically over time and what micro-architecture artifacts have had the most influence on performance? - have any micro-benchmarks within the suite had undue influence on the results and comparisons among the codes? - can the answers to these questions provide insights to the future of computer system performance? To answer these questions, we detail our historical and statistical analysis of specific hardware artifacts (clock frequencies, core counts, etc.) on the performance of the SPEC benchmarks since 1995. We discuss in detail several methods to normalize across benchmark evolutions. We perform both isolated and collective sensitivity analyses for various hardware artifacts and we identify one benchmark (libquantum) that had somewhat undue influence on performance outcomes. We also present the use of SPEC data to predict future performance.  more » « less
Award ID(s):
1838271
NSF-PAR ID:
10488761
Author(s) / Creator(s):
; ; ; ; ; ;
Publisher / Repository:
IEEE
Date Published:
Journal Name:
IEEE transactions on computers
ISSN:
0018-9340
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the deployment, monitoring, and optimization of workflow executions, many workflow systems have been developed over the past decade. There is a need for workflow benchmarks that can be used to evaluate the performance of workflow systems on current and future software stacks and hardware platforms. We present a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. Our approach generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows. We present experimental results that show that our approach generates benchmarks that are representative of production workflows, and conduct a case study to demonstrate the use and usefulness of our generated benchmarks to evaluate the performance of workflow systems under different configuration scenarios. 
    more » « less
  2. The recent introduction of Unified Virtual Memory (UVM) in GPUs offers a new programming model that allows GPUs and CPUs to share the same virtual memory space, which shifts the complex memory management from programmers to GPU driver/ hardware and enables kernel execution even when memory is oversubscribed. Meanwhile, UVM may also incur considerable performance overhead due to tracking and data migration along with special handling of page faults and page table walk. As UVM is attracting significant attention from the research community to develop innovative solutions to these problems, in this paper, we propose a comprehensive UVM benchmark suite named UVMBench to facilitate future research on this important topic. The proposed UVMBench consists of 32 representative benchmarks from a wide range of application domains. The suite also features unified programming implementation and diverse memory access patterns across benchmarks, thus allowing thorough evaluation and comparison with current state-of-the-art. A set of experiments have been conducted on real GPUs to verify and analyze the benchmark suite behaviors under various scenarios. 
    more » « less
  3. Modern High Performance Computing (HPC) systems are built with innovative system architectures and novel programming models to further push the speed limit of computing. The increased complexity poses challenges for performance portability and performance evaluation. The Standard Performance Evaluation Corporation (SPEC) has a long history of producing industry-standard benchmarks for modern computer systems. SPEC’s newly released SPEChpc 2021 benchmark suites, developed by the High Performance Group, are a bold attempt to provide a fair and objective benchmarking tool designed for stateof-the-art HPC systems. With the support of multiple host and accelerator programming models, the suites are portable across both homogeneous and heterogeneous architectures. Different workloads are developed to fit system sizes ranging from a few compute nodes to a few hundred compute nodes. In this work we present our first experiences in performance benchmarking the new SPEChpc2021 suites and evaluate their portability and basic performance characteristics on various popular and emerging HPC architectures, including x86 CPU, NVIDIA GPU, and AMD GPU. This study provides a first-hand experience of executing the SPEChpc 2021 suites at scale on production HPC systems, discusses real-world use cases, and serves as an initial guideline for using the benchmark suites. 
    more » « less
  4. null (Ed.)
    Recent scientific computing increasingly relies on multi-scale multi-physics simulations to enhance predictive capabilities by replacing a suite of stand-alone simulation codes that independently simulate various physical phenomena. Inevitably, multi-physics simulation demands high performance computing (HPC) through advanced hardware and software accelerating due to its intensive computing workload and run-time communication needs. Thus, its research has become a hotspot across different disciplines. However, it is observed that most benchmarks used in the evaluation of corresponding work are through commercial or in-house codes. Then, the lack of accessible open-source multi-physics benchmark suites has presented a challenge in uniformly evaluating simulation performance across related disciplines. This work proposes the first open-source based benchmark suite with 12 selected benchmarks for research in multi-physics simulation, the Clarkson Open-Source Multi-physics Benchmark Suite (COMBS). Multiple metrics have been gathered for these benchmarks, such as instructions per second and memory usage. Also provided are build and benchmark scripts to improve usability. Additionally, their source codes and installation guides are available for downloading through a github repository built by the authors. The selected benchmarks are from key applications of multi-physics simulation and highly cited publications. It is believed that this benchmark suite will facilitate to harness the full potential of HPC research in the field of multi-physics simulation. 
    more » « less
  5. The engineering samples of the NVIDIA Grace CPU Superchip and NVIDIA Grace Hopper Superchips were tested using different benchmarks and scientific applications. The benchmarks include HPCC and HPCG. The real application-based benchmark includes AI-Benchmark-Alpha (a TensorFlow benchmark), Gromacs, OpenFOAM, and ROMS. The performance was compared to multiple Intel, AMD, ARM CPUs and several x86 with NVIDIA GPU systems. A brief energy efficiency estimate was performed based on TDP values. We found that in HPCC benchmark tests, the per-core performance of Grace is similar to or faster than AMD Milan cores, and the high core count often allows NVIDIA Grace CPU Superchip to have per-node performance similar to Intel Sapphire Rapids with High Bandwidth Memory: slower in matrix multiplication (by 17%) and FFT (by 6%), faster in Linpack (by 9%)). In scientific applications, the NVIDIA Grace CPU Superchip performance is slower by 6% to 18% in Gromacs, faster by 7% in OpenFOAM, and right between HBM and DDR modes of Intel Sapphire Rapids in ROMS. The combined CPU-GPU performance in Gromacs is significantly faster (by 20% to 117% faster) than any tested x86-NVIDIA GPU system. Overall, the new NVIDIA Grace Hopper Superchip and NVIDIA Grace CPU Superchip Superchip are high-performance and most likely energy-efficient solutions for HPC centers. 
    more » « less