skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Comparing the behavior of OpenMP Implementations with various Applications on two different Fujitsu A64FX platforms
The development of the A64FX processor by Fujitsu has been a massive innovation in vectorized processors and led to Fugaku: the current world’s fastest supercomputer. We use a variety of tools to analyze the behavior and performance of several OpenMP applications with different compilers, and how these applications scale on the different A64FX processors on clusters at Stony Brook University and RIKEN.  more » « less
Award ID(s):
1927880
PAR ID:
10292745
Author(s) / Creator(s):
; ; ; ; ;
Date Published:
Journal Name:
PEARC '21: Practice and Experience in Advanced Research Computing
Page Range / eLocation ID:
1 to 4
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The landscape of high performance computing (HPC) has witnessed exponential growth in processor diversity, architectural complexity, and performance scalability. With an ever-increasing demand for faster and more efficient computing solutions to address an array of scientific, engineering, and societal challenges, the selection of processors for specific applications becomes paramount. Achieving optimal performance requires a deep understanding of how diverse processors interact with diverse workloads, making benchmarking a fundamental practice in the field of HPC. Here, we present preliminary results observed over such benchmarks and applications and a comparison of Intel Sapphire Rapids and Skylake-X, AMD Milan, and Fujitsu A64FX processors in terms of runtime performance, memory bandwidth utilization, and energy consumption. The examples focus specifically on the Sapphire Rapids processor with and without high-bandwidth memory (HBM). An additional case study reports the performance gains from using Intel’s Advanced Matrix Extensions (AMX) instructions, and how they along with HBM can be leveraged to accelerate AI workloads. These initial results aim to give a rough comparison of the processors rather than a detailed analysis and should prove timely and relevant for researchers who may be interested in using Sapphire Rapids for their scientific workloads. 
    more » « less
  2. We present a look at Ookami, a project providing community access to a testbed supercomputer with the ARM-based A64FX processors developed by a collaboration between RIKEN and Fujitsu and deployed in the Japanese supercomputer Fugaku. We provide an overview of the project and details of the hardware, and describe the user base and education/training program. We present highlights from previous performance studies of two astrophysical simulation codes and present a strong scaling study of a full 3D supernova simulation as an example of the machine’s capability. 
    more » « less
  3. Arm HPC has succeeded in scaling up in the supercomputing space with the deployment of systems like RIKEN’s Fugaku supercomputer and Sandia’s Astra cluster. At the same time, the onboarding of new users to the Arm HPC ecosystem has never been more complex due to an overabundance of compilers, libraries, and build options for tools and applications. This work investigates one particular method to ease the integration of new users into the space of Arm HPC through the use of Open OnDemand to provide a consistent and easy-to-use front-end for Georgia Tech’s A64FX cluster, Octavius. We detail the motivations for this deployment as well as the potential pitfalls in integrating with an Arm A64FX environment. User-motived applications that incorporate the interactive usage of virtual desktops and Jupyter notebooks are discussed as motivating user workflows, and we provide some context on how future deployments might look with combined Arm and Open OnDemand integration. 
    more » « less
  4. SRAM-based FPGAs are frequently used for critical functions in space applications. Soft processors implemented within these FPGAs are often needed to satisfy the mission requirements. The open ISA, RISC-V, has allowed for the development of a wide range of open source processors. Like all SRAM-based FPGA digital designs, these soft processors are susceptible to SEUs. This paper presents an investigation of the performances and relative SEU sensitivity of a selection of newly available open source RISC-V processors. Utilizing dynamic partial reconfiguration, this novel automatic test equipment rapidly deployed different implementations and evaluated SEU sensitivity through fault injection. Using BYU’s new SpyDrNet tools, fine-grain TMR was also applied to each processor with results ranging from a 20× to 500× reduction in sensitivity. 
    more » « less
  5. null (Ed.)
    Ookami [3] is a computer technology testbed supported by the United States National Science Foundation. It provides researchers with access to the A64FX processor developed by Fujitsu [17] in collaboration with RIKΞN [35, 37] for the Japanese path to exascale computing, as deployed in Fugaku [36], the fastest computer in the world [34]. By focusing on crucial architectural details, the ARM-based, multi-core, 512-bit SIMD-vector processor with ultrahigh-bandwidth memory promises to retain familiar and successful programming models while achieving very high performance for a wide range of applications. We review relevant technology and system details, and the main body of the paper focuses on initial experiences with the hardware and software ecosystem for micro-benchmarks, mini-apps, and full applications, and starts to answer questions about where such technologies fit into the NSF ecosystem. 
    more » « less