Title: Ookami: An A64FX Computing Resource
We present a look at Ookami, a project providing community access to a testbed supercomputer with the ARM-based A64FX processors developed by a collaboration between RIKEN and Fujitsu and deployed in the Japanese supercomputer Fugaku. We provide an overview of the project and details of the hardware, and describe the user base and education/training program. We present highlights from previous performance studies of two astrophysical simulation codes, along with a strong scaling study of a full 3D supernova simulation as an example of the machine's capability.
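The strong scaling study mentioned above follows the usual recipe: hold the problem size fixed, grow the node count, and compare the observed speedup to ideal linear scaling. A minimal sketch of that bookkeeping in Python (the wall times are illustrative placeholders, not measurements from the paper):

```python
# Strong-scaling bookkeeping: fixed problem size, growing node count.
# The wall times here are illustrative placeholders, not data from the
# Ookami supernova study.
runs = {1: 1000.0, 2: 520.0, 4: 275.0, 8: 150.0}  # nodes -> wall time (s)

t1 = runs[1]  # single-node baseline
for nodes, t in sorted(runs.items()):
    speedup = t1 / t              # observed speedup over one node
    efficiency = speedup / nodes  # fraction of ideal linear scaling
    print(f"{nodes:3d} nodes: speedup {speedup:5.2f}, efficiency {efficiency:6.1%}")
```

Efficiency below 100% at higher node counts typically reflects communication and load-imbalance overheads, which is exactly what such a study probes.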
Award ID(s): 1927880
PAR ID: 10534367
Author(s) / Creator(s):
Publisher / Repository: J.Phys.Conf.Ser. 2742 (2024) 1, 012019
Date Published:
Journal Name: Journal of Physics: Conference Series
Volume: 2742
Issue: 1
ISSN: 1742-6588
Page Range / eLocation ID: 012019
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Tornadoes remain an active subject of observational and numerical research due to the damage and fatalities they cause worldwide, as well as poor understanding of their behavior, such as what processes result in their genesis and what determines their longevity and morphology. A numerical model run at high resolution on a supercomputer can serve as a powerful tool for exploring the rapidly evolving tornado-scale features within a simulated storm, but saving large amounts of data for meaningful analysis can result in unacceptably slow model performance, an unwieldy volume of saved data, and output spread across millions of files. In this paper, a system for efficiently saving and managing hundreds of terabytes of compressed model output is described to support a supercomputer simulation of a tornadic supercell thunderstorm. The challenges of managing a simulation spanning over a quarter-trillion grid volumes across the Blue Waters supercomputer are also described. The simulated supercell produces a long-track EF5 tornado, and the near-tornado environment is described during tornadogenesis, where a single upward-growing vortex undergoes several vortex mergers before transitioning into a multiple-vortex tornado.
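The paper's I/O system itself is not reproduced here; as a minimal sketch of the general technique it describes (chunked, compressed model output), assuming h5py with lossless gzip compression, and with the field name, grid size, and file name purely illustrative:

```python
import numpy as np
import h5py

# Illustrative grid dimensions; tornado-resolving runs span billions of
# grid volumes distributed across many MPI ranks.
nx, ny, nz = 512, 512, 256
w = np.random.rand(nz, ny, nx).astype(np.float32)  # stand-in vertical velocity

# Write one time step as a chunked, compressed dataset. Chunking lets
# analysis tools read small subdomains without scanning the whole file;
# compression shrinks the on-disk footprint of smooth model fields.
with h5py.File("supercell_t0420.h5", "w") as f:
    f.create_dataset(
        "w",
        data=w,
        chunks=(32, 32, 32),   # sized to match typical analysis reads
        compression="gzip",
        compression_opts=4,
    )
```

Consolidating many time steps and variables into a few such files, rather than one file per rank per step, is what keeps output from scattering across millions of files.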
  2. Quantum circuit simulations are critical for evaluating quantum algorithms and machines. However, the number of state amplitudes required for full simulation increases exponentially with the number of qubits. In this study, we leverage data compression to reduce memory requirements, trading computation time and fidelity for memory space. Specifically, we develop a hybrid solution combining lossless compression with our tailored lossy compression method, using adaptive error bounds at each time step of the simulation. Our approach optimizes for compression speed and ensures that errors due to lossy compression are uncorrelated, an important property for comparing simulation output with physical machines. Experiments show that our approach reduces the memory requirement of simulating the 61-qubit Grover's search algorithm from 32 exabytes to 768 terabytes of memory on Argonne's Theta supercomputer using 4,096 nodes. The results suggest that our techniques can increase the simulation size by 2 to 16 qubits for general quantum circuits.
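The headline memory numbers follow directly from the state-vector size. A 61-qubit register holds $2^{61}$ complex amplitudes at 16 bytes each in double precision (reading the byte prefixes as binary, which the round numbers suggest):

\[
2^{61} \times 16\,\mathrm{B} = 2^{65}\,\mathrm{B} = 32\,\mathrm{EB},
\qquad
\frac{2^{65}\,\mathrm{B}}{768\,\mathrm{TB}} = \frac{2^{25}}{768} \approx 4.4 \times 10^{4},
\]

so the reported reduction to 768 terabytes amounts to an overall compression ratio in the tens of thousands, or roughly 190 GB of compressed state per node across the 4,096 nodes.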
  3. We introduce an ensemble of artificial intelligence models for gravitational wave detection that we trained on the Summit supercomputer using 32 nodes, equivalent to 192 NVIDIA V100 GPUs, within 2 hours. Once fully trained, we optimized these models for accelerated inference using NVIDIA TensorRT. We deployed our inference-optimized AI ensemble on the ThetaGPU supercomputer at the Argonne Leadership Computing Facility to conduct distributed inference. Using the entire ThetaGPU supercomputer, consisting of 20 nodes, each with 8 NVIDIA A100 Tensor Core GPUs and 2 AMD Rome CPUs, our NVIDIA TensorRT-optimized AI ensemble processed an entire month of advanced LIGO data (including Hanford and Livingston data streams) within 50 s. Our inference-optimized AI ensemble retains the same sensitivity as traditional AI models: it identifies all known binary black hole mergers previously identified in this advanced LIGO dataset and reports no misclassifications, while providing a 3x inference speedup compared to traditional artificial intelligence models. We used time slides to quantify the performance of our AI ensemble on up to 5 years' worth of advanced LIGO data. In this synthetically enhanced dataset, our AI ensemble reports an average of one misclassification for every month of searched advanced LIGO data. We also present the receiver operating characteristic curve of our AI ensemble using this 5-year advanced LIGO dataset. This approach provides the required tools to conduct accelerated, AI-driven gravitational wave detection at scale.
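As a back-of-the-envelope sense of that throughput (our arithmetic, not a figure from the paper), a 30-day month contains $30 \times 86{,}400 \approx 2.6 \times 10^{6}$ seconds of strain data per detector, so

\[
\frac{30 \times 86{,}400\ \mathrm{s}}{50\ \mathrm{s}} \approx 5.2 \times 10^{4},
\]

i.e., each detector stream was processed roughly fifty thousand times faster than real time.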
  4. Python has become a widely used programming language for research, not only for small one-off analyses but also for complex application pipelines running at supercomputer scale. Modern parallel programming frameworks for Python present users with a more granular unit of management than traditional Unix processes and batch submissions: the Python function. We review the challenges involved in running native Python functions at scale, and present techniques for dynamically determining a minimal set of dependencies and for assembling a lightweight function monitor (LFM) that captures the software environment and manages resources at the granularity of single functions. We evaluate these techniques in a range of environments, from campus cluster to supercomputer, and show that our advanced dependency management planning and dynamic resource management methods provide superior performance and utilization relative to coarser-grained management approaches, achieving a several-fold decrease in execution time for several large Python applications.
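The paper's own dependency planner is not shown here; as a minimal sketch of the underlying idea, Python's standard-library modulefinder can statically approximate the set of modules a script (and hence its functions) needs, which is the starting point for shipping a minimal environment (the script name is hypothetical):

```python
from modulefinder import ModuleFinder

# Statically walk the import graph of a script to approximate the
# minimal set of modules its functions require. A real function-level
# planner would also record package versions and data files.
finder = ModuleFinder()
finder.run_script("my_analysis.py")  # hypothetical user script

deps = sorted(
    name
    for name, mod in finder.modules.items()
    if mod.__file__ is not None  # skip built-in modules
)
print(f"{len(deps)} file-backed modules required:")
for name in deps:
    print(" ", name)
```

A per-function monitor like the LFM described above would pair such a dependency list with runtime resource tracking for each invocation.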
  5. Data physicalizations (3D-printed terrain models, anatomical scans, or even abstract data) can naturally engage both the visual and haptic senses in ways that are difficult or impossible with traditional planar touch screens and even immersive digital displays. Yet the rigid 3D physicalizations produced with today's most common 3D printers are fundamentally limited for data exploration and querying tasks that require dynamic input (e.g., touch sensing) and output (e.g., animation), functions that are easily handled with digital displays. We introduce a novel style of hybrid virtual + physical visualization designed specifically to support interactive data exploration tasks. Working toward a "best of both worlds" solution, our approach fuses immersive AR, physical 3D data printouts, and touch sensing through the physicalization. We demonstrate that this solution can support three of the most common spatial data querying interactions used in scientific visualization (streamline seeding, dynamic cutting planes, and world-in-miniature visualization). Finally, we present quantitative performance data and describe a first application to exploratory visualization of actively studied supercomputer climate simulation data with feedback from domain scientists.
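Of the three interactions listed, streamline seeding is the most algorithmically standard: from each seed point, the flow field is numerically integrated to trace a curve. A toy sketch (an analytic circular field stands in for real simulation data, and forward Euler for a higher-order integrator):

```python
import numpy as np

def velocity(p):
    """Toy 2D vector field (circular flow); real use would sample the
    climate simulation's velocity data at the probe position."""
    x, y = p
    return np.array([-y, x])

def streamline(seed, step=0.01, n_steps=500):
    """Trace a streamline from a seed point with forward Euler."""
    pts = [np.asarray(seed, dtype=float)]
    for _ in range(n_steps):
        pts.append(pts[-1] + step * velocity(pts[-1]))
    return np.array(pts)

# Seeding at (1, 0) traces (approximately) a circle around the origin.
line = streamline(seed=(1.0, 0.0))
print(line.shape, line[-1])
```

In the hybrid setup described above, the seed point would come from a touch on the physicalization rather than from code.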