


Title: Opportunities for enhancing MLCommons efforts while leveraging insights from educational MLCommons earthquake benchmarks efforts

MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. Its members include start-ups, leading companies, academics, and non-profits from around the world, and its goal is to make machine learning better for everyone. Educational institutions provide valuable opportunities for engagement and for broadening participation. In this article, we identify numerous insights, obtained from different viewpoints, from efforts to utilize high-performance computing (HPC) big data systems in existing education while developing and conducting science benchmarks for earthquake prediction. As this activity was conducted across multiple educational efforts, we assess whether and how such efforts can be made available on a wider scale. This includes the integration of sophisticated benchmarks into courses and research activities at universities, exposing students and researchers to topics that are otherwise typically not sufficiently covered in current course curricula, as we witnessed from our practical experience across multiple organizations. We outline the many lessons learned throughout these efforts, culminating in the need for benchmark carpentry for scientists using advanced computational resources. The article also presents the analysis of an earthquake prediction code benchmark that focuses on the accuracy of the results and not only on the runtime; notably, this benchmark was created as a result of our lessons learned. Energy traces were produced throughout these benchmarks, which are vital to analyzing power expenditure within HPC environments. Additionally, one of the insights is that, given the short duration of the project and limited student availability, the activity was only possible by utilizing a benchmark runtime pipeline and by developing and using software that automatically generates jobs from the permutation of hyperparameters. This software integrates a templated job management framework for executing tasks and experiments based on hyperparameters while leveraging hybrid compute resources available at different institutions. It is part of a collection called cloudmesh, with its newly developed components cloudmesh-ee (experiment executor) and cloudmesh-cc (compute coordinator).
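To illustrate the job-generation idea, the following is a minimal sketch, not the actual cloudmesh-ee API: it shows how job scripts could be rendered from a template for every permutation of a hyperparameter grid. The grid values, the SLURM template, and the train.py invocation are illustrative assumptions only.

```python
# Minimal sketch of templated job generation from hyperparameter permutations.
# This is NOT the cloudmesh-ee API; names, parameters, and the template are
# illustrative assumptions.
from itertools import product
from string import Template

# Hypothetical hyperparameter grid for a benchmark run.
grid = {
    "epochs": [2, 30, 70],
    "batch_size": [32, 64],
    "learning_rate": [0.001, 0.01],
}

# Hypothetical SLURM job template; each permutation renders one job script.
job_template = Template("""\
#!/bin/bash
#SBATCH --job-name=eq-$epochs-$batch_size-$learning_rate
#SBATCH --gres=gpu:1
python train.py --epochs $epochs --batch-size $batch_size --lr $learning_rate
""")

keys = list(grid)
for values in product(*(grid[k] for k in keys)):
    params = dict(zip(keys, values))
    filename = "job_" + "_".join(str(v) for v in values) + ".sh"
    with open(filename, "w") as f:
        f.write(job_template.substitute(params))
    print("generated", filename)
```

In the workflow described in the article, cloudmesh-ee fills this role with templated experiment configurations, while cloudmesh-cc coordinates execution of the generated jobs across hybrid compute resources at different institutions.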

 
Award ID(s):
2210266 2204115 2200409 2151597
PAR ID:
10473591
Publisher / Repository:
Frontiers
Journal Name:
Frontiers in High Performance Computing
Volume:
1
ISSN:
2813-7337
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Jetstream2 will be a category I production cloud resource that is part of the National Science Foundation’s Innovative HPC Program. The project’s aim is to accelerate science and engineering by providing “on-demand” programmable infrastructure built around a core system at Indiana University and four regional sites. Jetstream2 is an evolution of the Jetstream platform, which functions primarily as an Infrastructure-as-a-Service cloud. The lessons learned in cloud architecture, distributed storage, and container orchestration have inspired changes in both hardware and software for Jetstream2. These lessons have wide implications as institutions converge HPC and cloud technology while building on prior work when deploying their own cloud environments. Jetstream2’s next-generation hardware, robust open-source software, and enhanced virtualization will provide a significant platform to further cloud adoption within the US research and education communities. 
  2. Summary

    High performance computing (HPC) has led to remarkable advances in science and engineering and has become an indispensable tool for research. Unfortunately, HPC use and adoption by many researchers is often hindered by the complex way these resources are accessed. Indeed, while the web has become the dominant access mechanism for remote computing services in virtually every computing area, HPC is a notable exception. Open OnDemand is an open source project negating this trend by providing web‐based access to HPC resources (https://openondemand.org). This article describes the challenges to adoption and other lessons learned over the 3‐year project that may be relevant to other science gateway projects. We end with a description of future plans the project team has during the Open OnDemand 2.0 project including specific developments in machine learning and GPU monitoring.

     
  3. While both the database and high-performance computing (HPC) communities utilize lossless compression methods to minimize floating-point data size, a disconnect persists between them. Each community designs and assesses methods in a domain-specific manner, making it unclear if HPC compression techniques can benefit database applications or vice versa. With the HPC community increasingly leaning towards in-situ analysis and visualization, more floating-point data from scientific simulations are being stored in databases like Key-Value Stores and queried using in-memory retrieval paradigms. This trend underscores the urgent need for a collective study of these compression methods' strengths and limitations, not only based on their performance in compressing data from various domains but also on their runtime characteristics. Our study extensively evaluates the performance of eight CPU-based and five GPU-based compression methods developed by both communities, using 33 real-world datasets assembled in the Floating-point Compressor Benchmark (FCBench). Additionally, we utilize the roofline model to profile their runtime bottlenecks. Our goal is to offer insights into these compression methods that could assist researchers in selecting existing methods or developing new ones for integrated database and HPC applications.
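    For readers unfamiliar with it, the roofline model referenced above bounds a kernel's attainable throughput by the machine's compute peak and memory bandwidth. A minimal sketch follows; the peak numbers are illustrative assumptions, not measurements from the FCBench study.

```python
# Minimal sketch of the roofline bound used to profile runtime bottlenecks.
# Peak FLOP rate and memory bandwidth are illustrative assumptions.
PEAK_GFLOPS = 7800.0   # assumed peak compute throughput, GFLOP/s
PEAK_BW_GBS = 900.0    # assumed peak memory bandwidth, GB/s

def roofline_gflops(arithmetic_intensity):
    """Attainable GFLOP/s for a kernel with the given FLOP/byte ratio."""
    return min(PEAK_GFLOPS, PEAK_BW_GBS * arithmetic_intensity)

# A compressor doing 0.5 FLOP per byte moved is memory-bandwidth bound:
print(roofline_gflops(0.5))  # 450.0 GFLOP/s, far below the compute peak
```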

     
  4. To enable the sustainable use of their ocean resources, capacity for ocean science and observations is important for every coastal nation. In many developing areas of the world, capability for ocean science and observations is not yet adequate to meet management needs. International organizations have employed a variety of capacity development approaches to assist developing countries in building self-sustaining ocean science and observational communities. This article describes the lessons learned from visiting scientist programs conducted for more than a decade by the Partnership for Observation of the Global Ocean (POGO) and the Scientific Committee on Oceanic Research (SCOR) that dispatched ocean scientists to developing countries to train hundreds of individuals in a variety of ocean science and observation topics and techniques. From these programs, SCOR and POGO have learned that training in-country has multiple benefits to trainees, host institutions, and trainers, benefits that are not achievable when students leave their countries. These benefits include more cost-effective training on issues relevant to the host institutions using locally available technology, as well as the ability to reach a large number of trainees. Lessons learned from the POGO and SCOR programs can be used to inform the future capacity-development activities of POGO and SCOR, as well as other organizations, to improve, enhance, and expand the use of in-country training and mentoring. Such approaches could contribute to the capacity development efforts of the UN Decade of Ocean Science for Sustainable Development. 
  5. Modern High Performance Computing (HPC) systems are built with innovative system architectures and novel programming models to further push the speed limit of computing. The increased complexity poses challenges for performance portability and performance evaluation. The Standard Performance Evaluation Corporation (SPEC) has a long history of producing industry-standard benchmarks for modern computer systems. SPEC’s newly released SPEChpc 2021 benchmark suites, developed by the High Performance Group, are a bold attempt to provide a fair and objective benchmarking tool designed for state-of-the-art HPC systems. With the support of multiple host and accelerator programming models, the suites are portable across both homogeneous and heterogeneous architectures. Different workloads are developed to fit system sizes ranging from a few compute nodes to a few hundred compute nodes. In this work, we present our first experiences in performance benchmarking the new SPEChpc 2021 suites and evaluate their portability and basic performance characteristics on various popular and emerging HPC architectures, including x86 CPU, NVIDIA GPU, and AMD GPU. This study provides a first-hand experience of executing the SPEChpc 2021 suites at scale on production HPC systems, discusses real-world use cases, and serves as an initial guideline for using the benchmark suites.