NSF PAR Search | NSF Public Access Repository


FAIR for AI: An interdisciplinary and international community building perspective

https://doi.org/10.1038/s41597-023-02298-6

Huerta, E. A.; Blaiszik, Ben; Brinson, L. Catherine; Bouchard, Kristofer E.; Diaz, Daniel; Doglioni, Caterina; Duarte, Javier M.; Emani, Murali; Foster, Ian; Fox, Geoffrey; et al (December 2023, Scientific Data)

Full Text Available
Opportunities for enhancing MLCommons efforts while leveraging insights from educational MLCommons earthquake benchmarks efforts

https://doi.org/10.3389/fhpcp.2023.1233877

von Laszewski, Gregor; Fleischer, J. P.; Knuuti, Robert; Fox, Geoffrey C.; Kolessar, Jake; Butler, Thomas S.; Fox, Judy (October 2023, Frontiers in High Performance Computing)

MLCommons is an effort to develop and improve the artificial intelligence (AI) ecosystem through benchmarks, public data sets, and research. It consists of members from start-ups, leading companies, academics, and non-profits from around the world. The goal is to make machine learning better for everyone. In order to increase participation by others, educational institutions provide valuable opportunities for engagement. In this article, we identify numerous insights obtained from different viewpoints as part of efforts to utilize high-performance computing (HPC) big data systems in existing education while developing and conducting science benchmarks for earthquake prediction. As this activity was conducted across multiple educational efforts, we project if and how it is possible to make such efforts available on a wider scale. This includes the integration of sophisticated benchmarks into courses and research activities at universities, exposing the students and researchers to topics that are otherwise typically not sufficiently covered in current course curricula as we witnessed from our practical experience across multiple organizations. As such, we have outlined the many lessons we learned throughout these efforts, culminating in the need for benchmark carpentry for scientists using advanced computational resources. The article also presents the analysis of an earthquake prediction code benchmark while focusing on the accuracy of the results and not only on the runtime; notably, this benchmark was created as a result of our lessons learned. Energy traces were produced throughout these benchmarks, which are vital to analyzing the power expenditure within HPC environments. Additionally, one of the insights is that in the short time of the project with limited student availability, the activity was only possible by utilizing a benchmark runtime pipeline while developing and using software to generate jobs from the permutation of hyperparameters automatically.
It integrates a templated job management framework for executing tasks and experiments based on hyperparameters while leveraging hybrid compute resources available at different institutions. The software is part of a collection called cloudmesh with its newly developed components, cloudmesh-ee (experiment executor) and cloudmesh-cc (compute coordinator).
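The idea of generating jobs automatically from hyperparameter combinations can be illustrated with a minimal Python sketch. This is not the cloudmesh-ee interface: the function names, the command template, and the hyperparameter values below are all hypothetical, and reading "permutation of hyperparameters" as the Cartesian product of the listed values is an assumption.

```python
from itertools import product


def expand_grid(hyperparameters):
    """Return one dict per combination of the listed hyperparameter values."""
    names = list(hyperparameters)
    value_lists = [hyperparameters[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


def render_jobs(template, hyperparameters):
    """Fill a command template once for every hyperparameter combination."""
    return [template.format(**combo) for combo in expand_grid(hyperparameters)]


# Hypothetical values and command line, chosen only for illustration.
grid = {"epochs": [2, 10], "batch_size": [32, 64], "gpu": ["a100"]}
template = "python train.py --epochs={epochs} --batch-size={batch_size} --gpu={gpu}"

jobs = render_jobs(template, grid)
for job in jobs:
    print(job)
```

Each rendered string could then be written into a batch script and submitted to whichever scheduler a given institution uses; that step is left out here because it depends on the local system.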
Full Text Available
Templated Hybrid Reusable Computational Analytics Workflow Management with Cloudmesh, Applied to the Deep Learning MLCommons Cloudmask Application

https://doi.org/10.1109/e-Science58273.2023.10254942

von Laszewski, Gregor; Fleischer, J.P.; Fox, Geoffrey C.; Papay, Juri; Jackson, Sam; Thiyagalingam, Jeyan (October 2023, IEEE)
Deep Learning Patterns Enabling AI for Science

https://doi.org/10.1109/JVA60410.2023.00014

Fox, Geoffrey (July 2023, IEEE)
Forecasting tsunami inundation with convolutional neural networks for a potential Cascadia Subduction Zone rupture

https://doi.org/10.22541/essoar.167591125.50103833/v1

Grzan, David; Rundle, John B; Fox, Geoffrey C; Donnellan, Andrea (February 2023, Authorea, Inc.)

Tsunamis in the last two decades have resulted in the loss of life of over 200,000 people and have caused billions of dollars in damage. There is therefore great motivation for the development and improvement of current tsunami warning systems. The work presented here represents advancements made towards the creation of a neural network-based tsunami warning system which can produce fast inundation forecasts with high accuracy. This was done by first improving the waveform resolution and accuracy of Tsunami Squares, an efficient cellular automata approach to wave simulation. It was then used to create a database of precomputed tsunamis in the event of a magnitude 9+ rupture of the Cascadia Subduction Zone, located only ∼100 km off the coast of Oregon, US. Our approach utilized a convolutional neural network which took wave height data from buoys as input and proved successful as maps of maximum inundation could be predicted for the town of Seaside, OR with a median error of ∼0.5 m.
Full Text Available
AI Benchmarking for Science: Efforts from the MLCommons Science Working Group

https://doi.org/10.1007/978-3-031-23220-6_4

Jeyan Thiyagalingam, Gregor von (January 2023, 2022 ISC WORKSHOP: HPC ON HETEROGENEOUS HARDWARE (H3))

Full Text Available
Does the Catalog of California Earthquakes, with Aftershocks Included, Contain Information about Future Large Earthquakes?

John B. Rundle; Andrea Donnellan; Geoffrey Fox; Lisa Grant Ludwig; James P Crutchfield (August 2022, Earth and Space Science Open Archive)

Yes
Full Text Available
Earthquake Nowcasting with Deep Learning

https://doi.org/10.3390/geohazards3020011

Fox, Geoffrey Charles; Rundle, John B.; Donnellan, Andrea; Feng, Bo (June 2022, GeoHazards)

We review previous approaches to nowcasting earthquakes and introduce new approaches based on deep learning using three distinct models based on recurrent neural networks and transformers. We discuss different choices for observables and measures, presenting promising initial results for a region of Southern California from 1950 to 2020. Earthquake activity is predicted as a function of 0.1-degree spatial bins for time periods varying from two weeks to four years. The overall quality is measured by the Nash-Sutcliffe efficiency, which compares the deviation of nowcast and observation with the variance over time in each spatial region. The software is available as open source together with the preprocessed data from the USGS.
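For readers unfamiliar with the quality measure named in this abstract, the sketch below computes the standard Nash-Sutcliffe efficiency for a single time series: one minus the ratio of the squared nowcast error to the variance of the observations about their mean. How the paper combines values across spatial bins is not stated in the abstract, so the sketch covers one bin only; the function name and sample numbers are hypothetical.

```python
def nash_sutcliffe(observed, predicted):
    """Nash-Sutcliffe efficiency of a prediction against one observed series.

    Returns 1.0 for a perfect prediction and 0.0 for a prediction that is
    no better than always guessing the mean of the observations.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    variance_sum = sum((o - mean_obs) ** 2 for o in observed)
    if variance_sum == 0:
        raise ValueError("observations are constant; efficiency is undefined")
    return 1.0 - squared_error / variance_sum


# Hypothetical activity values for one spatial bin over four time periods.
observed = [1.0, 2.0, 3.0, 4.0]
nowcast = [1.0, 2.0, 3.0, 5.0]
print(nash_sutcliffe(observed, nowcast))
```

Values close to 1 indicate that the nowcast tracks the observations well; negative values mean the nowcast is worse than the observed mean.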
Full Text Available
Scientific machine learning benchmarks

https://doi.org/10.1038/s42254-022-00441-7

Thiyagalingam, Jeyan; Shankar, Mallikarjun; Fox, Geoffrey; Hey, Tony (June 2022, Nature Reviews Physics)

Full Text Available
MLPerf™ HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

https://doi.org/10.1109/MLHPC54614.2021.00009

Farrell, Steven; Emani, Murali; Balma, Jacob; Drescher, Lukas; Drozd, Aleksandr; Fink, Andreas; Fox, Geoffrey; Kanter, David; Kurth, Thorsten; Mattson, Peter; et al (November 2021, 2021 IEEE/ACM Workshop on Machine Learning in High Performance Computing Environments (MLHPC))

Scientific communities are increasingly adopting machine learning and deep learning models in their applications to accelerate scientific insights. High performance computing systems are pushing the frontiers of performance with a rich diversity of hardware resources and massive scale-out capabilities. There is a critical need to understand fair and effective benchmarking of machine learning applications that are representative of real-world scientific use cases. MLPerf™ is a community-driven standard to benchmark machine learning workloads, focusing on end-to-end performance metrics. In this paper, we introduce MLPerf HPC, a benchmark suite of large-scale scientific machine learning training applications, driven by the MLCommons™ Association. We present the results from the first submission round including a diverse set of some of the world’s largest HPC systems. We develop a systematic framework for their joint analysis and compare them in terms of data staging, algorithmic convergence and compute performance. As a result, we gain a quantitative understanding of optimizations on different subsystems such as staging and on-node loading of data, compute-unit utilization and communication scheduling enabling overall >10× (end-to-end) performance improvements through system scaling. Notably, our analysis shows a scale-dependent interplay between the dataset size, a system’s memory hierarchy and training convergence that underlines the importance of near-compute storage. To overcome the data-parallel scalability challenge at large batch-sizes, we discuss specific learning techniques and hybrid data-and-model parallelism that are effective on large systems. We conclude by characterizing each benchmark with respect to low-level memory, I/O and network behaviour to parameterize extended roofline performance models in future rounds.
Full Text Available
