This content will become publicly available on June 5, 2026

Title: Training the Next Generation of Seismologists: Delivering Research-Grade Software Education for Cloud and HPC Computing Through Diverse Training Modalities
Abstract: With the rise of data volume and computing power, seismological research requires more advanced skills in data processing, numerical methods, and parallel computing. We present our experience conducting training workshops, delivered in various formats, to support the adoption of large-scale high-performance computing (HPC) and cloud computing, thereby advancing seismological research. The seismological foci were earthquake source parameter estimation in catalogs, forward and adjoint wavefield simulations in 2D and 3D at local, regional, and global scales, earthquake dynamics, ambient noise seismology, and machine learning. This contribution describes the series of workshops delivered as part of research projects, the learning outcomes for participants, and lessons learned by the instructors. Our curriculum was grounded in open and reproducible science, large-scale scientific computing and data mining, and computing infrastructure (access and usage) for HPC and the cloud. We also describe the types of teaching materials that have proven beneficial to the instruction and the sustainability of the program. We propose guidelines for delivering future workshops on these topics.
Award ID(s):
2225216 2103621 2103701 2121568 2311206 2311208 2104052
PAR ID:
10610298
Publisher / Repository:
Seismological Research Letters
Date Published:
Journal Name:
Seismological Research Letters
ISSN:
0895-0695
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Seismology has entered the petabyte era, driven by decades of continuous recordings from broad-band networks, the increase in nodal seismic experiments, and the recent emergence of distributed acoustic sensing (DAS). This review explains how cloud platforms, by providing object storage, elastic compute, and managed databases, enable researchers to ‘bring the code to the data,’ thereby providing a scalable option to overcome the bandwidth and capacity limitations of traditional HPC solutions. After reviewing cloud concepts and their research applications in seismology, we illustrate the capabilities of cloud-native workflows using two canonical end-to-end demonstrations: (1) ambient noise seismology that calculates cross-correlation functions at scale, and (2) earthquake detection and phase picking. Both workflows utilize Amazon Web Services, a commercial cloud platform, for streaming I/O and provenance, demonstrating that cloud throughput can rival on-premises HPC at comparable costs, scanning 100 TB to 1.3 PB of seismic data in a few hours or days of processing. The review also discusses research and education initiatives, the reproducibility benefits of containers, and the cost pitfalls (e.g., egress and I/O fees) of energy-intensive seismological research computing. While designing cloud pipelines remains non-trivial, partnerships with research software engineers enable converting domain code into scalable, automated, and environmentally conscious solutions for next-generation seismology. We also outline where cloud resources fall short of specialized HPC, most notably for tightly coupled petascale simulations and long-term, PB-scale archives, so that practitioners can make informed, cost-effective choices.
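By way of illustration, the following is a minimal Python sketch of the "bring the code to the data" pattern described in the first summary: two miniSEED day files are streamed directly from S3 object storage and a single noise cross-correlation function is computed. The bucket name, object keys, station codes, and filter band are hypothetical placeholders, and ObsPy, s3fs, NumPy, and SciPy are assumed to be available; a production workflow would add spectral whitening, temporal normalization, and stacking over many days and station pairs.

import numpy as np
import obspy
import s3fs
from scipy.signal import correlate

fs = s3fs.S3FileSystem(anon=True)  # assumes a publicly readable bucket

def load_trace(key):
    # Read one miniSEED object straight from S3, without a local download.
    with fs.open(key, "rb") as f:
        st = obspy.read(f, format="MSEED")
    st.merge(fill_value=0)
    st.detrend("demean")
    st.filter("bandpass", freqmin=0.1, freqmax=1.0)
    return st[0]

tr1 = load_trace("example-seismic-bucket/NET.STA1..BHZ.2023.001.mseed")
tr2 = load_trace("example-seismic-bucket/NET.STA2..BHZ.2023.001.mseed")

# Time-domain cross-correlation normalized by trace energies; production
# workflows add spectral whitening and stack daily functions over months.
n = min(tr1.stats.npts, tr2.stats.npts)
a, b = tr1.data[:n].astype("float64"), tr2.data[:n].astype("float64")
ccf = correlate(a, b, mode="full") / (np.linalg.norm(a) * np.linalg.norm(b))
lags = np.arange(-n + 1, n) * tr1.stats.delta  # lag time in seconds

print("peak correlation", ccf.max(), "at lag", lags[ccf.argmax()], "seconds")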
  2. The commercial cloud offers on-demand computational resources that could be revolutionary for the seismological community, especially as seismic datasets continue to grow. However, there are few educational examples for cloud use that target individual seismological researchers. Here, we present a reproducible earthquake detection and association workflow that runs on Microsoft Azure. The Python-based workflow runs on continuous time-series data using both template matching and machine learning. We provide tutorials for constructing cloud resources (both storage and computing) through a desktop portal and deploying the code both locally and remotely on the cloud resources. We report on scaling of compute times and costs to show that CPU-only processing is generally inexpensive, and is faster and simpler than using GPUs. When the workflow is applied to one year of continuous data from a mid-ocean ridge, the resulting earthquake catalogs suggest that template matching and machine learning are complementary methods whose relative performance is dependent on site-specific tectonic characteristics. Overall, we find that the commercial cloud presents a steep learning curve but is cost-effective. This report is intended as an informative starting point for any researcher considering migrating their own processing to the commercial cloud. 
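The template-matching half of this workflow can be sketched in a few lines of Python. The example below slides a waveform template along a continuous trace and flags windows whose normalized cross-correlation exceeds a threshold; the data are synthetic stand-ins rather than the mid-ocean-ridge dataset, and the 0.8 threshold is chosen only for the demonstration. The cloud deployment described above wraps this kind of kernel in batch jobs over many stations, templates, and days.

import numpy as np

def normalized_cc(continuous, template):
    # Slide the template along the continuous trace and return the
    # normalized (Pearson) cross-correlation at every sample offset.
    n = len(template)
    t = (template - template.mean()) / (template.std() * n)
    cc = np.empty(len(continuous) - n + 1)
    for i in range(len(cc)):
        window = continuous[i:i + n]
        sigma = window.std()
        cc[i] = 0.0 if sigma == 0 else np.dot(t, window - window.mean()) / sigma
    return cc

rng = np.random.default_rng(0)
template = rng.standard_normal(200)             # a "known" event waveform
continuous = 0.5 * rng.standard_normal(20_000)  # continuous noise record
continuous[12_345:12_545] += template           # hide one repeat of the event

cc = normalized_cc(continuous, template)
detections = np.flatnonzero(cc > 0.8)           # threshold chosen for the demo
print("detections at sample offsets:", detections)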
  3. Large-scale processing and dissemination of distributed acoustic sensing (DAS) data are among the greatest computational challenges and opportunities of seismological research today. Current data formats and computing infrastructure are not well adapted or user-friendly for large-scale processing. We propose an innovative, cloud-native solution for DAS seismology using the MinIO open-source object storage framework. We develop data schemas for the cloud-optimized data formats Zarr and TileDB, which we deploy on a local object storage service compatible with the Amazon Web Services (AWS) storage system. We benchmark reading and writing performance for various data schemas using canonical use cases in seismology. We test our framework on a local server and on AWS. We find much-improved performance in compute time and memory throughput when using TileDB and Zarr compared to the conventional HDF5 data format. We demonstrate the platform with a compute-heavy use case in seismology: ambient noise seismology of DAS data. We process one month of data, pairing all 2089 channels within 24 hr, using AWS Batch autoscaling.
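As a small illustration of the cloud-optimized storage idea in this summary, the sketch below writes a DAS-like array of 2089 channels to a chunked Zarr store so that per-channel reads, as in the ambient noise use case, touch only the chunks they need. The array sizes, chunk layout, and local store path are assumptions made for the demo; the same pattern applies when the store is backed by an S3- or MinIO-compatible bucket (e.g., via s3fs).

import numpy as np
import zarr

# Demo-scale DAS-like array: 2089 channels, one minute at 100 Hz.
n_channels, n_samples = 2089, 6000

# One chunk per channel, so reading a channel fetches exactly one object;
# the store is a local directory here, but an object-storage-backed mapping
# drops in the same way for the cloud-native variant.
z = zarr.open(
    "das_demo.zarr",
    mode="w",
    shape=(n_channels, n_samples),
    chunks=(1, n_samples),
    dtype="float32",
)

# Write synthetic data channel by channel, as a streaming ingest would.
rng = np.random.default_rng(42)
for ch in range(n_channels):
    z[ch, :] = rng.standard_normal(n_samples).astype("float32")

# Reading one channel now touches a single chunk in the store.
trace = z[1000, :]
print(trace.shape, trace.dtype)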
  4. As the volume and sophistication of cyber-attacks grow, cybersecurity researchers, engineers, and practitioners rely on advanced cyberinfrastructure (CI) techniques like big data and machine learning, as well as advanced CI platforms, e.g., cloud and high-performance computing (HPC), to assess cyber risks, identify and mitigate threats, and achieve defense in depth. There is a training gap: current cybersecurity curricula at many universities do not introduce advanced CI techniques to the future cybersecurity workforce. At Old Dominion University (ODU), we are bridging this gap through an innovative training program named DeapSECURE (Data-Enabled Advanced Training Program for Cyber Security Research and Education). We developed six non-degree training modules to expose cybersecurity students to advanced CI platforms and techniques rooted in big data, machine learning, neural networks, and high-performance programming. Each workshop includes a lecture providing the motivation and context for a CI technique, which is then examined during a hands-on session. The modules are delivered through (1) monthly workshops for ODU students, and (2) summer institutes for students from other universities and Research Experiences for Undergraduates participants. Future plans for the training program include an online continuous-learning community as an extension to the workshops and making all learning materials available as open educational resources, which will facilitate widespread adoption, adaptations, and contributions. The project leverages existing partnerships to ensure broad participation and adoption of advanced CI techniques in the cybersecurity community. We employ a rigorous evaluation plan, rooted in diverse metrics of success, to improve the curriculum and demonstrate its effectiveness.
  5. Nowadays, scientific simulations on high-performance computing (HPC) systems can generate large amounts of data (on the scale of terabytes or petabytes) per run. When this huge amount of HPC data is processed by machine learning applications, the training overhead becomes significant. Typically, the training process for a neural network can take several hours to complete, if not longer. When machine learning is applied to HPC scientific data, the training time can stretch to several days or even weeks. Transfer learning, an optimization usually used to save training time or achieve better performance, has potential for reducing this large training overhead. In this paper, we apply transfer learning to a machine learning HPC application. We find that transfer learning can reduce training time without, in most cases, significantly increasing the error. This indicates that transfer learning can be very useful for working with HPC datasets in machine learning applications.
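To make the transfer-learning idea in the last summary concrete, here is a minimal PyTorch sketch of the freeze-and-fine-tune pattern: reuse weights from a model trained on one simulation's data, freeze the feature layers, and retrain only the final layer on the new dataset. The network, the weight file, and the random stand-in data are all hypothetical; only the pattern, which is what saves most of the training time, is the point.

import torch
import torch.nn as nn

class SurrogateNet(nn.Module):
    # Small fully connected network standing in for an ML model trained
    # on HPC simulation outputs.
    def __init__(self, n_in=64, n_hidden=128, n_out=1):
        super().__init__()
        self.features = nn.Sequential(
            nn.Linear(n_in, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.head = nn.Linear(n_hidden, n_out)

    def forward(self, x):
        return self.head(self.features(x))

model = SurrogateNet()
# In practice the weights would come from the earlier "source" training run:
# model.load_state_dict(torch.load("source_run_weights.pt"))  # hypothetical path

# Freeze the feature extractor; only the final layer is updated on new data.
for p in model.features.parameters():
    p.requires_grad = False

optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# One fine-tuning step on a batch from the new "target" dataset
# (random stand-in data here).
x = torch.randn(32, 64)
y = torch.randn(32, 1)
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()
print("fine-tuning loss:", float(loss))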