This content will become publicly available on July 8, 2026

Title: A Global-scale Database of Seismic Phases from Cloud-based Picking at Petabyte Scale
We present the first global-scale database of 4.3 billion P- and S-wave picks extracted from 1.3 PB of continuous seismic data via a cloud-native workflow. Using cloud computing services on Amazon Web Services, we launched ~145,000 containerized jobs on continuous records from 47,354 stations spanning 2002-2025, completing the processing in under three days. Phase arrivals were identified with a deep learning model, PhaseNet, through an open-source Python ecosystem for deep learning, SeisBench. To visualize and gain a global understanding of these picks, we present preliminary results on pick time series revealing Omori-law aftershock decay, seasonal variations linked to noise levels, and dense regional coverage that will enhance earthquake catalogs and machine-learning datasets. We provide all picks in a publicly queryable database, offering a powerful resource for researchers studying seismicity around the world. This report provides insights into the database and the underlying workflow, demonstrating the feasibility of petabyte-scale seismic data mining on the cloud and of providing intelligent data products to the community in an automated manner.
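The abstract names the full picking stack (PhaseNet served through SeisBench), so the per-station step can be sketched in a few lines. The example below is a minimal, hedged illustration: the waveform file name and the pretrained-weight label are placeholders, and the exact structure of the classify() output varies across SeisBench versions.

```python
import obspy
import seisbench.models as sbm

# Load pretrained PhaseNet weights from the SeisBench repository
# ("original" is one published weight set; other labels exist).
model = sbm.PhaseNet.from_pretrained("original")

# Any ObsPy-readable day file of continuous three-component data
# (placeholder file name).
stream = obspy.read("continuous_day.mseed")

# Run the picker over the stream; recent SeisBench versions return an
# object whose .picks holds P/S picks with times and confidence values.
output = model.classify(stream)
for pick in output.picks:
    print(pick.trace_id, pick.phase, pick.peak_time, pick.peak_value)
```

At the scale described in the abstract, each containerized job would wrap a loop like this over one station's continuous records and write its picks to the shared, queryable database.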
Award ID(s):
2103701
PAR ID:
10637076
Author(s) / Creator(s):
Publisher / Repository:
Seismica - McGill
Date Published:
Journal Name:
Seismica
Volume:
4
Issue:
2
ISSN:
2816-9387
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract Recent progress in artificial intelligence and machine learning makes it possible to automatically identify seismic phases from exponentially growing seismic data. Despite some exciting successes in automatic picking of the first P- and S-wave arrivals, auto-identification of later seismic phases such as the Moho-reflected PmP waves remains a significant challenge in matching the performance of experienced analysts. The main difficulty of machine-identifying PmP waves is that identifiable PmP waves are rare, making the problem of identifying PmP waves in a massive seismic database inherently imbalanced. In this work, by utilizing a high-quality PmP data set (10,192 manual picks) in southern California, we develop PmPNet, a deep-neural-network-based algorithm to automatically identify PmP waves efficiently; by doing so, we accelerate the process of identifying PmP waves. PmPNet applies techniques common in the machine learning community to address the imbalance of PmP data sets. The architecture of PmPNet is a residual neural network (ResNet)-autoencoder with an additional predictor block, where the encoder, decoder, and predictor are equipped with ResNet connections. We conduct systematic research with field data, concluding that PmPNet can efficiently achieve high precision and high recall simultaneously in automatically identifying PmP waves from a massive seismic database. Applying the pre-trained PmPNet to the seismic database from January 1990 to December 1999 in southern California, we obtain nearly twice as many PmP picks as in the original PmP data set, providing valuable data for other studies such as mapping the topography of the Moho discontinuity and imaging the lower crust structures of southern California.
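The abstract does not spell out how PmPNet handles the rarity of identifiable PmP waves; the sketch below only illustrates two generic remedies for such class imbalance (oversampling the rare class and weighting the loss) in PyTorch. The label vector and class ratio are stand-ins, not the authors' data or method.

```python
import torch
import torch.nn as nn
from torch.utils.data import WeightedRandomSampler

# Stand-in labels: 1 = window contains an identifiable PmP arrival, 0 = not.
# In a real data set the positive class would be a small minority.
labels = (torch.rand(10192) < 0.05).long()
class_counts = torch.bincount(labels, minlength=2).float()

# Remedy 1: oversample the rare PmP class so each batch is roughly balanced.
sample_weights = 1.0 / class_counts[labels]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

# Remedy 2: weight the loss so missing a PmP window costs more than a false alarm.
pos_weight = class_counts[0] / class_counts[1]
criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
```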
  2. The commercial cloud offers on-demand computational resources that could be revolutionary for the seismological community, especially as seismic datasets continue to grow. However, there are few educational examples for cloud use that target individual seismological researchers. Here, we present a reproducible earthquake detection and association workflow that runs on Microsoft Azure. The Python-based workflow runs on continuous time-series data using both template matching and machine learning. We provide tutorials for constructing cloud resources (both storage and computing) through a desktop portal and deploying the code both locally and remotely on the cloud resources. We report on scaling of compute times and costs to show that CPU-only processing is generally inexpensive, and is faster and simpler than using GPUs. When the workflow is applied to one year of continuous data from a mid-ocean ridge, the resulting earthquake catalogs suggest that template matching and machine learning are complementary methods whose relative performance is dependent on site-specific tectonic characteristics. Overall, we find that the commercial cloud presents a steep learning curve but is cost-effective. This report is intended as an informative starting point for any researcher considering migrating their own processing to the commercial cloud. 
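As a rough sketch of the template-matching half of such a workflow, the example below slides one template over one continuous channel with ObsPy's correlate_template; file names and the detection threshold are placeholders, and a production pipeline would stack correlation functions over many channels and stations before declaring detections.

```python
import numpy as np
import obspy
from obspy.signal.cross_correlation import correlate_template

# Continuous data and a template event window (placeholder file names).
cont = obspy.read("continuous_day.mseed")[0]
tmpl = obspy.read("template_event.mseed")[0]

# Normalized cross-correlation of the template against the continuous trace.
cc = correlate_template(cont.data, tmpl.data)

# Flag samples above a hand-tuned correlation threshold as candidate detections.
threshold = 0.7
for idx in np.where(cc > threshold)[0]:
    print("candidate detection at", cont.stats.starttime + idx * cont.stats.delta)
```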
  3. SUMMARY Seismology has entered the petabyte era, driven by decades of continuous recordings of broad-band networks, the increase in nodal seismic experiments and the recent emergence of distributed acoustic sensing (DAS). This review explains how cloud platforms, by providing object storage, elastic compute and managed databases, enable researchers to ‘bring the code to the data,’ thereby providing a scalable option to overcome traditional HPC solutions’ bandwidth and capacity limitations. After literature reviews of cloud concepts and their research applications in seismology, we illustrate the capacities of cloud-native workflows using two canonical end-to-end demonstrations: (1) ambient noise seismology that calculates cross-correlation functions at scale, and (2) earthquake detection and phase picking. Both workflows utilize Amazon Web Services, a commercial cloud platform, for streaming I/O and provenance, demonstrating that cloud throughput can rival on-premises HPC at comparable costs, scanning 100 TB to 1.3 PB of seismic data in a few hours or days of processing. The review also discusses research and education initiatives, the reproducibility benefits of containers and the cost pitfalls (e.g. egress, I/O fees) of energy-intensive seismological research computing. While designing cloud pipelines remains non-trivial, partnerships with research software engineers enable converting domain code into scalable, automated and environmentally conscious solutions for next-generation seismology. We also outline where cloud resources fall short of specialized HPC—most notably for tightly coupled petascale simulations and long-term, PB-scale archives—so that practitioners can make informed, cost-effective choices.
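In practice, ‘bring the code to the data’ means streaming waveforms straight from cloud object storage instead of downloading archives first. The sketch below uses s3fs and ObsPy; the bucket and key imitate the layout of a public S3 seismic archive but are illustrative only, not a guaranteed object.

```python
import obspy
import s3fs

# Anonymous (public) access to an S3-hosted waveform archive.
fs = s3fs.S3FileSystem(anon=True)

# Illustrative key only; real archives define their own bucket/path layout.
key = "example-seismic-archive/continuous_waveforms/2020/2020_001/CI.PASC.BHZ.2020.001.mseed"

# Stream the object directly into ObsPy without staging it on local disk.
with fs.open(key, "rb") as f:
    st = obspy.read(f)

print(st)
```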
  4. SUMMARY Applications of machine learning in seismology have greatly improved our capability of detecting earthquakes in large seismic data archives. Most of these efforts have been focused on continental shallow earthquakes, but here we introduce an integrated deep-learning-based workflow to detect deep earthquakes recorded by a temporary array of ocean-bottom seismographs (OBSs) and land-based stations in the Tonga subduction zone. We develop a new phase picker, PhaseNet-TF, to detect and pick P- and S-wave arrivals in the time–frequency domain. The frequency-domain information is critical for analysing OBS data, particularly the horizontal components, because they are contaminated by signals of ocean-bottom currents and other noise sources in certain frequency bands. PhaseNet-TF shows much better performance in picking S waves at OBSs and land stations compared to its predecessor PhaseNet. The predicted phases are associated using an improved Gaussian mixture model associator, GaMMA-1D, and then relocated with the double-difference package teletomoDD. We further enhance the model performance with a semi-supervised learning approach by iteratively refining labelled data and retraining PhaseNet-TF. This approach effectively suppresses false picks and significantly improves the detection of small earthquakes. The new catalogue of Tonga deep earthquakes contains more than 10 times as many events as the manually analysed reference catalogue. This deep-learning-enhanced catalogue reveals Tonga seismicity in unprecedented detail, better defining the lateral extent of the double seismic zone at intermediate depths and the location of four large deep-focus earthquakes relative to background seismicity. It also offers new potential for deciphering deep earthquake mechanisms, refining tomographic models, and understanding subduction processes.
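The central idea above is picking in the time-frequency domain. The sketch below is not PhaseNet-TF; it only shows how a multi-component record could be turned into a spectrogram-style input with a short-time Fourier transform (the file name and STFT parameters are arbitrary choices).

```python
import numpy as np
import obspy
from scipy.signal import stft

# Three-component record of equal-length traces (placeholder file name).
st = obspy.read("obs_record.mseed")
fs = st[0].stats.sampling_rate

# Short-time Fourier transform of each component gives the time-frequency
# representation that a picker operating in that domain would consume.
specs = [np.abs(stft(tr.data, fs=fs, nperseg=256, noverlap=192)[2]) for tr in st]

# Stack into a (components, frequencies, time_frames) array as model input.
tf_input = np.stack(specs)
print(tf_input.shape)
```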
  5.
    Abstract Seismograms are the convolution of seismic sources with the media that seismic waves propagate through and are, therefore, the primary observations for studying seismic source parameters and the Earth's interior. Routine earthquake location and travel-time tomography rely on accurate seismic phase picks (e.g., P and S arrivals). As data volumes increase, reliable automated seismic phase-picking methods are needed to analyze data and provide timely earthquake information. However, most traditional autopickers suffer at low signal-to-noise ratios and usually require additional effort to tune hyperparameters for each case. In this study, we propose a deep-learning approach that adapts soft attention gates (AGs) and recurrent-residual convolution units (RRCUs) into the backbone U-Net for seismic phase picking. The attention mechanism suppresses responses from waveforms irrelevant to seismic phases, and the cooperating RRCUs further enhance temporal connections of seismograms at multiple scales. We used numerous earthquake recordings in Taiwan, with diverse focal mechanisms and wide depth and magnitude distributions, to train and test our model. With picking errors within 0.1 s and predicted probability over 0.5, the AG with recurrent-residual convolution unit (ARRU) phase picker achieved F1 scores of 98.62% for P arrivals and 95.16% for S arrivals, with picking rates of 96.72% for P waves and 90.07% for S waves. The ARRU phase picker also showed great generalization capability when handling unseen data: when the model trained with Taiwan data was applied to southern California data, it showed no performance degradation. Compared with manual picks, the arrival times determined by the ARRU phase picker showed higher consistency, as evaluated with a set of repeating earthquakes. Arrival picks with less human error could benefit studies such as earthquake location and seismic tomography.
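The F1 scores quoted above rest on counting a pick as correct when it lies within 0.1 s of the manual arrival. The self-contained sketch below computes precision, recall and F1 under that tolerance; the greedy nearest-neighbour matching is an assumption for illustration, not necessarily the authors' exact procedure.

```python
import numpy as np

def pick_scores(pred_times, true_times, tol=0.1):
    """Precision, recall and F1 for picks matched within `tol` seconds."""
    pred = np.sort(np.asarray(pred_times, dtype=float))
    true = np.sort(np.asarray(true_times, dtype=float))
    used = np.zeros(len(pred), dtype=bool)
    matched = 0
    for t in true:
        idx = np.searchsorted(pred, t)
        # Check the two predicted picks bracketing the true arrival time.
        for j in (idx - 1, idx):
            if 0 <= j < len(pred) and not used[j] and abs(pred[j] - t) <= tol:
                used[j] = True
                matched += 1
                break
    precision = matched / len(pred) if len(pred) else 0.0
    recall = matched / len(true) if len(true) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: predicted vs. manual arrival times in seconds.
print(pick_scores([10.02, 35.40, 60.00], [10.00, 35.50, 58.00]))
```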