NSF PAR Search | NSF Public Access Repository

Parallel Seismic Data Processing Performance with Cloud-Based Storage

https://doi.org/10.1785/0220250115

Mohapatra, Sasmita; Yang, Weiming; Yang, Zhengtang; Wang, Chenxiao; Ma, Jinxin; Pavlis, Gary L; Wang, Yinzhi (August 2025, Seismological Research Letters)

Abstract This article introduces a general processing framework to effectively utilize waveform data stored on modern cloud platforms. The focus is on hybrid processing schemes in which a local system drives processing. We show that downloading files and doing all processing locally is problematic even when the local system is a high-performance computing (HPC) cluster. Benchmark tests with parallel processing show that this approach always creates a bottleneck as the volume of data being handled increases with more processes pulling data. We find that a hybrid model, in which processing on the cloud reduces the volume of data transferred from the cloud servers to the local system, can dramatically improve processing time. Tests implemented with the Massively Parallel Analysis System for Seismology (MsPASS) utilizing the Amazon Web Services (AWS) Lambda service yield throughput comparable to processing day files on a local HPC file system. Given the ongoing migration of seismology data to cloud storage, our results show that doing some or all processing on the cloud will be essential for any processing involving large volumes of data.
Free, publicly-accessible full text available August 20, 2026
Tapis v3 Streams API: Time‐series and data‐driven event support in science gateway infrastructure

https://doi.org/10.1002/cpe.6103

Cleveland, Sean B.; Jamthe, Anagha; Padhy, Smruti; Stubbs, Joe; Terry, Steven; Looney, Julia; Cardone, Richard; Packard, Michael; Dahan, Maytal; Jacobs, Gwen A. (November 2020, Concurrency and Computation: Practice and Experience)

Summary The explosion of IoT devices and sensors in recent years has led to a demand for efficiently storing, processing, and analyzing time‐series data. Geoscience researchers use time‐series data stores such as Hydroserver, Virtual Observatory and Ecological Informatics System (VOEIS), and Cloud‐Hosted Real‐time Data Service (CHORDS). Many of these tools require a great deal of infrastructure to deploy and expertise to manage and scale. The Tapis framework, an NSF‐funded project, provides science‐as‐a‐service APIs that allow researchers to achieve faster scientific results by eliminating the need to set up a complex infrastructure stack. The University of Hawai'i (UH) and Texas Advanced Computing Center (TACC) have collaborated to develop an open source Tapis Streams API that builds on the concepts of the CHORDS time‐series data service to support research. This new hosted service allows storing, processing, annotating, archiving, and querying time‐series data in the Tapis multi‐user and multi‐tenant collaborative platform. The Streams API provides a hosted, production‐level middleware service that enables new data‐driven event workflow capabilities that may be leveraged by researchers and Tapis‐powered science gateways for handling spatially indexed time‐series datasets.
ML Field Planner: Analyzing and Optimizing ML Pipelines For Field Research

https://doi.org/10.1145/3708035.3736013

Stubbs, Joe; Balasubramaniam, Sowbaranika; Khuvis, Samuel; Withana, Sachith; Vallabhajosyula, Manikya Swathi; Cardone, Richard; Garcia, Christian; Freeman, Nathan; Guzman, Carlos; Plale, Beth; et al (July 2025, ACM)

Free, publicly-accessible full text available July 18, 2026
TRAIL: Audit Trails for Enhanced Reproducibility and Observability of Research Computing

https://doi.org/10.5281/zenodo.17246911

Rosenberg, Jake; Cardone, Richard; Curbelo, Gilbert; Vanessa, Gonzalez; Black, Steven; Tijerina, Sal; Rivera-Sanchez, Erik; Dahan, Maytal; Stanzione, Dan (January 2025, Zenodo)

As research projects grow more complex and researchers use a mix of tools - command-line scripts, science gateways, and Jupyter notebooks - it becomes increasingly difficult to track exactly how a final result was produced. Each tool often keeps its own logs, making it hard to reconstruct the full sequence of computational steps. This lack of end-to-end visibility poses a serious challenge for scientific reproducibility. Yet advanced computing remains a critical part of nearly every field of academic research, and researchers continue to rely on a wide range of interfaces to run their scientific software. To address this challenge, the Advanced Computing Interfaces group at the Texas Advanced Computing Center (TACC) created a system that collates logs from multiple sources - science gateways, Jupyter notebooks, and the Tapis platform - into one unified “audit trail.” The TACC Research Audit and Integration of Logs (TRAIL) system allows researchers and staff to follow the complete path a dataset or file took: from the moment it was first uploaded to TACC, through every step of computation, to the final result. This kind of tracking helps ensure scientific results can be reproduced and gives advanced computing services better insight into how data and resources are being used.
Full Text Available
Toward Smart Scheduling in Tapis

https://doi.org/10.1109/BigData62323.2024.10826096

Stubbs, Joe; Padhy, Smruti; Cardone, Richard (December 2024, IEEE)

Full Text Available
Tapis Machine Learning Hub Service for Science Gateways

https://doi.org/10.5281/zenodo.13863818

Indrakusuma, Dhanny; Stubbs, Joe; Freeman, Nathan; Jamthe, Anagha (October 2024, Zenodo)

The adaptation of machine learning (ML) in scientific and medical research in recent years has heralded a new era of innovation, catalyzing breakthroughs that were once deemed unattainable. In this paper, we present the Machine Learning Hub (ML Hub) – a web application offering a single point of access to pre-trained ML models and datasets, catering to users across varying expertise levels. Built upon the NSF-funded Tapis v3 Application Programming Interface (API) and Tapis User Interface (TapisUI), the platform offers a user-friendly interface for model discovery, dataset exploration, and inference server deployment.
Full Text Available
A Comprehensive Cloud Architecture for Machine Learning-enabled Research

https://doi.org/10.1145/3626203.3670525

Stubbs, Joe; Indrakusuma, Dhanny; Garcia, Christian; Halbach, François; Hammock, Cody; Freeman, Nathan; Jamthe, Anagha; Packard, Michael; Fields, Alexander; Curbelo, Gilbert (July 2024, ACM)

Full Text Available
AI and Science Gateways: A Promising Combination for Accelerating Science and Research Computing

https://doi.org/10.1145/3626203.3670562

Sperhac, Jeanette; Gesing, Sandra; Zentner, Michael; Stirm, Claire; Quick, Rob; Stubbs, Joe (July 2024, ACM)

Full Text Available
CloudSec: An Extensible Automated Reasoning Framework for Cloud Security Policies

https://doi.org/10.1007/978-3-031-56950-0_23

Stubbs, Joe; Padhy, Smruti; Cardone, Richard; Black, Steve (January 2024, Springer Nature Switzerland)

Full Text Available
Tapis Machine Learning Hub Service for Science Gateways

https://doi.org/10.5281/zenodo.13863819

Indrakusuma, Dhanny; Stubbs, Joe; Freeman, Nathan; Jamthe, Anagha (January 2024, Zenodo)

The adaptation of machine learning (ML) in scientific and medical research in recent years has heralded a new era of innovation, catalyzing breakthroughs that were once deemed unattainable. In this paper, we present the Machine Learning Hub (ML Hub) – a web application offering a single point of access to pre-trained ML models and datasets, catering to users across varying expertise levels. Built upon the NSF-funded Tapis v3 Application Programming Interface (API) and Tapis User Interface (TapisUI), the platform offers a user-friendly interface for model discovery, dataset exploration, and inference server deployment.
Full Text Available
