Title: Towards Interactive, Reproducible Analytics at Scale on HPC Systems
The growth in scientific data volumes has resulted in a need to scale up processing and analysis pipelines using High Performance Computing (HPC) systems. These workflows need interactive, reproducible analytics at scale. The Jupyter platform provides core capabilities for interactivity but was not designed for HPC systems. In this paper, we outline our efforts to bring together core technologies based on the Jupyter platform to create interactive, reproducible analytics at scale on HPC systems. Our work is grounded in a real-world science use case: applying geophysical simulations and inversions for imaging the subsurface. Our core platform addresses three key areas of the scientific analysis workflow: reproducibility, scalability, and interactivity. We describe our implementation of a system using the Binder, Science Capsule, and Dask software, and demonstrate its use to run our use case and interactively visualize real-time streams of HDF5 data.
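The scaling-and-streaming pattern the abstract describes can be sketched briefly. The following is an illustration only, not the paper's code: it assumes a SLURM-based HPC system plus the dask-jobqueue and h5py packages, and the queue name, resource sizes, file path, and dataset name are all placeholders.

```python
# Minimal sketch: scale a notebook's analysis with Dask workers submitted as
# SLURM jobs, and poll a growing HDF5 file for live data. Illustrative only.
import time
import h5py
from dask.distributed import Client
from dask_jobqueue import SLURMCluster

# Request Dask workers as SLURM batch jobs so the notebook scales across nodes.
cluster = SLURMCluster(queue="regular", cores=32, memory="64GB",
                       walltime="01:00:00")
cluster.scale(jobs=4)          # ask the scheduler for 4 worker jobs
client = Client(cluster)

def latest_slice(path, dataset):
    """Read the most recently appended row of a growing HDF5 dataset."""
    # SWMR read mode assumes the simulation writes the file in SWMR mode.
    with h5py.File(path, "r", swmr=True) as f:
        dset = f[dataset]
        dset.refresh()         # pick up rows written since the file was opened
        return dset[-1, :]

# Poll the file the simulation is streaming into and push each new slice
# to the cluster for processing (e.g., an inversion or plotting step).
for _ in range(12):            # poll for about a minute
    fut = client.submit(latest_slice, "/scratch/sim_output.h5", "pressure")
    print("latest sample mean:", fut.result().mean())
    time.sleep(5)
```

HDF5's single-writer/multiple-reader (SWMR) mode is one common way to let a notebook read a file that a simulation is still appending to, which matches the real-time visualization described in the abstract.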
Award ID(s): 1928406
PAR ID: 10286896
Journal Name: 2020 IEEE/ACM HPC for Urgent Decision Making (UrgentHPC)
Page Range / eLocation ID: 47 to 54
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. The interdisciplinary field of cyberGIS (geographic information science and systems (GIS) based on advanced cyberinfrastructure) has a major focus on data‐ and computation‐intensive geospatial analytics. The rapidly growing needs across many application and science domains for such analytics based on disparate geospatial big data pose significant challenges to conventional GIS approaches. This paper describes CyberGIS‐Jupyter, an innovative cyberGIS framework for achieving data‐intensive, reproducible, and scalable geospatial analytics using Jupyter Notebook based on ROGER, the first cyberGIS supercomputer. The framework adapts the Notebook with built‐in cyberGIS capabilities to accelerate gateway application development and sharing, while associated data, analytics, and workflow runtime environments are encapsulated into application packages that can be elastically reproduced through cloud‐computing approaches. As a desirable outcome, data‐intensive and scalable geospatial analytics can be efficiently developed, improved, and seamlessly reproduced among multidisciplinary users in a novel cyberGIS science gateway environment.
  2. Interactive notebook programming is universal in modern ML and AI workflows, with interactive deep learning training (IDLT) emerging as a dominant use case. To ensure responsiveness, platforms like Jupyter and Colab reserve GPUs for long-running notebook sessions, despite their intermittent and sporadic GPU usage, leading to extremely low GPU utilization and prohibitively high costs. In this paper, we introduce NotebookOS, a GPU-efficient notebook platform tailored for the unique requirements of IDLT. NotebookOS employs replicated notebook kernels with Raft-synchronized replicas distributed across GPU servers. To optimize GPU utilization, NotebookOS oversubscribes server resources, leveraging high inter-arrival times in IDLT workloads, and allocates GPUs only during active cell execution. It also supports replica migration and automatic cluster scaling under high load. Altogether, this design enables interactive training with minimal delay. In evaluation on production workloads, NotebookOS saved over 1,187 GPU hours in 17.5 hours of real-world IDLT, while significantly improving interactivity. 
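The utilization argument here is easy to see with a back-of-the-envelope sketch (the values below are illustrative, not measurements from the paper):

```python
# Toy numbers showing why allocating GPUs only during active cell execution
# beats reserving them for a whole notebook session. Illustrative only.
session_hours = 8.0          # notebook stays open all day
active_fraction = 0.10       # GPU busy only ~10% of the time (sporadic cells)

reserved_gpu_hours  = session_hours                    # classic reservation
on_demand_gpu_hours = session_hours * active_fraction  # allocate per cell

print(f"reserved:  {reserved_gpu_hours:.1f} GPU-hours")
print(f"on-demand: {on_demand_gpu_hours:.1f} GPU-hours")
print(f"saved:     {reserved_gpu_hours - on_demand_gpu_hours:.1f} GPU-hours/session")
```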
  3. The scientific computing community has long taken a leadership role in understanding and assessing the relationship of reproducibility to cyberinfrastructure, ensuring that computational results - such as those from simulations - are "reproducible", that is, that the same results are obtained when one re-uses the same input data, methods, software, and analysis conditions. Starting almost a decade ago, the community has regularly published and advocated for advances in this area. In this article we trace this thinking and relate it to current national efforts, including the 2019 National Academies of Sciences, Engineering, and Medicine report on "Reproducibility and Replicability in Science". To this end, this work considers high performance computing workflows that combine traditional simulations (e.g., Molecular Dynamics simulations) with in situ analytics. We leverage an analysis of such workflows to (a) contextualize the report's recommendations in the HPC setting and (b) envision a path forward in the tradition of community-driven approaches to reproducibility and the acceleration of science and discovery. The work also articulates avenues for future research at the intersection of transparency, reproducibility, and computational infrastructure that supports scientific discovery.
  4. Much of modern science takes place in a computational environment, and, increasingly, that environment is programmed using R, Python, or Julia. Furthermore, most scientific data now live in the cloud, so the first step in many workflows is to query a cloud database and load the response into a computational environment for further analysis. Tools that facilitate programmatic data retrieval therefore represent a critical component of reproducible scientific workflows. Earth science is no different in this regard. To fulfill that basic need, we developed R, Python, and Julia packages providing programmatic access to the U.S. Geological Survey’s National Water Information System database and the multi-agency Water Quality Portal. Together, these packages create a common interface for retrieving hydrologic data in the Jupyter ecosystem, which is widely used in water research, operations, and teaching. Source code, documentation, and tutorials for the packages are available on GitHub. Users can go there to learn, raise issues, or contribute improvements within a single platform, which helps foster better engagement and collaboration between data providers and their users.
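As a usage illustration, a minimal retrieval in the Python package might look like the following (a sketch assuming the USGS dataretrieval package on PyPI; the gage number and date range are placeholders):

```python
# Fetch daily-value streamflow for one USGS gage directly into a DataFrame.
# Sketch only; site number and dates are illustrative.
import dataretrieval.nwis as nwis

df = nwis.get_record(
    sites="03339000",    # USGS gage number (placeholder example)
    service="dv",        # daily values
    start="2020-01-01",
    end="2020-12-31",
)
print(df.head())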
  5. As large-scale scientific simulations and big data analyses become more popular, it is increasingly expensive to store huge amounts of raw simulation results for post-analysis. To minimize the expensive data I/O, "in-situ" analysis is a promising approach, in which data analysis applications analyze simulation-generated data on the fly without storing it first. However, it is challenging to organize, transform, and transport data at scale between two semantically different ecosystems because of their distinct software and hardware differences. To tackle these challenges, we design and implement the X-Composer framework. X-Composer connects cross-ecosystem applications to form an "in-situ" scientific workflow, and provides a unified approach and recipe for supporting such hybrid in-situ workflows on distributed heterogeneous resources. X-Composer reorganizes simulation data as continuous data streams and feeds them seamlessly into Cloud-based stream processing services to minimize I/O overheads. For evaluation, we use X-Composer to set up and execute a cross-ecosystem workflow consisting of a parallel Computational Fluid Dynamics simulation running on HPC and a distributed Dynamic Mode Decomposition analysis application running on Cloud. Our experimental results show that X-Composer can seamlessly couple HPC and Big Data jobs in their own native environments, achieve good scalability, and provide high-fidelity analytics for ongoing simulations in real time.
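The stream-instead-of-files idea can be illustrated with a toy producer/consumer (a generic sketch of the in-situ pattern, not X-Composer's actual API):

```python
# Toy in-situ pattern: the analysis consumes simulation output as a stream
# rather than reading files afterwards. Generic sketch, not X-Composer code.
import queue
import threading
import numpy as np

stream = queue.Queue(maxsize=8)   # bounded buffer between the two "ecosystems"

def simulation(steps=20):
    """Stand-in for the CFD solver: emit one field snapshot per timestep."""
    for t in range(steps):
        stream.put((t, np.random.rand(64, 64)))   # snapshot as a stream record
    stream.put(None)                               # end-of-stream marker

def analysis():
    """Stand-in for the streaming analytics: reduce each snapshot on arrival."""
    while (item := stream.get()) is not None:
        t, field = item
        print(f"step {t}: mean={field.mean():.4f}")   # no intermediate files

threading.Thread(target=simulation).start()
analysis()
```

The bounded queue stands in for the stream transport: the producer blocks when the consumer falls behind, which is the back-pressure behavior a real streaming service provides.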