Stimulus: Accelerate Data Management for Scientific AI applications in HPC

Devarajan, Hariharan; Kougkas, Anthony; Zheng, Huihuo; Vishwanath, Venkatram; Sun, Xian-He

doi:10.1109/CCGrid54584.2022.00020

Modern scientific workflows couple simulations with AI-powered analytics by frequently exchanging data, both to accelerate time-to-science and to reduce the complexity of the simulation planes. However, this data exchange is limited in performance and portability due to a lack of support for scientific data formats in AI frameworks. We need a cohesive mechanism to effectively integrate complex scientific data formats such as HDF5, PnetCDF, ADIOS2, GNCF, and Silo into popular AI frameworks such as TensorFlow, PyTorch, and Caffe at scale. To this end, we designed Stimulus, a data management library for ingesting scientific data effectively into the popular AI frameworks. We utilize the StimOps functions along with the StimPack abstraction to enable the integration of scientific data formats with any AI framework. The evaluations show that Stimulus outperforms several large-scale applications with different use cases, namely Cosmic Tagger (consuming an HDF5 dataset in PyTorch), Distributed FFN (consuming an HDF5 dataset in TensorFlow), and CosmoFlow (converting HDF5 into TFRecord and then consuming that in TensorFlow), by 5.3x, 2.9x, and 1.9x, respectively, with ideal I/O scalability up to 768 GPUs on the Summit supercomputer. Through Stimulus, we can portably extend existing popular AI frameworks to cohesively support any complex scientific data format and efficiently scale the applications on large-scale supercomputers.
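The abstract does not describe the StimOps or StimPack interfaces, so the following is not Stimulus's API. It is only a minimal sketch of the general idea of placing one format-agnostic reading layer between storage formats and a training loop. Every name in it (`register_reader`, `iter_batches`, `fake_hdf5_reader`, the file name `train.h5`) is hypothetical, and the reader returns synthetic values in place of a real HDF5 read.

```python
from typing import Callable, Dict, Iterator, List, Sequence

# Hypothetical registry: format name -> function that loads samples from a path.
# This illustrates a generic adapter pattern, not the Stimulus implementation.
Reader = Callable[[str], Sequence[float]]
_READERS: Dict[str, Reader] = {}


def register_reader(fmt: str, reader: Reader) -> None:
    """Associate a format name (e.g. "hdf5") with a reader function."""
    _READERS[fmt] = reader


def iter_batches(path: str, fmt: str, batch_size: int) -> Iterator[List[float]]:
    """Yield batches of at most batch_size samples from any registered format."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if fmt not in _READERS:
        raise KeyError(f"no reader registered for format {fmt!r}")
    samples = _READERS[fmt](path)
    for start in range(0, len(samples), batch_size):
        yield list(samples[start:start + batch_size])


def fake_hdf5_reader(path: str) -> Sequence[float]:
    """Stand-in reader (assumption): returns synthetic data instead of reading a file."""
    return [float(i) for i in range(10)]


register_reader("hdf5", fake_hdf5_reader)

# The consumer sees the same batch interface regardless of the storage format.
batches = list(iter_batches("train.h5", "hdf5", batch_size=4))
```

In a sketch like this, supporting another format would mean registering one more reader, while the code that consumes batches stays unchanged.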

Award ID(s): 1835764; 1814872

PAR ID: 10379201

Author(s) / Creator(s): Devarajan, Hariharan; Kougkas, Anthony; Zheng, Huihuo; Vishwanath, Venkatram; Sun, Xian-He

Date Published: 2022-05-01

Journal Name: 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid)

Page Range / eLocation ID: 109 to 118

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1109/CCGrid54584.2022.00020
