Exploring Transfer Learning to Reduce Training Overhead of HPC Data in Machine Learning

Liu, Tong; Alibhai, Shakeel; Wang, Jinzhen; Liu, Qing; He, Xubin; Wu, Chentao

doi:10.1109/NAS.2019.8834723

Scientific simulations on high-performance computing (HPC) systems can generate large amounts of data (on the scale of terabytes or petabytes) per run. When machine learning applications process data at this scale, the training overhead is significant. Training a neural network typically takes several hours, if not longer; on HPC scientific data, training can take several days or even weeks. Transfer learning, an optimization usually used to save training time or achieve better performance, has the potential to reduce this large training overhead. In this paper, we apply transfer learning to a machine learning HPC application. We find that transfer learning can reduce training time without, in most cases, significantly increasing the error. This indicates that transfer learning can be very useful for working with HPC datasets in machine learning applications.
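As a toy illustration of the idea in the abstract (not the paper's model, dataset, or method), the sketch below shows one simple form of transfer learning: warm-starting a model on a target task from weights learned on a related source task. Everything here is an assumption made for illustration: the one-dimensional linear model, the synthetic source and target tasks, the learning rate, and the helper name `train`. The point is only that starting from transferred weights can reach the same error threshold in fewer gradient steps than starting from scratch.

```python
# Toy sketch: transfer learning as a warm start (all tasks and values are hypothetical).

def train(xs, ys, w=0.0, b=0.0, lr=0.1, tol=1e-4, max_steps=10_000):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    Returns (w, b, steps), where steps is the number of updates
    performed before the loss dropped below tol.
    """
    n = len(xs)
    for step in range(max_steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        if sum(e * e for e in errs) / n < tol:
            return w, b, step
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errs, xs)) / n
        grad_b = 2 * sum(errs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, max_steps


xs = [i / 10 for i in range(20)]

# Hypothetical source task and a closely related target task.
source_ys = [2.0 * x + 1.0 for x in xs]
target_ys = [2.2 * x + 0.9 for x in xs]

# Pretrain on the source task (this cost is paid once and can be reused).
src_w, src_b, _ = train(xs, source_ys)

# Target task trained from scratch vs. initialized from the source weights.
_, _, scratch_steps = train(xs, target_ys)
_, _, warm_steps = train(xs, target_ys, w=src_w, b=src_b)

print("steps from scratch:      ", scratch_steps)
print("steps with transferred w:", warm_steps)
```

The saving depends on how closely the source and target tasks are related; with an unrelated source task, a warm start may offer little or no benefit.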

Award ID(s): 1813081

PAR ID: 10176033

Author(s) / Creator(s): Liu, Tong; Alibhai, Shakeel; Wang, Jinzhen; Liu, Qing; He, Xubin; Wu, Chentao

Date Published: 2019-09-12

Journal Name: 2019 IEEE International Conference on Networking, Architecture and Storage (NAS)

Page Range / eLocation ID: 1 to 7

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1109/NAS.2019.8834723
