Title: openDIEL: A Parallel Workflow Engine and Data Analytics Framework
openDIEL is a workflow engine that aims to give researchers and users of HPC an efficient way to coordinate, organize, and interconnect many disparate modules of computation in order to effectively utilize and allocate HPC resources [13]. A GUI has been developed to aid in creating workflows; it allows for the specification of data science jobs, including the specification of neural network architectures, data processing, and hyperparameter tuning. Existing machine learning tools can be readily used in openDIEL, allowing for easy experimentation with various models and approaches.
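To make the idea of coordinating disparate modules more concrete, the following is a minimal, hypothetical sketch in Python; it is not openDIEL's actual API or configuration format, and the module names, tags, and selection logic are invented for illustration. It only shows the general pattern the abstract describes: independent computational modules (data processing, training over several hyperparameter settings, analysis) run in dependency order, in parallel where possible.

# Hypothetical illustration only -- not openDIEL's actual API or configuration
# format. It sketches the general idea of a workflow engine: declare independent
# modules of computation and run them in dependency order, in parallel where possible.
from concurrent.futures import ProcessPoolExecutor

def preprocess():                       # stand-in for a data-processing module
    return "clean-data"

def train(args):                        # stand-in for a model-training module
    data, tag = args
    return f"model-{tag}-trained-on-{data}"

def evaluate(models):                   # stand-in for an analysis module
    return max(models)                  # placeholder "selection" (lexicographic, illustration only)

if __name__ == "__main__":
    data = preprocess()
    # Run several hyperparameter settings as independent, parallel modules.
    with ProcessPoolExecutor() as pool:
        models = list(pool.map(train, [(data, t) for t in ("lr0.1", "lr0.01", "lr0.001")]))
    print(evaluate(models))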
Benavides, J.; Baugh, J.; Gopalakrishnan, G.
(Lecture Notes in Computer Science)
Mendis, Charith; Rauchwerger, Lawrence (Eds.)
HPC practitioners make use of techniques, such as parallelism and sparse data structures, that are difficult to reason about and debug. Here we explore the role of data refinement, a correct-by-construction approach, in verifying HPC applications via bounded model checking. We show how single program, multiple data (SPMD) parallelism can be modeled in Alloy, a declarative specification language, and describe common issues that arise when performing scope-complete refinement checks in this context.
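The models in that work are written in Alloy, not a general-purpose language. As a loose, hypothetical analogue in Python (not the authors' specifications), the flavor of a scope-bounded refinement check can be conveyed by exhaustively comparing a concrete sparse representation against the abstract dense structure it is meant to refine, for all inputs up to a small bound; the representation, abstraction function, and bounds below are invented for illustration.

# Hypothetical analogue of a scope-bounded refinement check (the paper uses
# Alloy, not Python): enumerate all abstract states up to a small bound and
# verify that the concrete (sparse) representation, mapped back through the
# abstraction function, reproduces every abstract value.
from itertools import product

def to_sparse(vec):
    """Concrete representation: list of (index, value) for nonzero entries."""
    return [(i, v) for i, v in enumerate(vec) if v != 0]

def abstraction(sparse, n):
    """Abstraction function: rebuild the dense vector of length n from the sparse form."""
    dense = [0] * n
    for i, v in sparse:
        dense[i] = v
    return dense

BOUND_LEN, BOUND_VALS = 4, (-1, 0, 1)          # the "scope" of the check
for n in range(BOUND_LEN + 1):
    for vec in product(BOUND_VALS, repeat=n):
        assert abstraction(to_sparse(list(vec)), n) == list(vec)
print("refinement holds within the chosen scope")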
Raj, Rajendra K.; Romanowski, Carol J.; Aly, Sherif G.; Becker, Brett A.; Chen, Juan; Ghafoor, Sheikh; Giacaman, Nasser; Gordon, Steven I.; Izu, Cruz; Rahimi, Shahram; et al.
(Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education)
High Performance Computing (HPC) is the ability to process data and perform complex calculations at extremely high speeds. Current HPC platforms can perform on the order of quadrillions of calculations per second, with quintillions on the horizon. The past three decades have witnessed a vast increase in the use of HPC across different scientific, engineering, and business communities, for example, sequencing the genome, predicting climate change, designing modern aerodynamics, or establishing customer preferences. Although HPC has been well incorporated into science curricula such as bioinformatics, the same cannot be said for most computing programs. This working group will explore how HPC can make inroads into computer science education, from the undergraduate to postgraduate levels. The group will address research questions designed to investigate topics such as identifying and handling barriers that inhibit the adoption of HPC in educational environments, how to incorporate HPC into various curricula, and how HPC can be leveraged to enhance applied critical thinking and problem-solving skills. Four deliverables include: (1) a catalog of core HPC educational concepts, (2) HPC curricula for contemporary computing needs, such as in artificial intelligence, cyberanalytics, data science and engineering, or the internet of things, (3) possible infrastructures for implementing HPC coursework, and (4) HPC-related feedback to the CC2020 project.
Full waveform (FW) LiDAR holds great potential for retrieving vegetation structure parameters at a high level of detail, but this prospect is constrained by practical factors such as the lack of readily available processing tools and the technical intricacy of waveform processing. This study introduces a new product named the Hyper Point Cloud (HPC), derived from FW LiDAR data, and explores its potential applications, such as tree crown delineation using the HPC-based intensity and percentile height (PH) surfaces, which shows promise as a solution to the constraints of using FW LiDAR data. The results of the HPC present a new direction for handling FW LiDAR data and offer prospects for studying the mid-story and understory of vegetation with high point density (~182 points/m²). The intensity-derived digital surface model (DSM) generated from the HPC shows that the ground region has higher maximum intensity (MAXI) and mean intensity (MI) than the vegetation region, while having lower total intensity (TI) and number of intensities (NI) at a given grid cell. Our analysis of intensity distribution contours at the individual tree level exhibits similar patterns, indicating that the MAXI and MI decrease from the tree crown center to the tree boundary, while a rising trend is observed for TI and NI. These intensity variable contours provide a theoretical justification for using HPC-based intensity surfaces to segment tree crowns and exploit their potential for extracting tree attributes. The HPC-based intensity surfaces and the HPC-based PH Canopy Height Models (CHM) demonstrate promising tree segmentation results comparable to the LiDAR-derived CHM for estimating tree attributes such as tree locations, crown widths, and tree heights. We envision that products such as the HPC and the HPC-based intensity and height surfaces introduced in this study can open new perspectives for the use of FW LiDAR data and alleviate the technical barrier of exploring FW LiDAR data for detailed vegetation structure characterization.
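As a rough, hypothetical sketch (not the authors' processing chain), the per-cell intensity variables named above -- maximum intensity (MAXI), mean intensity (MI), total intensity (TI), and number of intensities (NI) -- can be computed from an (x, y, intensity) point cloud by binning points into a regular grid; the function name, cell size, and random test data below are invented for illustration.

# Hypothetical sketch of gridding HPC points into intensity surfaces; the variable
# names mirror the abstract (MAXI, MI, TI, NI) but the code is illustrative,
# not the authors' implementation.
import numpy as np

def intensity_surfaces(x, y, intensity, cell=1.0):
    """Bin points into square cells and compute per-cell intensity statistics."""
    col = ((x - x.min()) // cell).astype(int)
    row = ((y - y.min()) // cell).astype(int)
    ncols, nrows = col.max() + 1, row.max() + 1
    flat = row * ncols + col                                  # one bin id per point

    NI = np.bincount(flat, minlength=nrows * ncols)           # number of returns per cell
    TI = np.bincount(flat, weights=intensity, minlength=nrows * ncols)   # total intensity
    MAXI = np.zeros(nrows * ncols)
    np.maximum.at(MAXI, flat, intensity)                      # maximum intensity per cell
    with np.errstate(invalid="ignore", divide="ignore"):
        MI = np.where(NI > 0, TI / NI, np.nan)                # mean intensity per cell

    shape = (nrows, ncols)
    return MAXI.reshape(shape), MI.reshape(shape), TI.reshape(shape), NI.reshape(shape)

# Example with synthetic points on a 10 m x 10 m plot.
rng = np.random.default_rng(0)
x, y = rng.uniform(0, 10, 1000), rng.uniform(0, 10, 1000)
MAXI, MI, TI, NI = intensity_surfaces(x, y, rng.uniform(0, 255, 1000))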
Liu, Tong; Alibhai, Shakeel; Wang, Jinzhen; Liu, Qing; He, Xubin; Wu, Chentao
(2019 IEEE International Conference on Networking, Architecture and Storage (NAS))
Nowadays, scientific simulations on high-performance computing (HPC) systems can generate large amounts of data (on the scale of terabytes or petabytes) per run. When this huge amount of HPC data is processed by machine learning applications, the training overhead will be significant. Typically, the training process for a neural network can take several hours to complete, if not longer. When machine learning is applied to HPC scientific data, the training time can take several days or even weeks. Transfer learning, an optimization usually used to save training time or achieve better performance, has potential for reducing this large training overhead. In this paper, we apply transfer learning to a machine learning HPC application. We find that transfer learning can reduce training time without, in most cases, significantly increasing the error. This indicates that transfer learning can be very useful for working with HPC datasets in machine learning applications.
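For readers unfamiliar with the technique, the following is a generic transfer-learning sketch in Python/PyTorch, not the paper's actual models, datasets, or checkpoints: weights from a network trained on one run are reused, the feature layers are frozen, and only the final layer is fine-tuned on new data, which is where the training-time savings come from. The network shape, the checkpoint path pretrained_run.pt, and the random data are placeholders.

# Generic transfer-learning sketch (PyTorch), not the paper's setup: reuse a
# network trained on earlier simulation data, freeze its feature layers, and
# fine-tune only the final layer on new data to cut training time.
import torch
import torch.nn as nn

model = nn.Sequential(                         # stand-in for a pretrained surrogate model
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
# model.load_state_dict(torch.load("pretrained_run.pt"))   # hypothetical checkpoint

for p in model[:-1].parameters():              # freeze everything except the last layer
    p.requires_grad = False

opt = torch.optim.Adam(model[-1].parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x_new = torch.randn(256, 16)                   # placeholder for the new HPC dataset
y_new = torch.randn(256, 1)
for _ in range(100):                           # short fine-tuning loop
    opt.zero_grad()
    loss = loss_fn(model(x_new), y_new)
    loss.backward()
    opt.step()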
Xu, Li; Hong, Yili; Morris, Max D.; Cameron, Kirk W.
(Journal of the Royal Statistical Society Series C: Applied Statistics)
Although high-performance computing (HPC) systems have been scaled to meet the exponentially growing demand for scientific computing, HPC performance variability remains a major challenge in computer science. Statistically, performance variability can be characterized by a distribution. Predicting performance variability is a critical step in HPC performance variability management. In this article, we propose a new framework to predict performance distributions. The proposed framework is a modified Gaussian process that can predict the distribution function of the input/output (I/O) throughput under a specific HPC system configuration. We also impose a monotonic constraint so that the predicted function is nondecreasing, which is a property of the cumulative distribution function. Additionally, the proposed model can incorporate both quantitative and qualitative input variables. We predict the HPC I/O distribution using the proposed method for the IOzone variability data. Data analysis results show that our framework can generate accurate predictions and outperform existing methods. We also show how the predicted functional output can be used to generate predictions for a scalar summary of the performance distribution, such as the mean, standard deviation, and quantiles. Our prediction results can further be used for HPC system variability monitoring and optimization. This article has online supplementary materials.
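The authors build the monotonic constraint into a modified Gaussian process; as a much simpler, hypothetical stand-in (not their model), one can fit an off-the-shelf GP to empirical CDF values of I/O throughput and enforce monotonicity after the fact. The synthetic throughput data, kernel choice, and grid sizes below are assumptions made only for illustration.

# Simplified, hypothetical stand-in for predicting a throughput CDF; the paper
# builds monotonicity into a modified Gaussian process, whereas this sketch fits
# a standard GP to empirical CDF values and enforces monotonicity post hoc.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(1)
throughput = rng.gamma(shape=5.0, scale=100.0, size=500)     # placeholder I/O runs

grid = np.linspace(throughput.min(), throughput.max(), 50)
ecdf = np.searchsorted(np.sort(throughput), grid, side="right") / throughput.size

gp = GaussianProcessRegressor(kernel=RBF(length_scale=100.0), alpha=1e-4)
gp.fit(grid.reshape(-1, 1), ecdf)

pred = gp.predict(np.linspace(grid[0], grid[-1], 200).reshape(-1, 1))
pred = np.clip(np.maximum.accumulate(pred), 0.0, 1.0)        # nondecreasing, in [0, 1]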
@article{osti_10143151,
  title = {openDIEL: A Parallel Workflow Engine and Data Analytics Framework},
  url = {https://par.nsf.gov/biblio/10143151},
  DOI = {10.1145/3332186.3333051},
  abstractNote = {openDIEL is a workflow engine that aims to give researchers and users of HPC an efficient way to coordinate, organize, and interconnect many disparate modules of computation in order to effectively utilize and allocate HPC resources [13]. A GUI has been developed to aid in creating workflows; it allows for the specification of data science jobs, including the specification of neural network architectures, data processing, and hyperparameter tuning. Existing machine learning tools can be readily used in openDIEL, allowing for easy experimentation with various models and approaches.},
  journal = {PEARC19},
  author = {Betancourt, Frank and Wong, Kwai and Asemota, Efosa and Marshall, Quindell and Nichols, Daniel and Tomov, Stanimire},
}