NSF PAR Search | NSF Public Access Repository

Locking Down Science Gateways

https://doi.org/10.5281/zenodo.13868761

Brandt, Steven R; Diehl, Patrick (October 2024, Zenodo)

The most recent Linux kernels have a new feature for securing applications: Landlock. Like Seccomp before it, Landlock makes it possible for a running process to give up access to resources. For applications running as Science Gateways, we want to have network access while starting up MPI, but we want to take away network access prior to the reading of parameter files in order to prevent malicious exploits of the gateway code. We explore the usefulness of this tool by modifying and locking down two mature scientific codes: the Einstein Toolkit and Octo-Tiger.
Full Text Available
Octo-Tiger’s New Hydro Module and Performance Using HPX+CUDA on ORNL’s Summit

https://doi.org/10.1109/Cluster48925.2021.00059

Diehl, Patrick; Dais, Gregor; Marcello, Dominic; Huck, Kevin; Shiber, Sagiv; Kaiser, Hartmut; Frank, Juhan; Clayton, Geoffrey C.; Pfluger, Dirk (September 2021, 2021 IEEE International Conference on Cluster Computing (CLUSTER))

Octo-Tiger is a code for modeling three-dimensional self-gravitating astrophysical fluids. It was designed in particular for the study of dynamical mass transfer between interacting binary stars. Octo-Tiger is parallelized for distributed systems using the asynchronous many-task runtime system HPX, the C++ standard library for parallelism and concurrency, and uses CUDA for its gravity solver. Recently, we have remodeled Octo-Tiger’s hydro solver to use a three-dimensional reconstruction scheme. In addition, we have ported the hydro solver to GPU using CUDA kernels. We present scaling results for the new hydro kernels on ORNL’s Summit machine using a Sedov-Taylor blast wave problem. We also compare Octo-Tiger’s new hydro scheme with its old hydro scheme, using a rotating star as a test problem.
Full Text Available
octo-tiger: a new, 3D hydrodynamic code for stellar mergers that uses hpx parallelization

https://doi.org/10.1093/mnras/stab937

Marcello, Dominic C; Shiber, Sagiv; De Marco, Orsola; Frank, Juhan; Clayton, Geoffrey C; Motl, Patrick M; Diehl, Patrick; Kaiser, Hartmut (May 2021, Monthly Notices of the Royal Astronomical Society)
octo-tiger is an astrophysics code to simulate the evolution of self-gravitating and rotating systems of arbitrary geometry based on the fast multipole method, using adaptive mesh refinement. octo-tiger is currently optimized to simulate the merger of well-resolved stars that can be approximated by barotropic structures, such as white dwarfs (WDs) or main-sequence stars. The gravity solver conserves angular momentum to machine precision, thanks to a ‘correction’ algorithm. This code uses hpx parallelization, allowing the overlap of work and communication and leading to excellent scaling properties, allowing for the computation of large problems in reasonable wall-clock times. In this paper, we investigate the code performance and precision by running benchmarking tests. These include simple problems, such as the Sod shock tube, as well as sophisticated, full, WD binary simulations. Results are compared to analytical solutions, when known, and to other grid-based codes such as flash. We also compute the interaction between two WDs from the early mass transfer through to the merger and compare with past simulations of similar systems. We measure octo-tiger’s scaling properties up to a core count of ∼80 000, showing excellent performance for large problems. Finally, we outline the current and planned areas of development aimed at tackling a number of physical phenomena connected to observations of transients.
Full Text Available
Integration of CUDA Processing within the C++ Library for Parallelism and Concurrency (HPX)

https://doi.org/10.1109/ESPM2.2018.00006

Diehl, Patrick; Seshadri, Madhavan; Heller, Thomas; Kaiser, Hartmut (November 2018, 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2))

Experience shows that on today's high performance systems it is difficult to utilize different acceleration cards while also keeping all other parts of the system highly utilized. Future architectures, like exascale clusters, are expected to aggravate this issue as the number of cores is expected to increase and memory hierarchies are expected to become deeper. One big aspect for distributed applications is to guarantee high utilization of all available resources, including local or remote acceleration cards on a cluster, while fully using all the available CPU resources and integrating the GPU work into the overall programming model. For the integration of CUDA code we extended HPX, a general purpose C++ runtime system for parallel and distributed applications of any scale, and enabled asynchronous data transfers from and to the GPU device and the asynchronous invocation of CUDA kernels on this data. Both operations are well integrated into the general programming model of HPX, which allows any GPU operation to overlap seamlessly with work on the main cores. Any user-defined CUDA kernel can be launched on any (local or remote) GPU device available to the distributed application. We present asynchronous implementations for the data transfers and kernel launches for CUDA code as part of an HPX asynchronous execution graph. Using this approach we can combine all remotely and locally available acceleration cards on a cluster to utilize its full performance capabilities. Overhead measurements show that the integration of the asynchronous operations (data transfer + launches of the kernels) as part of the HPX execution graph imposes no additional computational overhead and significantly eases orchestrating coordinated and concurrent work on the main cores and the used GPU devices.
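The overlap described in this abstract can be illustrated with a small sketch. It is not HPX or CUDA code: it uses Python's standard `concurrent.futures` as a stand-in, and the names `copy_to_device`, `square_kernel`, `host_work`, and `device_queue` are hypothetical. A single worker thread plays the role of an in-order device queue, which is an assumption of the sketch rather than a description of how HPX schedules GPU work.

```python
from concurrent.futures import ThreadPoolExecutor


def copy_to_device(host_data):
    # Stand-in for a host-to-device transfer: simply copies the list.
    return list(host_data)


def square_kernel(device_data):
    # Stand-in for a user-defined kernel applied to the transferred data.
    return [x * x for x in device_data]


def host_work(n):
    # Independent work that stays on the main cores.
    return sum(range(n))


# One worker thread acts as an in-order queue, so the "kernel" only starts
# after the "transfer" has finished (assumption of this sketch).
with ThreadPoolExecutor(max_workers=1) as device_queue:
    on_device = device_queue.submit(copy_to_device, [1, 2, 3, 4])
    squared = device_queue.submit(lambda: square_kernel(on_device.result()))

    # The main thread keeps working while the queued operations run.
    host_total = host_work(1000)

    # Join point: wait for the asynchronous chain to finish.
    device_result = squared.result()
```

Both the transfer and the kernel launch return futures immediately, so the host computation is not delayed by them; the only synchronization is the final `result()` call.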
Full Text Available
An Introduction to hpxMP: A Modern OpenMP Implementation Leveraging HPX, An Asynchronous Many-Task System

https://doi.org/10.1145/3318170.3318191

Zhang, Tianyi; Shirzad, Shahrzad; Diehl, Patrick; Tohid, R.; Wei, Weile; Kaiser, Hartmut (January 2019, IWOCL'19 Proceedings of the International Workshop on OpenCL)

Asynchronous Many-task (AMT) runtime systems have gained increasing acceptance in the HPC community due to the performance improvements offered by fine-grained tasking runtime systems. At the same time, C++ standardization efforts are focused on creating higher-level interfaces able to replace OpenMP or OpenACC in modern C++ codes. These higher-level functions have been adopted in standards-conforming runtime systems such as HPX, giving users the ability to simply utilize fork-join parallelism in their own codes. Despite innovations in runtime systems and standardization efforts, users face enormous challenges porting legacy applications. Not only must users port their own codes, but often users rely on highly optimized libraries such as BLAS and LAPACK which use OpenMP for parallelization. Current efforts to create smooth migration paths have struggled with these challenges, especially as the threading systems of AMT libraries often compete with the threading system of OpenMP. To overcome these issues, our team has developed hpxMP, an implementation of the OpenMP standard, which utilizes the underlying AMT system to schedule and manage tasks. This approach leverages the C++ interfaces exposed by HPX and allows users to execute their applications on an AMT system without changing their code. In this work, we compare hpxMP with Clang's OpenMP library with four linear algebra benchmarks of the Blaze C++ library. While hpxMP is often not able to reach the same performance, we demonstrate the viability of providing a smooth migration path for applications, which nonetheless have to be extended to benefit from a more general task-based programming model.
Full Text Available
Asynchronous Execution of Python Code on Task-Based Runtime Systems

https://doi.org/10.1109/ESPM2.2018.00009

Tohid, R.; Wagle, Bibek; Shirzad, Shahrzad; Diehl, Patrick; Serio, Adrian; Kheirkhahan, Alireza; Amini, Parsa; Williams, Katy; Isaacs, Kate; Huck, Kevin; et al (November 2018, 2018 IEEE/ACM 4th International Workshop on Extreme Scale Programming Models and Middleware (ESPM2))

Despite advancements in the areas of parallel and distributed computing, the complexity of programming on High Performance Computing (HPC) resources has deterred many domain experts, especially in the areas of machine learning and artificial intelligence (AI), from utilizing the performance benefits of such systems. Researchers and scientists favor high-productivity languages to avoid the inconvenience of programming in low-level languages and the cost of acquiring the skills required for programming at this level. In recent years, Python, with the support of linear algebra libraries like NumPy, has gained popularity despite facing limitations which prevent such code from running in a distributed setting. Here we present a solution which maintains both high-level programming abstractions and parallel and distributed efficiency. Phylanx is an asynchronous array processing toolkit which transforms Python and NumPy operations into code which can be executed in parallel on HPC resources by mapping Python and NumPy functions and variables into a dependency tree executed by HPX, a general purpose, parallel, task-based runtime system written in C++. Phylanx additionally provides introspection and visualization capabilities for debugging and performance analysis. We have tested the foundations of our approach by comparing our implementation of widely used machine learning algorithms to accepted NumPy standards.
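The idea of mapping array operations into a dependency tree executed by a task-based runtime can be sketched as follows. This is not the Phylanx API and does not use HPX or NumPy: the classes `Leaf` and `Node` and the functions `schedule` and `run` are hypothetical, plain Python lists stand in for arrays, and a standard-library thread pool stands in for the runtime.

```python
from concurrent.futures import ThreadPoolExecutor
import operator


class Leaf:
    """A constant array (a plain list standing in for a NumPy array)."""
    def __init__(self, values):
        self.values = list(values)


class Node:
    """An elementwise binary operation applied to two subtrees."""
    def __init__(self, op, left, right):
        self.op = op
        self.left = left
        self.right = right


def schedule(tree, pool):
    """Submit the tree bottom-up and return a future for its value."""
    if isinstance(tree, Leaf):
        return pool.submit(lambda: tree.values)
    # Children are submitted before their parent, so independent subtrees
    # can run concurrently and a parent only waits on earlier tasks.
    left = schedule(tree.left, pool)
    right = schedule(tree.right, pool)

    def combine():
        return [tree.op(x, y) for x, y in zip(left.result(), right.result())]

    return pool.submit(combine)


def run(tree, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return schedule(tree, pool).result()


# (a + b) * (c - d), written as an explicit dependency tree.
expr = Node(operator.mul,
            Node(operator.add, Leaf([1, 2]), Leaf([3, 4])),
            Node(operator.sub, Leaf([10, 10]), Leaf([1, 2])))
result = run(expr)
```

The addition and the subtraction do not depend on each other, so the pool is free to evaluate them at the same time; only the final multiplication has to wait for both.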
Full Text Available