NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Understanding the use of message passing interface in exascale proxy applications

https://doi.org/10.1002/cpe.5901

Sultana, Nawrin; Rüfenacht, Martin; Skjellum, Anthony; Bangalore, Purushotham; Laguna, Ignacio; Mohror, Kathryn (August 2020, Concurrency and Computation: Practice and Experience)

Summary The Exascale Computing Project (ECP) focuses on the development of future exascale‐capable applications. Most ECP applications use the message passing interface (MPI) as their parallel programming model with mini‐apps serving as proxies. This paper explores the explicit usage of MPI in such ECP proxy applications. We empirically analyze 14 proxy applications from the ECP Proxy Apps Suite. We use the MPI profiling interface (PMPI) to collect MPI usage patterns in ECP proxy apps. Our analysis shows that a small subset of features from MPI is commonly used in the proxies of exascale‐capable applications, even when they reference third‐party libraries. This study is intended to provide a better understanding of the use of MPI in current exascale applications. The findings can help focus software investments made for exascale systems in the MPI middleware including optimization, fault‐tolerance, tuning, and hardware‐offload.
more » « less
Reconfigurable switches for high performance and flexible MPI collectives

https://doi.org/10.1002/cpe.6769

Haghi, Pouya; Guo, Anqi; Xiong, Qingqing; Yang, Chen; Geng, Tong; Broaddus, Justin T.; Marshall, Ryan; Schafer, Derek; Skjellum, Anthony; Herbordt, Martin C. (March 2022, Concurrency and Computation: Practice and Experience)

Full Text Available
Implementation and evaluation of MPI 4.0 partitioned communication libraries

https://doi.org/10.1016/j.parco.2021.102827

Dosanjh, Matthew G.F.; Worley, Andrew; Schafer, Derek; Soundararajan, Prema; Ghafoor, Sheikh; Skjellum, Anthony; Bangalore, Purushotham V.; Grant, Ryan E. (December 2021, Parallel Computing)
null (Ed.)
Full Text Available
Design of a Portable Implementation of Partitioned Point-to-Point Communication Primitives

https://doi.org/10.1145/3458744.3474046

Worley, Andrew; Prema Soundararajan, Prema; Schafer, Derek; Bangalore, Purushotham; Grant, Ryan; Dosanjh, Matthew; Skjellum, Anthony; Ghafoor, Sheikh (July 2021, ICPP Workshops '21: 50th International Conference on Parallel Processing Workshop)
null (Ed.)
The Message Passing Interface (MPI) has been the dominant message passing solution for scientific computing for decades. MPI point-to-point communications are highly efficient mechanisms for process-to- process communication. However, MPI performance is slowed by concurrency protections in the MPI library when processes utilize multiple threads. MPI’s current thread-level interface imposes these overheads throughout the library when thread safety is needed. While much work has been done to reduce multithreading overheads in MPI, a solution is needed that reduces the number of messages exchanged in a threaded environment. Partitioned communication is included in the MPI 4.0 standard as an alternative that addresses the challenges of multithreaded communication in MPI today. Partitioned communication reduces overall message volume by creating a buffer-sharing mechanism between threads such that they can indicate when portions of a communication buffer are available to be sent. Separation of the control and data planes in MPI is enabled by allowing persistent initialization and single occurrence message buffer matching from the indication that the data is ready to be sent. This enables the usage commands (destination, size, etc.) can be set up prior to data buffer readiness with readiness triggered by a simple doorbell/counter later. This approach is useful for future development of MPI operations in environments where traditional networking commands can have performance challenges, like accelerators (GPUs, FPGAs). In this paper,we detail the design and implementation of a layered library (built on top of MPI-3.1) and an integrated Open MPI solution that supports the new, MPI-4.0 partitioned communication feature set. The library will enable applications to use currently released MPI implementations and older legacy libraries to provide partitioned communication support while also enabling further exploration of this new communication model in new applications and use cases. We will compare the designs of the library and native Open MPI support, provide performance results and comparisons between the two approaches, and lessons learned from the implementation of partitioned communication in both library and native forms. We find that the native implementation and library have similar performance with a percentage difference under 0.94% in microbenchmarks and performance within 5% for a partitioned communication enabled proxy application.
more » « less
Full Text Available

Search for: All records