Title: An Empirical Study of Cryptographic Libraries for MPI Communications
As High Performance Computing (HPC) applications with data security requirements increasingly move to the public cloud, there is a demand that the cloud infrastructure for HPC support privacy and integrity. Incorporating privacy and integrity mechanisms in the communication infrastructure of today's public cloud is challenging, both because recent advances in data center networking have shifted the communication bottleneck from the network links to the network end points, and because encryption is computationally intensive. In this work, we consider incorporating encryption to support privacy and integrity in the Message Passing Interface (MPI) library, which is widely used in HPC applications. We empirically study four contemporary cryptographic libraries, OpenSSL, BoringSSL, Libsodium, and CryptoPP, using micro-benchmarks and the NAS parallel benchmarks to evaluate their overheads for encrypting MPI messages on two different networking technologies, 10 Gbps Ethernet and 40 Gbps InfiniBand. The results indicate that (1) performance differs drastically across cryptographic libraries; (2) effectively supporting privacy and integrity in MPI communications on high-speed data center networks is challenging: even with the most efficient cryptographic library, encryption can still introduce very significant overheads in some scenarios, such as a single MPI communication operation on InfiniBand; but (3) the overall overhead may not be prohibitive for practical uses, since there can be multiple concurrent communications.
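
To make the measured operation concrete, below is a minimal sketch, not the paper's implementation, of one way an MPI point-to-point message can be given privacy and integrity: authenticated encryption with libsodium's XChaCha20-Poly1305 AEAD applied to the send buffer before MPI_Send. The pre-shared key, payload size, ranks, and tag are illustrative assumptions, and key distribution is out of scope here.

/*
 * Minimal sketch: AEAD-protected MPI point-to-point message using
 * libsodium. Assumes at least two ranks and a pre-shared key (all
 * zero here, for illustration only).
 */
#include <mpi.h>
#include <sodium.h>
#include <stdio.h>
#include <string.h>

#define MSG_LEN 1024  /* illustrative payload size */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    if (sodium_init() < 0) MPI_Abort(MPI_COMM_WORLD, 1);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Assumed pre-shared symmetric key (sketch only; never use zeros). */
    unsigned char key[crypto_aead_xchacha20poly1305_ietf_KEYBYTES] = {0};

    unsigned char plain[MSG_LEN];
    /* Wire format: nonce || ciphertext || authentication tag. */
    unsigned char wire[crypto_aead_xchacha20poly1305_ietf_NPUBBYTES +
                       MSG_LEN + crypto_aead_xchacha20poly1305_ietf_ABYTES];

    if (rank == 0) {
        memset(plain, 'x', MSG_LEN);
        unsigned char *nonce = wire;
        unsigned char *ct = wire + crypto_aead_xchacha20poly1305_ietf_NPUBBYTES;
        unsigned long long ct_len;

        randombytes_buf(nonce, crypto_aead_xchacha20poly1305_ietf_NPUBBYTES);
        crypto_aead_xchacha20poly1305_ietf_encrypt(
            ct, &ct_len, plain, MSG_LEN, NULL, 0, NULL, nonce, key);
        MPI_Send(wire,
                 (int)(crypto_aead_xchacha20poly1305_ietf_NPUBBYTES + ct_len),
                 MPI_BYTE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status st;
        int n;
        MPI_Recv(wire, sizeof wire, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
        MPI_Get_count(&st, MPI_BYTE, &n);

        const unsigned char *nonce = wire;
        const unsigned char *ct = wire + crypto_aead_xchacha20poly1305_ietf_NPUBBYTES;
        unsigned long long pt_len;

        /* Decryption returns -1 if the message was tampered with. */
        if (crypto_aead_xchacha20poly1305_ietf_decrypt(
                plain, &pt_len, NULL, ct,
                (unsigned long long)n - crypto_aead_xchacha20poly1305_ietf_NPUBBYTES,
                NULL, 0, nonce, key) != 0)
            fprintf(stderr, "rank 1: integrity check failed\n");
    }

    MPI_Finalize();
    return 0;
}

The authentication tag is what supplies integrity: a tampered message fails decryption at the receiver rather than being delivered silently, and the per-message encrypt/decrypt work is exactly the overhead the study quantifies.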
Award ID(s):
1822737 1738912
NSF-PAR ID:
10162652
Journal Name:
IEEE International Conference on Cluster Computing (CLUSTER)
Page Range / eLocation ID:
1 to 11
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    The Message Passing Interface (MPI) has been the dominant message passing solution for scientific computing for decades. MPI point-to-point communications are highly efficient mechanisms for process-to-process communication. However, MPI performance is slowed by concurrency protections in the MPI library when processes utilize multiple threads. MPI's current thread-level interface imposes these overheads throughout the library when thread safety is needed. While much work has been done to reduce multithreading overheads in MPI, a solution is needed that reduces the number of messages exchanged in a threaded environment. Partitioned communication is included in the MPI 4.0 standard as an alternative that addresses the challenges of multithreaded communication in MPI today. Partitioned communication reduces overall message volume by creating a buffer-sharing mechanism between threads such that they can indicate when portions of a communication buffer are available to be sent. It separates the control and data planes in MPI: persistent initialization and single-occurrence message buffer matching are decoupled from the indication that the data is ready to be sent, so the usage parameters (destination, size, etc.) can be set up before the data buffer is ready, with readiness triggered later by a simple doorbell/counter (a minimal sketch of this pattern follows this abstract). This approach is useful for future development of MPI operations in environments where traditional networking commands can have performance challenges, such as accelerators (GPUs, FPGAs). In this paper, we detail the design and implementation of a layered library (built on top of MPI-3.1) and an integrated Open MPI solution that support the new MPI 4.0 partitioned communication feature set. The library enables applications to use currently released MPI implementations and older legacy libraries for partitioned communication support, while also enabling further exploration of this new communication model in new applications and use cases. We compare the designs of the library and of native Open MPI support, provide performance results and comparisons between the two approaches, and report lessons learned from implementing partitioned communication in both library and native forms. We find that the native implementation and the library have similar performance, with a difference under 0.94% in microbenchmarks and performance within 5% for a partitioned-communication-enabled proxy application.
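
    A minimal sketch of the partitioned send/receive pattern described above, assuming an MPI 4.0 implementation (e.g., a recent Open MPI); the partition count, element count, ranks, and tag are illustrative.

    /*
     * Sketch of MPI 4.0 partitioned point-to-point communication.
     * A real multithreaded sender would call MPI_Pready from the
     * thread that finished filling each partition.
     */
    #include <mpi.h>

    #define PARTITIONS 8
    #define COUNT 1024   /* elements per partition */

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double buf[PARTITIONS * COUNT];
        MPI_Request req;

        if (rank == 0) {
            /* Control plane: destination, sizes, and tag set up once, early. */
            MPI_Psend_init(buf, PARTITIONS, COUNT, MPI_DOUBLE, 1, 0,
                           MPI_COMM_WORLD, MPI_INFO_NULL, &req);
            MPI_Start(&req);
            /* Data plane: ring the doorbell as each partition becomes ready. */
            for (int p = 0; p < PARTITIONS; p++) {
                for (int i = 0; i < COUNT; i++)
                    buf[p * COUNT + i] = p + i;  /* stand-in computation */
                MPI_Pready(p, req);
            }
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            MPI_Request_free(&req);
        } else if (rank == 1) {
            MPI_Precv_init(buf, PARTITIONS, COUNT, MPI_DOUBLE, 0, 0,
                           MPI_COMM_WORLD, MPI_INFO_NULL, &req);
            MPI_Start(&req);
            /* MPI_Parrived(req, p, &flag) could poll per-partition arrival;
               here we simply wait for the whole buffer. */
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            MPI_Request_free(&req);
        }

        MPI_Finalize();
        return 0;
    }

    Note how the persistent MPI_Psend_init/MPI_Start pair carries all the matching and setup, while MPI_Pready is the lightweight per-partition readiness signal; that split is the control/data-plane separation the abstract describes.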
  2. The HPC community is actively researching and evaluating tools to support execution of scientific applications in cloud-based environments. Among the various technologies, containers have recently gained importance as they have significantly better performance compared to full-scale virtualization, support microservices and DevOps, and work seamlessly with workflow and orchestration tools. Docker is currently the leader in containerization technology because it offers low overhead, flexibility, portability of applications, and reproducibility. Singularity is another container solution of interest, as it is designed specifically for scientific applications. It is important to conduct performance and feature analysis of container technologies to understand their applicability for each application and target execution environment. This paper presents (1) a performance evaluation of Docker and Singularity on bare-metal nodes in the Chameleon cloud, (2) a mechanism by which Docker containers can be mapped to InfiniBand hardware with RDMA communication, and (3) an analysis of mapping elements of parallel workloads to containers for optimal resource management with container-ready orchestration tools. Our experiments are targeted toward application developers so that they can make informed decisions on choosing the container technologies and approaches that are suitable for their HPC workloads on cloud infrastructure. Our performance analysis shows that scientific workloads in both Docker- and Singularity-based containers can achieve near-native performance. Singularity is designed specifically for HPC workloads. However, Docker still has advantages over Singularity for use in clouds, as it provides overlay networking and an intuitive way to run MPI applications with one container per rank for fine-grained resource allocation. Both Docker and Singularity make it possible to use the underlying network fabric directly from the containers for coarse-grained resource allocation.
  3. The SPEChpc™ 2021 suites are application-based benchmarks designed to measure the performance of modern HPC systems. The benchmarks support MPI, MPI+OpenMP, MPI+OpenMP target offload, and MPI+OpenACC, and are portable across all major HPC platforms.
  4. Even as end-to-end encrypted communication becomes more popular, private messaging remains a challenging problem due to metadata leakage, such as who is communicating with whom. Most existing systems that hide communication metadata either (1) do not scale easily, (2) incur significant overheads, or (3) provide weaker guarantees than cryptographic privacy, such as differential privacy or heuristic privacy. This paper presents XRD (short for Crossroads), a metadata-private messaging system that provides cryptographic privacy while scaling easily to support more users by adding more servers. At a high level, XRD uses multiple mix networks in parallel along with several techniques, including a novel technique we call aggregate hybrid shuffle. As a result, XRD can support 2 million users with 228 seconds of latency using 100 servers. This is 13.3x and 4x faster than Atom and Pung, respectively, which are prior scalable messaging systems with cryptographic privacy.
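
    As a generic illustration of the layered ("onion") encryption that mix networks rely on, the sketch below wraps a message for three hops using libsodium sealed boxes; each server peels one layer, so no single server can link sender to receiver. This is an assumption-laden stand-in for illustration only, not XRD's aggregate hybrid shuffle, and the hop count and choice of sealed boxes are likewise assumptions.

    /* Generic mix-net style onion wrapping with libsodium sealed boxes. */
    #include <sodium.h>
    #include <stdio.h>
    #include <string.h>

    #define HOPS 3  /* assumed number of mix servers */

    int main(void) {
        if (sodium_init() < 0) return 1;

        /* One keypair per mix server. */
        unsigned char pk[HOPS][crypto_box_PUBLICKEYBYTES];
        unsigned char sk[HOPS][crypto_box_SECRETKEYBYTES];
        for (int i = 0; i < HOPS; i++)
            crypto_box_keypair(pk[i], sk[i]);

        const unsigned char msg[] = "hello";
        size_t len = sizeof msg;

        /* Each layer adds crypto_box_SEALBYTES of overhead. */
        unsigned char onion[sizeof msg + HOPS * crypto_box_SEALBYTES];
        memcpy(onion, msg, len);

        /* Wrap for the last hop first, so hop 0 peels its layer first. */
        for (int i = HOPS - 1; i >= 0; i--) {
            unsigned char tmp[sizeof onion];
            crypto_box_seal(tmp, onion, len, pk[i]);
            len += crypto_box_SEALBYTES;
            memcpy(onion, tmp, len);
        }

        /* Each server peels exactly one layer with its secret key. */
        for (int i = 0; i < HOPS; i++) {
            unsigned char tmp[sizeof onion];
            if (crypto_box_seal_open(tmp, onion, len, pk[i], sk[i]) != 0)
                return 1;
            len -= crypto_box_SEALBYTES;
            memcpy(onion, tmp, len);
        }
        printf("recovered: %s\n", onion);
        return 0;
    }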
  5.
    Large-scale, high-throughput computational science faces an accelerating convergence of software and hardware. Software container-based solutions have become common in cloud-based data center environments and are considered promising tools for addressing heterogeneity and portability concerns. However, container solutions reflect a set of assumptions that complicate their adoption by developers and users of scientific workflow applications. Nor are containers a universal solution for deployment in high-performance computing (HPC) environments, which have specialized and vertically integrated scheduling and runtime software stacks. In this paper, we present a container design and deployment approach that uses modular layering to ease the deployment of containers into existing HPC environments. This layered approach allows operating system integrations, support for different communication and performance monitoring libraries, and application code to be defined and interchanged in isolation. We describe the details of our approach, including specifics about container deployment and orchestration for different HPC scheduling systems. We also describe how this layering method can be used to build containers for two separate applications, each deployed on clusters with different batch schedulers, MPI networking support, and performance monitoring requirements. Our experience indicates that the layered approach is a viable strategy for building applications intended to provide similar behavior across widely varying deployment targets.