The increased parallelism in modern processors has sparked interest in offloading security policy enforcement to processes or hardware operating in parallel with the main application. This approach can reduce application latency, enhance security, and improve compatibility. However, existing software solutions often incur high overheads and are susceptible to memory corruption attacks, while hardware solutions tend to be inflexible and require substantial modifications to the processor. In this paper, we present SIDECAR, a novel approach that offloads security checks to run concurrently with applications by leveraging the debugging infrastructure available in commodity processors. Specifically, we utilize softwaredriven logging (SDL) extensions in Intel and Arm processors to create secure, append-only channels between applications and security monitors. We build and evaluate a prototype of SIDECAR for the x86-64 and Aarch64 architectures. To demonstrate its utility, we adapt well-known security defenses within SIDECAR, providing control-flow integrity (CFI), shadow call stacks (SCS), and memory error checking (ASAN). Our evaluation shows that these extensions perform better on the Intel architecture. In terms of defenses, SIDECAR reduces the latency of CFI in the tested real-world applications by an average of 30%, offers enhanced security with similar overhead for SCS, and is versatile enough to support complex defenses like ASAN. Furthermore, our security monitor for CFI+SCS is 30 times more efficient compared to previous work.
more »
« less
Exploring source-to-source compiler transformation of OpenMP SIMD constructs for Intel AVX and Arm SVE vector architectures
Over the past decade, SIMD (single instruction multiple data) or vector architectures have made significant advances, now existing across a wide range of devices from commodity CPUs to high performance computing (HPC) cores. Intel's AVX (Advanced Vector Extensions) architecture has been one of the most popular SIMD extensions to commodity and HPC CPUs from Intel. Over the past few years, Arm has made significant inroads with its new SVE (Scalable Vector Extension), used in the supercomputer of the top place on the Top500 list. As SIMD has become more advanced and more important, it has become equally important the compilers support these architecture extensions. In this paper, we present our approach of source-to-source compiler transformation of explicit vectorization constructs using the OpenMP SIMD directive. We present the design of a unified IR that is easily translated to AVX and SVE vector architectures. Finally, we conduct performance evaluations on Intel AVX and Arm SVE to demonstrate how this method of vectorization can bridge the gap between auto- and manual- vectorization.
more »
« less
- Award ID(s):
- 2015254
- PAR ID:
- 10394960
- Date Published:
- Journal Name:
- PMAM '22: Proceedings of the Thirteenth International Workshop on Programming Models and Applications for Multicores and Manycores
- Page Range / eLocation ID:
- 11 to 20
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
This study presents a comprehensive benchmarking analysis of the Arm-based AmpereOne A192-32X CPU, a high-performance but low power processor designed for cloud-native workloads characterized by high core occupancy, imperfectly-vectorized or even pure scalar software, limited need for high floating-point performance, and, increasingly, AI inference. These traits also characterize much of academic research computing. Hence a thorough investigation of this novel CPU seeking to characterize its strengths and weaknesses on academic workloads, including traditional HPC codes for which it was not designed, will shed light on its relevance in a research setting. We report comparative analyses with contemporary CPUs (Intel Sapphire Rapids, AMD EPYC, NVIDIA Grace-Grace) and illustrate AmpereOne’s architectural advantages in handling parallel workloads and optimizing power consumption. The CPUs are compared in terms of performance and power consumption using a wide range of applications covering different workloads and disciplines.more » « less
-
Bhatele, A.; Hammond, J.; Baboulin, M.; Kruse, C. (Ed.)The reactive force field (ReaxFF) interatomic potential is a powerful tool for simulating the behavior of molecules in a wide range of chemical and physical systems at the atomic level. Unlike traditional classical force fields, ReaxFF employs dynamic bonding and polarizability to enable the study of reactive systems. Over the past couple decades, highly optimized parallel implementations have been developed for ReaxFF to efficiently utilize modern hardware such as multi-core processors and graphics processing units (GPUs). However, the complexity of the ReaxFF potential poses challenges in terms of portability to new architectures (AMD and Intel GPUs, RISC-V processors, etc.), and limits the ability of computational scientists to tailor its functional form to their target systems. In this regard, the convergence of cyber-infrastructure for high performance computing (HPC) and machine learning (ML) presents new opportunities for customization, programmer productivity and performance portability. In this paper, we explore the benefits and limitations of JAX, a modern ML library in Python representing a prime example of the convergence of HPC and ML software, for implementing ReaxFF. We demonstrate that by leveraging auto-differentiation, just-in-time compilation, and vectorization capabilities of JAX, one can attain a portable, performant, and easy to maintain ReaxFF software. Beyond enabling MD simulations, end-to-end differentiability of trajectories produced by ReaxFF implemented with JAX makes it possible to perform related tasks such as force field parameter optimization and meta-analysis without requiring any significant software developments. We also discuss scalability limitations using the current version of JAX for ReaxFF simulations.more » « less
-
Today, isolated trusted computation and code execution is of paramount importance to protect sensitive information and workflows from other malicious privileged or unprivileged software. Intel Software Guard Extensions (SGX) is a set of security architecture extensions first introduced in the Skylake microarchitecture that enables a Trusted Execution Environment (TEE). It provides an ‘inverse sandbox’, for sensitive programs, and guarantees the integrity and confidentiality of secure computations, even from the most privileged malicious software (e.g. OS, hypervisor). SGX-capable CPUs only became available in production systems in Q3 2015, and they are not yet fully supported and adopted in systems. Besides the capability in the CPU, the BIOS also needs to provide support for the enclaves, and not many vendors have released the required updates for the system support. This has led to many wrong assumptions being made about the capabilities, features, and ultimately dangers of secure enclaves. By having access to resources and publications such as white papers, patents and the actual SGX-capable hardware and software development environment, we are in a privileged position to be able to investigate and demystify SGX. In this paper, we first review the previous trusted execution technologies, such as ARM Trust Zone and Intel TXT, to better understand and appreciate the new innovations of SGX. Then, we look at the details of SGX technology, cryptographic primitives and the underlying concepts that power it, namely the sealing, attestation, and the Memory Encryption Engine (MEE). We also consider use cases such as trusted and secure code execution on an untrusted cloud platform, and digital rights management (DRM). This is followed by an overview of the software development environment and the available libraries.more » « less
-
Abstract Background Bioinformatic workflows frequently make use of automated genome assembly and protein clustering tools. At the core of most of these tools, a significant portion of execution time is spent in determining optimal local alignment between two sequences. This task is performed with the Smith-Waterman algorithm, which is a dynamic programming based method. With the advent of modern sequencing technologies and increasing size of both genome and protein databases, a need for faster Smith-Waterman implementations has emerged. Multiple SIMD strategies for the Smith-Waterman algorithm are available for CPUs. However, with the move of HPC facilities towards accelerator based architectures, a need for an efficient GPU accelerated strategy has emerged. Existing GPU based strategies have either been optimized for a specific type of characters (Nucleotides or Amino Acids) or for only a handful of application use-cases. Results In this paper, we present ADEPT, a new sequence alignment strategy for GPU architectures that is domain independent, supporting alignment of sequences from both genomes and proteins. Our proposed strategy uses GPU specific optimizations that do not rely on the nature of sequence. We demonstrate the feasibility of this strategy by implementing the Smith-Waterman algorithm and comparing it to similar CPU strategies as well as the fastest known GPU methods for each domain. ADEPT’s driver enables it to scale across multiple GPUs and allows easy integration into software pipelines which utilize large scale computational systems. We have shown that the ADEPT based Smith-Waterman algorithm demonstrates a peak performance of 360 GCUPS and 497 GCUPs for protein based and DNA based datasets respectively on a single GPU node (8 GPUs) of the Cori Supercomputer. Overall ADEPT shows 10x faster performance in a node-to-node comparison against a corresponding SIMD CPU implementation. Conclusions ADEPT demonstrates a performance that is either comparable or better than existing GPU strategies. We demonstrated the efficacy of ADEPT in supporting existing bionformatics software pipelines by integrating ADEPT in MetaHipMer a high-performance denovo metagenome assembler and PASTIS a high-performance protein similarity graph construction pipeline. Our results show 10% and 30% boost of performance in MetaHipMer and PASTIS respectively.more » « less
An official website of the United States government

