Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher.
Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?
Some links on this page may take you to non-federal websites. Their policies may differ from this site.
-
Word-wise Fully Homomorphic Encryption (FHE) schemes, such as CKKS, are gaining significant traction due to their ability to provide post-quantum-resistant, privacy preserving approximate computing—an especially desirable feature in the Machine-Learning-as-a-Service (MLaaS) paradigm. In this work, we introduce FIDESlib, the first open-source server-side CKKS GPU library that is fully interoperable with well-established client-side OpenFHE operations. Unlike other existing open-source GPU libraries, FIDESlib provides the first implementation featuring heavily optimized GPU kernels for all CKKS primitives, including bootstrapping. Our library also integrates robust benchmarking and testing, ensuring it remains adaptable to further optimization. Comparing our scheme against Phantom (the previously top open-source CKK library, we show that FIDESlib offers superior performance and scalability. For bootstrapping, FIDESlib achieves no less than 70× speedup over the AVX-optimized OpenFHE implementation. FIDESlib is available on Github.more » « lessFree, publicly-accessible full text available July 7, 2026
-
Processing-in-memory (PIM), where compute is moved closer to memory or data, has been explored to accelerate emerging workloads. Different PIM-based systems have been announced, each offering a unique microarchitectural organization of their compute units, ranging from fixed functional units to programmable general-purpose compute cores near memory. However, one fundamental limitation of PIM is that each compute unit can only access its local memory; access to “remote” memory must occur through the host CPU – potentially limiting application performance scalability. In this work, we first characterize the scalability of real PIM architectures using the UPMEM PIM system. We analyze how the overhead of communicating through the host (instead of providing direct communication between the PIM compute units) can become a bottleneck for collective communications that are commonly used in many workloads. To overcome this inter-PIM bank communication, we propose PIMnet – a PIM interconnection network for PIM banks that provides direct connectivity between compute units and removes the overhead of communicating through the host. PIMnet exploits bandwidth parallelism where communication across the different PIM bank/chips can occur in parallel to maximize communication performance. PIMnet also matches the DRAM packaging hierarchy with a multi-tier network architecture. Unlike traditional interconnection networks, PIMnet is a PIM controlled network where communication is managed by the PIM logic, optimizing collective communications and minimizing the hardware overhead of PIMnet. Our evaluation of PIMnet shows that it provides up to 85× speedup on collective communications and achieves a 11.8× improvement on real applications compared to the baseline PIM.more » « lessFree, publicly-accessible full text available March 1, 2026
-
With the rising adoption of deep neural networks (DNNs) for commercial and high-stakes applications that process sensitive user data and make critical decisions, security concerns are paramount. An adversary can undermine the confidentiality of user input or a DNN model, mislead a DNN to make wrong predictions, or even render a machine learning application unavailable to valid requests. While security vulnerabilities that enable such exploits can exist across multiple levels of the technology stack that supports machine learning applications, the hardware-level vulnerabilities can be particularly problematic. In this article, we provide a comprehensive review of the hardware-level vulnerabilities affecting domain-specific DNN inference accelerators and recent progress in secure hardware design to address these. As domain-specific DNN accelerators have a number of differences compared to general-purpose processors and cryptographic accelerators where the hardware-level vulnerabilities have been thoroughly investigated, there are unique challenges and opportunities for secure machine learning hardware. We first categorize the hardware-level vulnerabilities into three scenarios based on an adversary’s capability: 1) an adversary can only attack the off-chip components, such as the off-chip DRAM and the data bus; 2) an adversary can directly attack the on-chip structures in a DNN accelerator; and 3) an adversary can insert hardware trojans during the manufacturing and design process. For each category, we survey recent studies on attacks that pose practical security challenges to DNN accelerators. Then, we present recent advances in the defense solutions for DNN accelerators, addressing those security challenges with circuit-, architecture-, and algorithm-level techniques.more » « lessFree, publicly-accessible full text available February 6, 2026
An official website of the United States government
