NSF PAR Search | NSF Public Access Repository


Reliability and Security of AI Hardware

Gnad, Dennis; Gotthard, Martin; Krautter, Jonas; Kritikakou, Angeliki; Meyers, Vincent; Rech, Paolo; Condia, Josie Rodriguez; Ruospo, Annachiara; Sanchez, Ernesto; dos Santos, Fernando Fernandes; et al (May 2024, IEEE)

In recent years, Artificial Intelligence (AI) systems have achieved revolutionary capabilities, providing intelligent solutions that surpass human skills in many cases. However, such capabilities come with power-hungry computation workloads. Therefore, the implementation of hardware acceleration becomes as fundamental as the software design to improve energy efficiency, silicon area, and latency of AI systems. Thus, innovative hardware platforms, architectures, and compiler-level approaches have been used to accelerate AI workloads. Crucially, innovative AI acceleration platforms are being adopted in application domains for which dependability must be paramount, such as autonomous driving, healthcare, banking, space exploration, and industry 4.0. Unfortunately, the complexity of both AI software and hardware makes the dependability evaluation and improvement extremely challenging. Studies have been conducted on both the security and reliability of AI systems, such as vulnerability assessments and countermeasures to random faults and analysis for side-channel attacks. This paper describes and discusses various reliability and security threats in AI systems, and presents representative case studies along with corresponding efficient countermeasures.
A Visionary Look at the Security of Reconfigurable Cloud Computing

https://doi.org/10.1109/JPROC.2023.3330729

Stojilović, Mirjana; Rasmussen, Kasper; Regazzoni, Francesco; Tahoori, Mehdi B; Tessier, Russell (December 2023, Proceedings of the IEEE)

Fault Recovery from Multi-Tenant FPGA Voltage Attacks

https://doi.org/10.1145/3583781.3590246

Moini, Shayan; Kansagara, Dhruv; Holcomb, Daniel; Tessier, Russell (June 2023, GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023)

As multi-tenant FPGA applications continue to scale in size and complexity, their need for resilience against environmental effects and malicious actions continues to grow. To ensure continuously correct computation, faults in the compute fabric must be identified, isolated, and suppressed in the nanosecond to microsecond range. In this paper, we detail a circuit and system-level methodology to detect compute failure conditions due to on-FPGA voltage attacks. Our approach rapidly suppresses incorrect results and regenerates potentially-tainted results before they propagate, allowing time for an attacker to be suppressed. Instrumentation includes voltage sensors to detect error conditions induced by attackers. This analysis is paired with focused remediation approaches involving data buffering, fault suppression, results recalculation, and computation restart. Our approach has been demonstrated using an RSA encryption circuit implemented on a Stratix 10 FPGA. We show that a voltage attack using on-FPGA power wasters can be effectively detected and computation halted in 15 ns, preventing the injection of timing faults. Potentially tainted results are successfully regenerated, allowing for fault-free circuit operation. A full characterization of the latency and resource overheads of fault detection and recovery is provided.
A Practical Remote Power Attack on Machine Learning Accelerators in Cloud FPGAs

https://doi.org/10.23919/DATE56975.2023.10136956

Tian, Shanquan; Moini, Shayan; Holcomb, Daniel; Tessier, Russell; Szefer, Jakub (April 2023, 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE))

The security and performance of FPGA-based accelerators play vital roles in today’s cloud services. In addition to supporting convenient access to high-end FPGAs, cloud vendors and third-party developers now provide numerous FPGA accelerators for machine learning models. However, the security of accelerators developed for state-of-the-art Cloud FPGA environments has not been fully explored, since most remote accelerator attacks have been prototyped on local FPGA boards in lab settings, rather than in Cloud FPGA environments. To address existing research gaps, this work analyzes three existing machine learning accelerators developed in Xilinx Vitis to assess the potential threats of power attacks on accelerators in Amazon Web Services (AWS) F1 Cloud FPGA platforms, in a multi-tenant setting. The experiments show that malicious co-tenants in a multi-tenant environment can instantiate voltage sensing circuits as register-transfer level (RTL) kernels within the Vitis design environment to spy on co-tenant modules. A methodology for launching a practical remote power attack on Cloud FPGAs is also presented, which uses an enhanced time-to-digital converter (TDC) based voltage sensor and an auto-triggered mechanism. The TDC is used to capture power signatures, which are then used to identify power consumption spikes and observe activity patterns involving the FPGA shell, DRAM on the FPGA board, or the other co-tenant victim’s accelerators. Voltage change patterns related to shell use and accelerators are then used to create an auto-triggered attack that can automatically detect when to capture voltage traces without the need for a hard-wired synchronization signal between victim and attacker. To address the novel threats presented in this work, this paper also discusses defenses that could be leveraged to secure multi-tenant Cloud FPGAs from power-based attacks.
Voltage Sensor Implementations for Remote Power Attacks on FPGAs

https://doi.org/10.1145/3555048

Moini, Shayan; Deric, Aleksa; Li, Xiang; Provelengios, George; Burleson, Wayne; Tessier, Russell; Holcomb, Daniel (March 2023, ACM Transactions on Reconfigurable Technology and Systems)

This article presents a study of two types of on-chip FPGA voltage sensors based on ring oscillators (ROs) and time-to-digital converters (TDCs), respectively. It has previously been shown that these sensors are often used to extract side-channel information from FPGAs without physical access. The performance of the sensors is evaluated in the presence of circuits that deliberately waste power, resulting in localized voltage drops. The effects of FPGA power supply features and sensor sensitivity in detecting voltage drops in an FPGA power distribution network (PDN) are evaluated for Xilinx Artix-7, Zynq 7000, and Zynq UltraScale+ FPGAs. We show that both sensor types are able to detect supply voltage drops, and that their measurements are consistent with each other. Our findings show that TDC-based sensors are more sensitive and can detect voltage drops that are shorter in duration, while RO sensors are easier to implement because calibration is not required. Furthermore, we present a new time-interleaved TDC design that sweeps the sensor phase. The new sensor generates data that can reconstruct voltage transients on the order of tens of picoseconds.
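To make the contrast between the two sensor types concrete, here is a minimal Python sketch of how each readout could respond to a supply droop. It is a toy model, not the article's implementation: the linear delay-versus-voltage relation and every constant and function name (`stage_delay_ps`, `tdc_readout`, `ro_count`) are assumptions chosen purely for illustration.

```python
def stage_delay_ps(vdd, nominal_vdd=0.85, nominal_delay_ps=12.5, sensitivity=2.0):
    """Assumed toy model: gate delay rises linearly as the supply voltage drops."""
    return nominal_delay_ps * (1.0 + sensitivity * (nominal_vdd - vdd) / nominal_vdd)


def tdc_readout(vdd, n_taps=64, window_ps=400.0):
    """Number of delay-line taps an edge passes within one short capture window."""
    taps = int(window_ps // stage_delay_ps(vdd))
    return max(0, min(n_taps, taps))


def ro_count(vdd, n_stages=5, window_ps=1_000_000.0):
    """Oscillation count of a ring oscillator over a much longer counting window."""
    period_ps = 2 * n_stages * stage_delay_ps(vdd)
    return int(window_ps // period_ps)


if __name__ == "__main__":
    # Compare both readouts at the nominal supply and during a droop.
    for vdd in (0.85, 0.80):
        print(vdd, tdc_readout(vdd), ro_count(vdd))
```

In this model both readouts fall when the supply drops, but the TDC produces one sample per short capture window while the RO count averages over its whole counting window. That is one plausible way to read the abstract's observation that TDC sensors catch shorter droops.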
Jitter-based Adaptive True Random Number Generation Circuits for FPGAs in the Cloud

Li, Xiang; Stanwicks, Peter (March 2023, ACM Transactions on Reconfigurable Technology and Systems)
Deming Chen (Ed.)
In this paper, we present and evaluate a true random number generator (TRNG) design that is compatible with the restrictions imposed by cloud-based Field Programmable Gate Array (FPGA) providers such as Amazon Web Services (AWS) EC2 F1. Because cloud FPGA providers disallow the ring oscillator circuits that conventionally generate TRNG entropy, our design is oscillator-free and uses clock jitter as its entropy source. The clock jitter is harvested with a time-to-digital converter (TDC) and a controllable delay line that is continuously tuned to compensate for process, voltage, and temperature variations. After describing the design, we present and validate a stochastic model that conservatively quantifies its worst-case entropy. We deploy and model the design in the cloud on 60 EC2 F1 FPGA instances to ensure sufficient randomness is captured. TRNG entropy is further validated using NIST test suites, and experiments are performed to understand how the TRNG responds to on-die power attacks that disturb the FPGA supply voltage in the vicinity of the TRNG. After introducing and validating our basic TRNG design, we introduce and validate a new variant that uses four instances of a linkable sampling module to increase the entropy per sample and improve throughput. The new variant improves throughput by 250% at a modest 17% increase in CLB count.
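As a simple illustration of what quantifying entropy per sample means, the Python sketch below computes a naive most-common-value min-entropy estimate for a bit stream. It is not the paper's stochastic model, and it omits the confidence bound that the NIST SP 800-90B most-common-value estimator applies; the function name is hypothetical.

```python
import math
from collections import Counter


def naive_min_entropy_per_bit(bits):
    """Min-entropy estimate -log2(p_max) from the observed bit frequencies."""
    if not bits:
        raise ValueError("need at least one sample")
    # p_max is the relative frequency of the most common symbol.
    p_max = max(Counter(bits).values()) / len(bits)
    return max(0.0, -math.log2(p_max))
```

A perfectly balanced stream scores 1 bit of min-entropy per sample and a constant stream scores 0. A conservative design would report a figure below this naive estimate.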
The Future of FPGA Acceleration in Datacenters and the Cloud

https://doi.org/10.1145/3506713

Bobda, Christophe; Mbongue, Joel Mandebi; Chow, Paul; Ewais, Mohammad; Tarafdar, Naif; Vega, Juan Camilo; Eguro, Ken; Koch, Dirk; Handagala, Suranga; Leeser, Miriam; et al (September 2022, ACM Transactions on Reconfigurable Technology and Systems)

In this article, we survey existing academic and commercial efforts to provide Field-Programmable Gate Array (FPGA) acceleration in datacenters and the cloud. The goal is a critical review of existing systems and a discussion of their evolution from single workstations with PCI-attached FPGAs in the early days of reconfigurable computing to the integration of FPGA farms in large-scale computing infrastructures. From the lessons learned, we discuss the future of FPGAs in datacenters and the cloud and assess the challenges likely to be encountered along the way. The article explores current architectures and discusses scalability and abstractions supported by operating systems, middleware, and virtualization. Hardware and software security becomes critical when infrastructure is shared among tenants with disparate backgrounds. We review the vulnerabilities of current systems and possible attack scenarios and discuss mitigation strategies, some of which impact FPGA architecture and technology. The viability of these architectures for popular applications is reviewed, with a particular focus on deep learning and scientific computing. This work draws from workshop discussions, panel sessions including the participation of experts in the reconfigurable computing field, and private discussions among these experts. These interactions have harmonized the terminology, taxonomy, and the important topics covered in this manuscript.
Precise Fault Injection to Enable DFIA for Attacking AES in Remote FPGAs

https://doi.org/10.1109/FCCM53951.2022.9786154

Li, Xiang; Tessier, Russell; Holcomb, Daniel (May 2022, 2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM))

Mitigating Voltage Attacks in Multi-Tenant FPGAs

https://doi.org/10.1145/3451236

Provelengios, George; Holcomb, Daniel; Tessier, Russell (July 2021, ACM Transactions on Reconfigurable Technology and Systems)

Recent research has exposed a number of security issues related to the use of FPGAs in embedded system and cloud computing environments. Circuits that deliberately waste power can be carefully crafted by a malicious cloud FPGA user and deployed to cause denial-of-service and fault injection attacks. The main defense strategy used by FPGA cloud services involves checking user-submitted designs for circuit structures that are known to aggressively consume power. Unfortunately, this approach is limited by an attacker’s ability to conceive new designs that defeat existing checkers. In this work, our contributions are twofold. We evaluate a variety of circuit power wasting techniques that typically are not flagged by design rule checks imposed by FPGA cloud computing vendors. The efficiencies of five power wasting circuits, including our new design, are evaluated in terms of power consumed per logic resource. We then show that the source of voltage attacks based on power wasters can be identified. Our monitoring approach localizes the attack and suppresses the clock signal for the target region within 21 μs, which is fast enough to stop an attack before it causes a board reset. All experiments are performed using a state-of-the-art Intel Stratix 10 FPGA.
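The localize-then-suppress idea can be sketched in a few lines of Python. This is an illustrative model only: the region names, sensor values, threshold, and the `gate_clock` callback are hypothetical, and it assumes the region with the largest sensor drop is the one to gate. The paper's monitor is hardware acting on a microsecond timescale, not software.

```python
def localize_attack(readings, baseline, threshold):
    """Return the region whose sensor dropped most below baseline, or None.

    readings and baseline map region name -> sensor value (lower = lower voltage).
    """
    drops = {region: baseline[region] - value for region, value in readings.items()}
    region, worst = max(drops.items(), key=lambda item: item[1])
    return region if worst >= threshold else None


def respond(readings, baseline, threshold, gate_clock):
    """Gate the clock of the suspected region; return that region (or None)."""
    region = localize_attack(readings, baseline, threshold)
    if region is not None:
        gate_clock(region)  # hypothetical hook standing in for clock suppression
    return region
```

A caller would pass its own `gate_clock` function; in a test harness, appending to a list is enough to observe which region was gated.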
Power Side-Channel Attacks on BNN Accelerators in Remote FPGAs

https://doi.org/10.1109/JETCAS.2021.3074608

Moini, Shayan; Tian, Shanquan; Holcomb, Daniel; Szefer, Jakub; Tessier, Russell (June 2021, IEEE Journal on Emerging and Selected Topics in Circuits and Systems)
