skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Fault Recovery from Multi-Tenant FPGA Voltage Attacks
As multi-tenant FPGA applications continue to scale in size and complexity, their need for resilience against environmental effects and malicious actions continues to grow. To ensure continuously correct computation, faults in the compute fabric must be identified, isolated, and suppressed in the nanosecond to microsecond range. In this paper, we detail a circuit and system-level methodology to detect compute failure conditions due to on-FPGA voltage attacks. Our approach rapidly suppresses incorrect results and regenerates potentially-tainted results before they propagate, allowing time for an attacker to be suppressed. Instrumentation includes voltage sensors to detect error conditions induced by attackers. This analysis is paired with focused remediation approaches involving data buffering, fault suppression, results recalculation, and computation restart. Our approach has been demonstrated using an RSA encryption circuit implemented on a Stratix 10 FPGA. We show that a voltage attack using on-FPGA power wasters can be effectively detected and computation halted in 15 ns, preventing the injection of timing faults. Potentially tainted results are successfully regenerated, allowing for fault-free circuit operation. A full characterization of the latency and resource overheads of fault detection and recovery is provided.  more » « less
Award ID(s):
1902532
PAR ID:
10428836
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023
Page Range / eLocation ID:
557 to 562
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Advanced, superscalar microprocessors (μP) are highly susceptible to wear-out failures because of their highly complex, densely packed circuit structure and extreme operational frequencies. Although many types of fault detection and mitigation strategies have been proposed, none have addressed the specific problem of detecting faults that lead to information leakage events on I/O channels of the μP. Information leakage can be defined very generally as any type of output that the executing program did not intend to produce. In this work, we restrict this definition to output that represents a security concern, and in particular, to the leakage of plaintext or encryption keys, and propose a counter-based countermeasure to detect faults that cause this type of leakage event. Fault injection (FI) experiments are carried out on two RISC-V microprocessors emulated as soft cores on a Xilinx multi-processor System-on-chip (MPSoC) FPGA. The μP designs are instrumented with a set of counters that records the number of transitions that occur on internal nodes. The transition counts are collected from all internal nodes under both fault-free and faulty conditions, and are analyzed to determine which counters provide the highest fault coverage and lowest latency for detecting leakage faults. We show that complete coverage of all leakage faults is possible using only a single counter strategically placed within the branch compare logic of the μPs. 
    more » « less
  2. This paper proposes a simple and fast technique for power device open circuit (OC) fault detection in stacked multicell converters (SMCs). A mitigation technique allowing for fault-tolerant operation using a simple front-end routing circuit is also proposed for SMCs. The fault detection concept only needs to sense the voltage and direction of current at the output terminal of the SMC to detect and localize an OC switch fault to a particular rail of the SMC. The proposed technique compares the measured and expected voltage levels considering the commanded switch states and the direction of the terminal current flow. Once an OC fault is detected and localized, the front-end routing circuit will be activated to reconfigure the SMC converter to a simple flying capacitor multilevel converter (FCMC) to maintain the output power flow with a reduced number of voltage levels. A window detector circuit is proposed to track the output voltage level and current direction with high bandwidth. Simulations were performed to validate the fault detection method and router performance. The functionality of windows detector is investigated with a hardware prototype 7 level 300 V SMC. 
    more » « less
  3. Vertical take-off and landing (VTOL) vehicles are becoming increasingly popular for real-world transport; but, as with any vehicle, guaranteeing safety is both extremely critical and highly challenging due to issues like rotor faults. Existing fault detection and isolation (FDI) techniques usually focus on multirotor systems or fixed wing systems, rather than the hybrid VTOLs. Since VTOLs have both rotors and ailerons, a fault in a rotor may be masked by the (correctly working) ailerons, making it much more difficult to detect faults. However, this masking only works when ailersons are used (e.g., during cruising), leaving the takeoff and landing vulnerable to crashes. This paper presents an online rotor fault detection and isolation (FDI) method for VTOLs. The approach uses pose analysis and aileron command data to quickly and accurately identify the faulty rotor and to compute the severity of the fault. Our method works for hard-to-detect fault scenarios, such as small-severity faults that are masked during cruise flight but not during vertical motion. We evaluated our technique in a SITL PX4 simulation of a modified Deltaquad QuadPlane. The results show that our FDI technique can quickly detect and isolate faults in real time (within 1s-2.5s) and achieve high isolation success rate (91.67%) across six rotors, and that it can estimate the severity of faults to within 2%. When applying a simple recovery process post-isolation, the system consistently achieved safe landing. 
    more » « less
  4. Recent research has exposed a number of security issues related to the use of FPGAs in embedded system and cloud computing environments. Circuits that deliberately waste power can be carefully crafted by a malicious cloud FPGA user and deployed to cause denial-of-service and fault injection attacks. The main defense strategy used by FPGA cloud services involves checking user-submitted designs for circuit structures that are known to aggressively consume power. Unfortunately, this approach is limited by an attacker’s ability to conceive new designs that defeat existing checkers. In this work, our contributions are twofold. We evaluate a variety of circuit power wasting techniques that typically are not flagged by design rule checks imposed by FPGA cloud computing vendors. The efficiencies of five power wasting circuits, including our new design, are evaluated in terms of power consumed per logic resource. We then show that the source of voltage attacks based on power wasters can be identified. Our monitoring approach localizes the attack and suppresses the clock signal for the target region within 21 μs, which is fast enough to stop an attack before it causes a board reset. All experiments are performed using a state-of-the-art Intel Stratix 10 FPGA. 
    more » « less
  5. Abstract Test sets that target standard fault models may not always be sufficient for detecting all defects. To evaluate test sets for the detection of unmodeled defects,n-detect test sets (which detect all modeled faults at leastntimes) have previously been proposed. Unfortunately,n-detect test sets are often prohibitively long. In this paper, we investigate the ability of shadow flip-flops connected into a MISR (Multiple Input Signature Register) to detect stuck-at faults fortuitously multiple times during scan shift. We explore which flip-flops should be shadowed to increase the value ofnfor the least detected stuck-at faults for each circuit studied. We then identify which circuit characteristics are most important for determining the cost of the MISR needed to achieve high values ofn. For example, circuits that contain a few flip-flops with upstream fault cones that cover a large percentage of all faults in the circuit can often achieve highn-detect coverage fortuitously with a low-cost MISR. This allows a DFT engineer to predict the viability of this MISR-based approach early in the design cycle. 
    more » « less