Fault attacks on cryptographic software use faulty ciphertext to reverse engineer the secret encryption key. Although modern fault analysis algorithms are quite efficient, their practical implementation is complicated because of the uncertainty that comes with the fault injection process. First, the intended fault effect may not match the actual fault obtained after fault injection. Second, the logic target of the fault attack, the cryptographic software, is above the abstraction level of physical faults. The resulting uncertainty with respect to the fault effects in the software may degrade the efficiency of the fault attack, resulting in many more trial fault injections than the number predicted by the theoretical fault attack. In this contribution, we highlight the important role played by the processor microarchitecture in the development of a fault attack. We introduce the microprocessor fault sensitivity model to systematically capture the fault response of a microprocessor pipeline. We also propose Microarchitecture-Aware Fault Injection Attack (MAFIA). MAFIA uses the fault sensitivity model to guide the fault injection and to predict the fault response. We describe two applications for MAFIA. First, we demonstrate a biased fault attack on an unprotected Advanced Encryption Standard (AES) software program executing on a seven-stage pipelined Reduced Instruction Set Computer (RISC) processor. The use of the microprocessor fault sensitivity model to guide the attack leads to an order of magnitude fewer fault injections compared to a traditional, blind fault injection method. Second, MAFIA can be used to break known software countermeasures against fault injection. We demonstrate this by systematically breaking a collection of state-of-the-art software fault countermeasures.
These two examples lead to the key conclusion of this work, namely that software fault attacks become much more harmful and effective when an appropriate microprocessor fault sensitivity model is used. This, in turn, highlights the need for better fault countermeasures for software.
This content will become publicly available on September 5, 2025
FaultDetective: Explainable to a Fault, from the Design Layout to the Software
Hardware faults are a known source of security vulnerabilities. Fault injection in secure embedded systems leads to information leakage and privilege escalation, and countless fault attacks have been demonstrated both in simulation and in practice. However, there is a significant gap between simulated fault attacks and physical fault attacks. Simulations use idealized fault models such as single-bit flips with uniform distribution. These ideal fault models may not hold in practice. On the other hand, practical experiments lack the white-box visibility necessary to determine the true nature of the fault, leading to probabilistic vulnerability assessments and unexplained results. In embedded software, this problem is further exacerbated by the layered abstractions between the hardware (where the fault originates) and the application software (where the fault effect is observed). We present FaultDetective, a method to investigate the root cause of a fault injection, starting from fault detection in software. Our main insight is that fault detection in software is only the end-point of a chain of events that starts with a fault manifestation in hardware and propagates through the micro-architecture and architecture before reaching the software level. To understand the fault effects at the hardware level, we use a scan chain, a low-level hardware test structure. We then use white-box simulation to propagate and observe hardware faults in the embedded software. We efficiently visualize the fault propagation across abstraction levels using a hash-tree representation of the scan chain. We implement this concept in a multi-core MSP430 micro-controller that redundantly executes an application in lock-step. With this setup, we observe the fault effects for several different stressors, including clock glitching and thermal laser stimulation, and explain the root cause in each case.
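The hash-tree idea mentioned in the abstract can be illustrated with a small sketch. This is not the FaultDetective implementation; it only shows, under stated assumptions, how a hash tree over a scan-chain dump lets one localize which segments differ between a fault-free and a faulted run without comparing every bit. The bit-string representation of the dump, the leaf size of 8 bits, the use of SHA-256, and the function names `build_hash_tree` and `differing_leaves` are all hypothetical choices made for illustration.

```python
import hashlib


def build_hash_tree(bits, leaf_size=8):
    """Build a hash tree over a scan-chain dump given as a string of '0'/'1'.

    Returns a list of levels, leaves first and root last. Each leaf hashes one
    contiguous segment of `leaf_size` bits (an assumed, illustrative size).
    """
    leaves = [
        hashlib.sha256(bits[i:i + leaf_size].encode()).digest()
        for i in range(0, len(bits), leaf_size)
    ]
    levels = [leaves]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        parents = []
        for i in range(0, len(prev), 2):
            right = prev[i + 1] if i + 1 < len(prev) else b""
            parents.append(hashlib.sha256(prev[i] + right).digest())
        levels.append(parents)
    return levels


def differing_leaves(ref, obs):
    """Return indices of leaf segments that differ between two hash trees.

    Descends from the root and only follows nodes whose hashes mismatch,
    so identical subtrees are skipped entirely.
    """
    if len(ref) != len(obs) or len(ref[0]) != len(obs[0]):
        raise ValueError("trees must cover scan chains of equal length")
    top = len(ref) - 1
    if ref[top][0] == obs[top][0]:
        return []
    frontier = [0]
    for level in range(top - 1, -1, -1):
        next_frontier = []
        for idx in frontier:
            for child in (2 * idx, 2 * idx + 1):
                if child < len(ref[level]) and ref[level][child] != obs[level][child]:
                    next_frontier.append(child)
        frontier = next_frontier
    return frontier


# Hypothetical 64-bit scan-chain dumps: a fault-free run and a run with bit 19 flipped.
golden = "0" * 64
faulted = golden[:19] + "1" + golden[20:]
print(differing_leaves(build_hash_tree(golden), build_hash_tree(faulted)))
```

With the 8-bit leaves assumed here, the flipped bit 19 falls in leaf segment 2, which is the only index reported. In a real setting the leaf boundaries would more naturally follow the registers or modules that the scan chain threads through, so that a mismatching subtree maps to a named part of the design.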
- Award ID(s): 2219810
- PAR ID: 10585890
- Publisher / Repository: IACR
- Date Published:
- Journal Name: IACR Transactions on Cryptographic Hardware and Embedded Systems
- Volume: 2024
- Issue: 4
- ISSN: 2569-2925
- Page Range / eLocation ID: 610 to 632
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- We propose AccHashtag, the first framework for high-accuracy detection of fault-injection attacks on Deep Neural Networks (DNNs) with provable bounds on detection performance. Recent literature in fault-injection attacks shows the severe DNN accuracy degradation caused by bit flips. In this scenario, the attacker changes a few DNN weight bits during execution by injecting faults to the dynamic random-access memory (DRAM). To detect bit flips, AccHashtag extracts a unique signature from the benign DNN prior to deployment. The signature is used to validate the model’s integrity and verify the inference output on the fly. We propose a novel sensitivity analysis that identifies the most vulnerable DNN layers to the fault-injection attack. The DNN signature is constructed by encoding the weights in vulnerable layers using a low-collision hash function. During DNN inference, new hashes are extracted from the target layers and compared against the ground-truth signatures. AccHashtag incorporates a lightweight methodology that allows for real-time fault detection on embedded platforms. We devise a specialized compute core for AccHashtag on field-programmable gate arrays (FPGAs) to facilitate online hash generation in parallel to DNN execution. Extensive evaluations with the state-of-the-art bit-flip attack on various DNNs demonstrate the competitive advantage of AccHashtag in terms of both attack detection and execution overhead.
- Distributed systems are hard to reason about largely because of uncertainty about what may go wrong in a particular execution, and about whether the system will mitigate those faults. Tools that perturb executions can help test whether a system is robust to faults, while tools that observe executions can help better understand their system-wide effects. We present Box of Pain, a tracer and fault injector for unmodified distributed systems that addresses both concerns by interposing at the system call level and dynamically reconstructing the partial order of communication events based on causal relationships. Box of Pain’s lightweight approach to tracing and focus on simulating the effects of partial failures on communication rather than the failures themselves sets it apart from other tracing and fault injection systems. We present evidence of the promise of Box of Pain and its approach to lightweight observation and perturbation of distributed systems.
- Debugging a failure usually requires reproducing it first. This can be hard for failures in production distributed systems, where bugs are exposed only by some unusual faulty events. While fault injection testing has become popular, existing solutions are designed for bug finding. They are ineffective and inefficient at reproducing a specific failure during debugging. We explore a new type of fault injection technique for quickly reproducing a given fault-induced production failure in distributed systems. We present a tool, Anduril, that uses static causal analysis and a novel feedback-driven algorithm to quickly search the enormous fault space for the root-cause fault and timing. We evaluate Anduril on 22 real-world complex fault-induced failures from five large-scale distributed systems. Anduril reproduced all failures by identifying and injecting the root-cause faults at the right time, in a median of 8 minutes.
- Large-scale distributed systems must be built to anticipate and mitigate a variety of hardware and software failures. In order to build confidence that fault-tolerant systems are correctly implemented, Netflix (and similar enterprises) regularly run failure drills in which faults are deliberately injected in their production system. The combinatorial space of failure scenarios is too large to explore exhaustively. Existing failure testing approaches either explore the space of potential failures randomly or exploit the "hunches" of domain experts to guide the search. Random strategies waste resources testing "uninteresting" faults, while programmer-guided approaches are only as good as human intuition and only scale with human effort. In this paper, we describe how we adapted and implemented a research prototype called lineage-driven fault injection (LDFI) to automate failure testing at Netflix. Along the way, we describe the challenges that arose adapting the LDFI model to the complex and dynamic realities of the Netflix architecture. We show how we implemented the adapted algorithm as a service atop the existing tracing and fault injection infrastructure, and present early results.