

Search for: All records

Creators/Authors contains: "Schaumont, Patrick"

  1. Microscaling (MX) formats share one exponent across an element block, creating a new fault surface: a single-bit flip in the shared exponent corrupts all block elements simultaneously. Five MX-compatible 8-bit encodings (MXINT8, MXFP8-E4M3, MXFP8-E5M2, LOG8-SUM, and LOG8-MAX) are compared through exhaustive single-bit weight fault injection across four workloads using gate-level simulation on the CAPRI1 RISC-V SoC, with the primary workload additionally validated on fabricated silicon. Format choice alone can substantially change vulnerability. Three MX-specific mechanisms explain this variation: block-shared exponent amplification, attention routing inversion from exponent-field faults, and log-domain fault filtering in the max-reduction variant. No single format dominates all axes: LOG8-MAX leads speed and energy but is limited by accuracy on some attention workloads, MXINT8 provides the strongest fault resilience with full accuracy, and LOG8-SUM offers a balanced compromise across speed, accuracy, and resilience. MX element encoding should be treated not only as an accuracy-efficiency choice, but also as a security-relevant design decision. 
  2. A single fault in a shared block exponent can silently corrupt an entire neural-network inference block. Microscaling (MX) formats, which share one 8-bit exponent across 32 elements to reduce storage and bandwidth cost, propagate this fault through a barrel shifter to every element simultaneously. This paper presents a security analysis of five MX-compatible 8-bit formats on a unified RISC-V coprocessor, combining a comparative format study with state-monitor-assisted fault attribution under voltage underscaling and clock glitching. Across the five-format comparison, the proposed LOG8-MAX format achieves the lowest observed fault rate at the lowest hardware cost (905 LUTs), while MXFP8-E5M2 is the most vulnerable (27 silent faults on QNN, 65,536× worst-case amplification) and the most expensive (2,178 LUTs). The experiments identify two primary fault mechanisms: shared-exponent amplification across all formats, and reduction-operator containment specific to the LOG8-MAX comparator tree. A 129-bit state monitor shows that the first silent fault under undervoltage does not always originate in the shared exponent: CPU-path faults can appear earlier with the coprocessor state still intact. Shared-exponent corruption, when it occurs, remains the catastrophic failure mode because it corrupts an entire block simultaneously. These results show that MX format encoding and internal observability are both security-relevant design choices for edge-AI inference hardware. 
  3. Electromagnetic fault injection (EMFI) induces transient faults in SoCs, yet limited hardware observability impedes root-cause analysis. We present a scan-chain-assisted EMFI framework on CAPRI1, a custom RISC-V SoC whose 12,756 scan-accessible flip-flops provide cycle-resolved, bit-level visibility after each injection on fabricated silicon. Unlike simulation-based fault injection with predetermined fault models, post-silicon scan capture reveals actual physical fault manifestations, including spatially correlated multi-bit upsets and PDN-mediated coupling effects. We apply hierarchical fault analysis to two PIN verification firmware workloads, tracing fault propagation from initial bit-flips through microarchitectural blocks to software-level outcomes. A direction-aware PDN loop coupling model predicts EMFI-susceptible die regions from pre-silicon layout data; the predicted map aligns with scan-localized first-flip clustering (70.6% within the top-20% predicted region, p < 10⁻⁵), establishing a pre-silicon-to-post-silicon vulnerability assessment workflow. 
  4. While many software vulnerabilities are blamed on software bugs, they can also be caused by hardware fault injection. Traditional fault injection methods rely on blind attacks based on simplified fault models, such as instruction skipping. These attacks require exhaustive experimentation across a wide range of fault parameters, with the methodology inferred solely from faulty outcomes, resulting in limited insight into fault impact and an overall inefficient approach. We present GLITCHGLÜCK, a novel approach that combines a tool for simulating hardware-software interactions with a methodology for guiding fault injection. The tool observes the system via scan-chain-accessible states and constructs the Dynamic State Transition Graph (DSTG), a temporal representation of how software instructions trigger interactions with hardware components. By analyzing the DSTG, GLITCHGLÜCK pinpoints fault injection parameters, such as when, where, and what to fault, without relying on predefined fault models, thus avoiding the need for an exhaustive fault parameter search. This targeted, data-driven method bridges the gap between simulation and physical fault observation by using the scan chain. GLITCHGLÜCK is demonstrated on a physical OpenMSP430 ASIC chip with scan-chain support, and validated in simulation on PicoRV32 (RV32I) and IBEX (RV32IM) to confirm its applicability across different instruction set architectures and microarchitectures. We assess the effectiveness of several software countermeasures, such as instruction duplication and PIN verification, using layout-aware fault simulations to guide fault attacks via clock glitching and laser-induced faults. 
  5. Hardware fault injection can leave processors in weird micro-architecture states that evade detection by conventional software-level monitors, jeopardizing system reliability. We propose μScan, a deep learning framework that leverages scan-chain observability to detect such anomalous states. μScan fine-tunes a large language model (LLM) to classify single-cycle processor states as weird or sane, achieving an average classification accuracy of 92% and maintaining robust performance across different CPU architectures (MSP430, PICO (RV32IC), IBEX (RV32IMC)). We further refine this LLM classifier with reinforcement learning, which sharpens its decision boundaries and improves detection of borderline anomalies. μScan also employs a graph neural network (GNN) to analyze multi-cycle fault patterns, capturing complex temporal dependencies that single-cycle analysis might miss. This GNN-based analysis successfully identifies recurring fault sequences and maps them to known Common Weakness Enumeration (CWE) vulnerability classes, revealing potential hardware design flaws. μScan demonstrates scalability and generalization on multiple processor architectures (including micro-coded and pipelined cores) and is evaluated with both pre-silicon simulation data and a post-silicon prototype. Our results show that μScan enables early detection of micro-architecture vulnerabilities in the design phase and provides a robust post-silicon anomaly detection mechanism. 
  6. While traditional side-channel leakage assessment relies on physical measurements from fabricated hardware, pre-silicon leakage assessment – conducted through simulation – faces the challenge of estimating real-world leakage before hardware is available. Current pre-silicon methods seek to improve simulation accuracy by increasing the design detail at a single level of abstraction. However, there is always a limit to the level of detail that can be incorporated, leaving a gap between simulation and physical measurements. We show that side-channel leakage analysis can be approached as a top-down problem, using a hierarchical assessment across multiple abstraction levels in a design to gradually uncover sources of side-channel leakage. We introduce Telescope, a top-down framework for pre-silicon side-channel leakage assessment across three abstraction levels: architecture-, micro-architecture-, and gate-level. Telescope’s primary contribution is to demonstrate the propagation of leakage from software instructions to hardware implementation, revealing that leakage does not manifest uniformly across design levels. Through this cross-level analysis, we identify the origins of leakage at each abstraction level and explore their interdependencies. This analysis uncovers three distinct leakage patterns across levels: progressive, emergent, and disappearing side-channel leakage. We demonstrate the scalability of Telescope by applying it to two RISC-V processors: the multi-cycle PicoRV32 and the pipelined IBEX. We analyze their power-based side-channel leakage patterns across several benchmarks, and illustrate practical use cases in debugging side-channel leakage issues and verifying micro-architecture effects in side-channel leakage. 
  7. Hardware faults are a known source of security vulnerabilities. Fault injection in secure embedded systems leads to information leakage and privilege escalation, and countless fault attacks have been demonstrated both in simulation and in practice. However, there is a significant gap between simulated fault attacks and physical fault attacks. Simulations use idealized fault models such as single-bit flips with uniform distribution. These ideal fault models may not hold in practice. On the other hand, practical experiments lack the white-box visibility necessary to determine the true nature of the fault, leading to probabilistic vulnerability assessments and unexplained results. In embedded software, this problem is further exacerbated by the layered abstractions between the hardware (where the fault originates) and the application software (where the fault effect is observed). We present FaultDetective, a method to investigate the root-cause of fault injection from fault detection in software. Our main insight is that fault detection in software is only the end-point of a chain of events that starts with a fault manifestation in hardware and propagates through the micro-architecture and architecture before reaching the software level. To understand the fault effects at the hardware level, we use a scan chain, a low-level hardware test structure. We then use white-box simulation to propagate and observe hardware faults in the embedded software. We efficiently visualize the fault propagation across abstraction levels using a hash-tree representation of the scan chain. We implement this concept in a multi-core MSP430 micro-controller that redundantly executes an application in lock-step. With this setup, we observe the fault effects for several different stressors, including clock glitching and thermal laser stimulation, and explain the root-cause in each case. 
  8. Guo, J; Steinfeld, R (Ed.)
    Active fault injection is a credible threat to real-world digital systems computing on sensitive data. Arguing about security in the presence of faults is non-trivial, and state-of-the-art criteria are overly conservative and lack the ability of fine-grained comparison. However, comparing two alternative implementations for their security is required to find a satisfying compromise between security and performance. In addition, the comparison of alternative fault scenarios can help optimize the implementation of effective countermeasures. In this work, we use quantitative information flow analysis to establish a vulnerability metric for hardware circuits under fault injection that measures the severity of an attack in terms of information leakage. Potential use cases range from comparing implementations with respect to their vulnerability to specific fault scenarios to optimizing countermeasures. We automate the computation of our metric by integrating it into a state-of-the-art evaluation tool for physical attacks and provide new insights into the security under an active fault attacker. 
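
The shared-exponent fault surface described in items 1 and 2 can be illustrated with a small sketch. This is not the authors' code or experimental setup: it assumes a simplified block model in which 32 signed 8-bit elements are scaled by one 8-bit shared exponent with a bias of 127, and it ignores format-specific details such as implied element scaling and reserved exponent encodings. The names `decode_block`, `flip_bit`, and `amplification` are hypothetical.

```python
import random

BIAS = 127   # assumed bias for the 8-bit shared exponent (illustrative)
BLOCK = 32   # elements per block, as stated in the abstract of item 2


def decode_block(shared_exp, elements):
    """Decode a block: every element is scaled by the same power of two."""
    scale = 2.0 ** (shared_exp - BIAS)
    return [e * scale for e in elements]


def flip_bit(byte, bit):
    """Return the 8-bit value with one bit inverted (single-bit fault)."""
    return (byte ^ (1 << bit)) & 0xFF


def amplification(shared_exp, bit):
    """Factor by which every decoded element changes after the flip."""
    return 2.0 ** (flip_bit(shared_exp, bit) - shared_exp)


rng = random.Random(0)
elements = [rng.randint(-127, 127) for _ in range(BLOCK)]
shared_exp = 0b01101111  # 111; bit 4 is clear, so a flip sets it

clean = decode_block(shared_exp, elements)

# Fault in the shared exponent: every nonzero element in the block changes.
exp_faulty = decode_block(flip_bit(shared_exp, 4), elements)
exp_changed = sum(1 for c, f in zip(clean, exp_faulty) if c != f)

# Fault in a single element: only that one decoded value changes.
elem_faulty_inputs = list(elements)
raw = flip_bit(elements[0] & 0xFF, 4)                       # flip in two's complement
elem_faulty_inputs[0] = raw - 256 if raw >= 128 else raw    # back to signed
elem_faulty = decode_block(shared_exp, elem_faulty_inputs)
elem_changed = sum(1 for c, f in zip(clean, elem_faulty) if c != f)

print("exponent fault changed", exp_changed, "of", BLOCK, "values")
print("element fault changed", elem_changed, "of", BLOCK, "values")
print("amplification factor:", amplification(shared_exp, 4))
```

In this simplified model, flipping bit k of the shared exponent multiplies or divides every element of the block by 2^(2^k), whereas a flip inside one element stays confined to that element. The abstracts attribute the severity of shared-exponent faults to this kind of block-wide effect; the exact amplification figures they report depend on the real formats and hardware, which this sketch does not reproduce.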