Improving Fault Tolerance for FPGA SoCs through Post-Radiation Design Analysis

Wilson, Andrew Elbert; Baker, Nathan; Campbell, Ethan; Wirthlin, Michael

doi:10.1145/3674841

Citation Details

Improving Fault Tolerance for FPGA SoCs through Post-Radiation Design Analysis

FPGAs have been shown to operate reliably within harsh radiation environments by employing single-event upset (SEU) mitigation techniques, such as configuration scrubbing, triple-modular redundancy, error correction coding, and radiation aware implementation techniques. The effectiveness of these techniques, however, is limited when using complex system-level designs that employ complex I/O interfaces with single-point failures. In previous work, a complex SoC system running Linux applied several of these techniques only to obtain an improvement of 14\(\times\)in mean time to failure (MTTF). A detailed post-radiation fault analysis found that the limitations in reliability were due to the DDR interface, the global clock network, and interconnect. This article applied a number of design-specific SEU mitigation techniques to address the limitations in reliability of this design. These changes include triplicating the global clock, optimizing the placement of the reduction output voters and input flip-flops, and employing a mapping technique called “striping.” The application of these techniques improved MTTF of the mitigated design by a factor of 1.54\(\times\)and thus provides a 22.8X\(\times\)MTTF improvement over the unmitigated design. A post-radiation fault analysis using BFAT was also performed to find the remaining design vulnerabilities. more »

Award ID(s):: 1738550

PAR ID:: 10548179

Author(s) / Creator(s):: Wilson, Andrew Elbert; Baker, Nathan; Campbell, Ethan; Wirthlin, Michael

Publisher / Repository:: ACM

Date Published:: 2024-09-30

Journal Name:: ACM Transactions on Reconfigurable Technology and Systems

Volume:: 17

Issue:: 3

ISSN:: 1936-7406

Page Range / eLocation ID:: 1 to 21

Format(s):: Medium: X

Sponsoring Org:: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript1.0
Journal Article:
https://doi.org/10.1145/3674841

More Like this