EdgeGuard: Robust and Fault-Aware Design for Resilient Edge Computing AI Accelerators

Ahmed, Sabrina; Hoque, Khaza Anuarul; Carrion_Schafer, Benjamin

doi:10.1145/3716368.3735153

This content will become publicly available on June 29, 2026

Smaller transistor feature sizes have made integrated circuits (ICs) more vulnerable to permanent faults. This shortens their lifetimes and increases the risk of faults that lead to catastrophic errors. Fortunately, Artificial Neural Networks (ANNs) are error resilient, as their accuracy can be maintained through, e.g., fault-aware re-training. One problem with previous work, though, is that it requires a re-design of the individual neuron processing element structure in order to deal with these faults efficiently. In this work, we propose a novel architecture combined with a design flow that performs fault-aware weight re-assignment in order to minimize the effect of permanent faults on the accuracy of ANNs mapped to AI accelerators, without the need for time-consuming fault-aware re-training or a re-design of the neuron processing elements. In particular, we deal with Tensor Processing Units (TPUs), although our proposed approach is also extensible to any other architecture. Experimental results show that our proposed approach can be efficiently executed on a fast dedicated hardware re-binding unit or in software.
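To make the idea of fault-aware weight re-assignment more concrete, the following is a minimal illustrative sketch and not the algorithm from the paper. It assumes a simple fault model in which any weight mapped to a faulty processing element (PE) loses its contribution, and it assumes that the logical columns of a weight matrix may be freely permuted across the physical columns of the array. The function names, the fault model, and the exhaustive search are all hypothetical choices made for illustration.

```python
from itertools import permutations


def fault_cost(weights, faulty, logical_col, physical_col):
    """Total |weight| of one logical column that would land on faulty PEs
    if that column were bound to the given physical column (assumed model)."""
    return sum(
        abs(row[logical_col])
        for r, row in enumerate(weights)
        if (r, physical_col) in faulty
    )


def rebind_columns(weights, faulty):
    """Return (perm, cost) where perm[logical_col] = physical_col.

    Exhaustive search over all column permutations; only practical for
    very small arrays and used here purely to illustrate the idea.
    """
    n_cols = len(weights[0])
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n_cols)):
        cost = sum(fault_cost(weights, faulty, c, p) for c, p in enumerate(perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Hypothetical 2x3 weight tile; both PEs in physical column 0 are faulty.
weights = [
    [0.9, 0.1, -0.5],
    [-0.8, 0.05, 0.4],
]
faulty = {(0, 0), (1, 0)}

# Cost of the default binding (logical column c on physical column c).
identity_cost = sum(fault_cost(weights, faulty, c, c) for c in range(3))
perm, cost = rebind_columns(weights, faulty)
print("default cost:", identity_cost)
print("re-bound mapping:", perm, "cost:", cost)
```

In this toy case the search binds the logical column with the smallest weights to the faulty physical column, so less weight magnitude is lost than under the default binding. A practical re-binding unit would need a far cheaper search than enumerating permutations, and would have to restore the original output order after computation.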

Award ID(s): 2323819

PAR ID: 10618287

Author(s) / Creator(s): Ahmed, Sabrina; Hoque, Khaza Anuarul; Carrion_Schafer, Benjamin

Publisher / Repository: ACM

Date Published: 2025-06-29

ISBN: 9798400714962

Page Range / eLocation ID: 134 to 140

Format(s): Medium: X

Location: New Orleans LA USA

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
https://doi.org/10.1145/3716368.3735153
