For real-time computing systems, energy efficiency, Quality of Service, and fault tolerance are among the major design concerns. In this work, we study the problem of reliable and energy-aware fixed-priority (m,k)-deadlines enforcement with standby-sparing. The standby-sparing systems adopt a primary processor and a spare processor to provide fault tolerance for both permanent and transient faults. In order to reduce energy consumption for such kind of systems, we proposed a novel scheduling scheme under the QoS constraint of (m,k)- deadlines. The evaluation results demonstrate that our proposed approach significantly outperformed the previous research in energy conservation while assuring (m,k)-deadlines and fault tolerance for real-time systems.
more »
« less
This content will become publicly available on June 29, 2026
EdgeGuard: Robust and Fault-Aware Design for Resilient Edge Computing AI Accelerators
Smaller transistor feature sizes have made integrated circuits (ICs) more vulnerable to permanent faults. This leads to short lifetimes and increased risk of faults that lead to catastrophic errors. Fortunately, Artificial Neural Networks (ANNs) are error resilient as their accuracies can be maintained through e.g., fault-aware re-training. One of the problems though with previous work is that they require a re-design in the individual neuron processing element structure in order to efficiently deal with these faults. In this work, we propose a novel architecture combined with a design flow that performs a fault-aware weight re-assignment in order to minimize the effect of permanent faults on the accuracy of ANNs mapped to AI accelerator without the need of time-consuming fault-aware re-training nor neuron processing elements re-design. In particular, we deal with Tensor Processing Units (TPUs) although our proposed approach is also extensible to any other architecture. Experimental results show that our proposed approach and can be efficiently executed on a fast dedicated hardware re-binding unit or on software.
more »
« less
- Award ID(s):
- 2323819
- PAR ID:
- 10618287
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400714962
- Page Range / eLocation ID:
- 134 to 140
- Format(s):
- Medium: X
- Location:
- New Orleans LA USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
For real-time computing systems, energy efficiency, Quality of Service, and fault tolerance are among the major design concerns. In this work, we study the problem of reliable and energy-aware fixed-priority (m,k)-deadlines enforcement with standby-sparing. The standby-sparing systems adopt a primary processor and a spare processor to provide fault tolerance for both permanent and transient faults. In order to reduce energy consumption for such kind of systems, we proposed a novel scheduling scheme under the QoS constraint of (m,k)- deadlines. The evaluation results demonstrate that our proposed approach significantly outperformed the previous research in energy conservation while assuring (m,k)-deadlines and fault tolerance for real-time systems.more » « less
-
Abstract Neuromorphic computing mimics the organizational principles of the brain in its quest to replicate the brain’s intellectual abilities. An impressive ability of the brain is its adaptive intelligence, which allows the brain to regulate its functions “on the fly” to cope with myriad and ever-changing situations. In particular, the brain displays three adaptive and advanced intelligence abilities of context-awareness, cross frequency coupling, and feature binding. To mimic these adaptive cognitive abilities, we design and simulate a novel, hardware-based adaptive oscillatory neuron using a lattice of magnetic skyrmions. Charge current fed to the neuron reconfigures the skyrmion lattice, thereby modulating the neuron’s state, its dynamics and its transfer function “on the fly.” This adaptive neuron is used to demonstrate the three cognitive abilities, of which context-awareness and cross-frequency coupling have not been previously realized in hardware neurons. Additionally, the neuron is used to construct an adaptive artificial neural network (ANN) and perform context-aware diagnosis of breast cancer. Simulations show that the adaptive ANN diagnoses cancer with higher accuracy while learning faster and using a more compact and energy-efficient network than a nonadaptive ANN. The work further describes how hardware-based adaptive neurons can mitigate several critical challenges facing contemporary ANNs. Modern ANNs require large amounts of training data, energy, and chip area, and are highly task-specific; conversely, hardware-based ANNs built with adaptive neurons show faster learning, compact architectures, energy-efficiency, fault-tolerance, and can lead to the realization of broader artificial intelligence.more » « less
-
While energy consumption is the primary concern for the design of real-time embedded systems, fault-tolerance and quality of service (QoS) are becoming increasingly important in the development of today’s pervasive computing systems. In this work, we study the problem of energy-aware standby-sparing for weakly hard real-time embedded systems. The standby-sparing systems adopt a primary processor and a spare processor to provide fault tolerance for both permanent and transient faults. In order to reduce energy consumption for such kind of systems, we proposed two novel scheduling schemes: one for (1,1)-hard tasks and one for general (m,k)-hard tasks which require that at least m out of any k consecutive jobs of a task meet their deadlines. Through extensive evaluations, our results demonstrate that the proposed techniques significantly outperform the previous research in reducing energy consumption for both (1,1)-hard task sets and general (m,k)-hard task sets while assuring fault tolerance through standby-sparing.more » « less
-
Processing-in-memory (PIM) based architecture shows great potential to process several emerging artificial intelligence workloads, including vision and language models. Cross-layer optimizations could bridge the gap between computing density and the available resources by reducing the computation and memory cost of the model and improving the model’s robustness against non-ideal hardware effects. We first introduce several hardware-aware training methods to improve the model robustness to the PIM device’s nonideal effects, including stuck-at-fault, process variation, and thermal noise. Then, we further demonstrate a software/hardware (SW/HW) co-design methodology to efficiently process the state-of-the-art attention-based model on PIM-based architecture by performing sparsity exploration for the attention-based model and circuit architecture co-design to support the sparse processing.more » « less
An official website of the United States government
