skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


This content will become publicly available on June 25, 2026

Title: Decision-driven fault-tolerant architecture for vision transformers with real-time error mitigation
Vision Transformers (ViTs) have evolved in the field of computer vision by transitioning traditional Convolutional Neural Networks (CNNs) into attention-based architectures. This architecture processes input images as sequences of patches. ViTs achieve enhanced performance in many tasks such as image classification and object detection due to their ability to capture global dependencies within input data. While their software implementations are widely adopted, deploying ViTs on hardware introduces several challenges. These include fault tolerance in the presence of hardware failures, real-time reliability, and high computational requirements. Permanent faults that are in processing elements, interconnections, or memory subsystems lead to incorrect computations and degrading system performance. This paper proposes a fault-tolerant hardware implementation of ViTs to overcome these challenges. This hardware implementation integrates real-time fault detection and recovery mechanisms. The architecture includes four primary units: patch embedding, encoder, decoder, and Multi Layer Perceptron (MLP) which are supported by fault-tolerant components such as lightweight recompute units, a centralized Built-In Self-Test (BIST), and a learning-based decision-making system using machine learning model 'decision tree'. These units are interconnected through a centralized global buffer for efficient data transfer, ensuring seamless operation even under fault conditions.  more » « less
Award ID(s):
2321225 2321224
PAR ID:
10636720
Author(s) / Creator(s):
; ;
Publisher / Repository:
International Telecommunication Union
Date Published:
Journal Name:
ITU Journal on Future and Evolving Technologies
Volume:
6
Issue:
2
ISSN:
2616-8375
Page Range / eLocation ID:
198 to 214
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. As edge computing and sensing devices continue to proliferate, distributed machine learning (ML) inference pipelines are becoming popular for enabling low-latency, real-time decision-making at scale. However, the geographically dispersed and often resource-constrained nature of edge devices makes them susceptible to various failures, such as hardware malfunctions, network disruptions, and device overloading. These edge failures can significantly affect the performance and availability of inference pipelines and the sensing-to-decision-making loops they enable. In addition, the complexity of task dependencies amplifies the difficulty of maintaining performant and reliable ML operations. To address these challenges and minimize the impact of edge failures on inference pipelines, this paper presents several fault-tolerant approaches, including sensing redundancy, structural resilience, failover replication, and pipeline reconfiguration. For each approach, we explain the key techniques and highlight their effectiveness and tradeoffs. Finally, we discuss the challenges associated with these approaches and outline future directions. 
    more » « less
  2. While Vision Transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (< 1 ms) is challenging. Current computing platforms like CPUs, GPUs, or FPGA-based solutions struggle to meet this deterministic low-latency real-time requirement, even with quantized ViT models. Some approaches use pruning or sparsity to reduce model size and latency, but this often results in accuracy loss. To address the aforementioned constraints, in this work, we propose EQ-ViT, an end-to-end acceleration framework with novel algorithm and architecture co-design features to enable real-time ViT acceleration on AMD Versal Adaptive Compute Acceleration Platform (ACAP). The contributions are four-fold. First, we perform in-depth kernel- level performance profiling & analysis and explain the bottlenecks for existing acceleration solutions on GPU, FPGA, and ACAP. Second, on the hardware level, we introduce a new spatial and heterogeneous accelerator architecture, EQ-ViT architec- ture. This architecture leverages the heterogeneous features of ACAP, where both FPGA and artificial intelligence engines (AIEs) coexist on the same system-on-chip (SoC). Third, On the algorithm level, we create a comprehensive quantization-aware training strategy, EQ-ViT algorithm. This strategy concurrently quantizes both weights and activations into 8-bit integers, aiming to improve accuracy rather than compromise it during quanti- zation. Notably, the method also quantizes nonlinear functions for efficient hardware implementation. Fourth, we design EQ- ViT automation framework to implement the EQ-ViT architec- ture for four different ViT applications on the AMD Versal ACAP VCK190 board, achieving accuracy improvement with 2.4%, and average speedups of 315.0x, 3.39x, 3.38x, 14.92x, 59.5x, 13.1x over computing solutions of Intel Xeon 8375C vCPU, Nvidia A10G, A100, Jetson AGX Orin GPUs, and AMD ZCU102, U250 FPGAs. The energy efficiency gains are 62.2x, 15.33x, 12.82x, 13.31x, 13.5x, 21.9x. 
    more » « less
  3. Vision transformers (ViTs) have recently set off a new wave in neural architecture design thanks to their record-breaking performance in various vision tasks. In parallel, to fulfill the goal of deploying ViTs into real-world vision applications, their robustness against potential malicious attacks has gained increasing attention. In particular, recent works show that ViTs are more robust against adversarial attacks as compared with convolutional neural networks (CNNs), and conjecture that this is because ViTs focus more on capturing global interactions among different input/feature patches, leading to their improved robustness to local perturbations imposed by adversarial attacks. In this work, we ask an intriguing question: “Under what kinds of perturbations do ViTs become more vulnerable learners compared to CNNs?” Driven by this question, we first conduct a comprehensive experiment regarding the robustness of both ViTs and CNNs under various existing adversarial attacks to understand the underlying reason favoring their robustness. Based on the drawn insights, we then propose a dedicated attack framework, dubbed Patch-Fool, that fools the self-attention mechanism by attacking its basic component (i.e., a single patch) with a series of attention-aware optimization techniques. Interestingly, our Patch-Fool framework shows for the first time that ViTs are not necessarily more robust than CNNs against adversarial perturbations. In particular, we find that ViTs are more vulnerable learners compared with CNNs against our Patch-Fool attack which is consistent across extensive experiments, and the observations from Sparse/Mild Patch-Fool, two variants of Patch-Fool, indicate an intriguing insight that the perturbation density and strength on each patch seem to be the key factors that influence the robustness ranking between ViTs and CNNs. It can be expected that our Patch-Fool framework will shed light on both future architecture designs and training schemes for robustifying ViTs towards their real-world deployment. Our codes are available at https://github.com/RICE-EIC/Patch-Fool. 
    more » « less
  4. Ensuring high-quality prints in additive manufacturing is a critical challenge due to the variability in materials, process parameters, and equipment. Machine learning models are increasingly being employed for real-time quality monitoring, enabling the detection and classification of defects such as under-extrusion and over-extrusion. Vision Transformers (ViTs), with their global self-attention mechanisms, offer a promising alternative to traditional convolutional neural networks (CNNs). This paper presents a transformer-based approach for print quality recognition in additive manufacturing technologies, with a focus on fused filament fabrication (FFF), leveraging advanced self-supervised representation learning techniques to enhance the robustness and generalizability of ViTs. We show that the ViT model effectively classifies printing quality into different levels of extrusion, achieving exceptional performance across varying dataset scales and noise levels. Training evaluations show a steady decrease in cross-entropy loss, with prediction accuracy, precision, recall, and the harmonic mean of precision and recall (F1) scores reaching close to 1 within 40 epochs, demonstrating excellent performance across all classes. The macro and micro F1 scores further emphasize the ability of ViT to handle both class imbalance and instance-level accuracy effectively. Our results also demonstrate that ViT outperforms CNN in all scenarios, particularly in noisy conditions and with small datasets. Comparative analysis reveals ViT advantages, particularly in leveraging global self-attention and robust feature extraction methods, enhancing its ability to generalize effectively and remain resilient with limited data. These findings underline the potential of the transformer-based approach as a scalable, interpretable, and reliable solution to real-time quality monitoring in FFF, addressing key challenges in additive manufacturing defect detection and ensuring process efficiency. 
    more » « less
  5. This work is an experience with a deployed networked system for digital agriculture (or DA). Digital agriculture is the use of data-driven techniques towards a sustainable increase in farm productivity and efficiency. DA systems are expected to be overlaid on existing rural infrastructures, which are known to be less robust. While existing DA approaches partially address several infrastructure issues, challenges related to data aggregation, data analytics, and fault tolerance remain open. In this work, we present the design of Comosum, an extensible, reconfigurable, and fault-tolerant architecture of hardware, software, and distributed cloud abstractions to sense, analyze, and actuate on different farm types. FarmBIOS is an implementation of the Comosum architecture. We analyze FarmBIOS by leveraging various applications, deployment experiences, and network differences between urban and rural farms. This includes, for instance, an edge analytics application achieving 86% accuracy in vineyard disease detection. An eighteen-month deployment of FarmBIOS highlights Comosum’s fault tolerance. It was fault tolerant to intermittent network outages that lasted for several days during many periods of the deployment. We introduce active digital twins to cope with the unreliability of the underlying base systems. 
    more » « less