Search for: All records

Award ID contains: 2321224


  1. Vision Transformers (ViTs) have advanced the field of computer vision by shifting it from traditional Convolutional Neural Networks (CNNs) to attention-based architectures. These architectures process input images as sequences of patches. ViTs achieve improved performance on tasks such as image classification and object detection because they capture global dependencies within the input data. While their software implementations are widely adopted, deploying ViTs on hardware introduces several challenges, including fault tolerance in the presence of hardware failures, real-time reliability, and high computational requirements. Permanent faults in processing elements, interconnections, or memory subsystems lead to incorrect computations and degrade system performance. This paper proposes a fault-tolerant hardware implementation of ViTs that integrates real-time fault detection and recovery mechanisms to overcome these challenges. The architecture includes four primary units: patch embedding, encoder, decoder, and Multi-Layer Perceptron (MLP), which are supported by fault-tolerant components such as lightweight recompute units, a centralized Built-In Self-Test (BIST), and a learning-based decision-making system built on a decision-tree model. These units are interconnected through a centralized global buffer for efficient data transfer, ensuring seamless operation even under fault conditions.
    Free, publicly-accessible full text available June 25, 2026
  2. Free, publicly-accessible full text available March 31, 2026
  3. The growing computational demands of deep learning have driven interest in analog neural networks using resistive memory and silicon photonics. However, these technologies face inherent limitations in computing parallelism when used independently. Photonic PCM, which integrates photonics with phase-change memory (PCM), overcomes these constraints by enabling simultaneous processing of multiple inputs encoded on different wavelengths, significantly enhancing parallel computation for deep neural network (DNN) inference and training. This paper presents MERIT, a sustainable DNN accelerator that capitalizes on the non-volatility of resistive memory and the high operating speed of photonic devices. MERIT enables seamless inference and training by loading weight kernels into photonic PCM arrays and selectively supplying light encoded with input features for the forward pass and loss gradients for the backward pass. We compare MERIT with state-of-the-art digital and analog DNN accelerators including TPU, DEAP, and PTC. Simulation results demonstrate that MERIT reduces execution time by 68% and energy consumption by 64% for inference, and reduces execution time by 79% and energy consumption by 84% for training.
  4. As real-world graphs continue to grow at an exponential rate, performing Graph Convolutional Network (GCN) inference efficiently is becoming increasingly challenging. Prior works that employ a unified computing engine with a predefined computation order lack the flexibility and scalability needed to handle diverse input graph datasets. In this paper, we introduce OPT-GCN, a chiplet-based accelerator design that performs GCN inference efficiently while providing flexibility and scalability through an architecture-algorithm co-design. On the architecture side, the proposed design integrates a unified computing engine in each chiplet and an active interposer, both of which are adaptable to efficiently perform GCN inference and facilitate data communication. On the algorithm side, we propose dynamic scheduling and mapping algorithms to optimize memory access and on-chip computations for diverse GCN applications. Experimental results show that the proposed design reduces memory accesses by 11.3×, 3.4×, and 1.4× and achieves energy savings of 15.2×, 3.7×, and 1.6× on average compared to HyGCN, AWB-GCN, and GCNAX, respectively.
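The remark in item 4 about a "predefined computation order" can be illustrated with a small cost model. A GCN layer is commonly written as the product A·X·W (adjacency matrix, node features, layer weights), which can be evaluated as (A·X)·W or as A·(X·W). The sketch below is only an illustration under stated assumptions: A is sparse with `nnz_adj` non-zeros, X and W are dense, and cost is counted in multiply-accumulate operations (MACs). The function names are hypothetical, and this is not the scheduling algorithm of OPT-GCN, which the abstract does not describe.

```python
def gcn_layer_macs(num_nodes, nnz_adj, in_feats, out_feats):
    """Estimate MACs for both evaluation orders of A @ X @ W.

    Assumptions (not from the abstract): A is sparse with nnz_adj
    non-zeros, X is dense (num_nodes x in_feats), and W is dense
    (in_feats x out_feats).
    """
    # Dense feature transform costs the same in either order:
    # a (num_nodes x in_feats) matrix times W.
    transform = num_nodes * in_feats * out_feats
    # Sparse aggregation costs one MAC per non-zero of A per feature column.
    aggregate_first = nnz_adj * in_feats + transform   # (A @ X) @ W
    combine_first = transform + nnz_adj * out_feats    # A @ (X @ W)
    return {"aggregate_first": aggregate_first, "combine_first": combine_first}


def pick_order(num_nodes, nnz_adj, in_feats, out_feats):
    """Return the cheaper order under the MAC model above (ties go to aggregate_first)."""
    costs = gcn_layer_macs(num_nodes, nnz_adj, in_feats, out_feats)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    # A layer that shrinks the feature width favors transforming first.
    print(gcn_layer_macs(4, 8, 16, 4))
    print(pick_order(4, 8, 16, 4))
    # A layer that widens the features favors aggregating first.
    print(pick_order(4, 8, 4, 16))
```

Under this simplified model, the cheaper order depends on the layer shape and the graph's sparsity, which is one reason a single fixed order may not suit every dataset.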