

Search for: All records

Award ID contains: 2402940


  1. Convolutional neural networks (CNNs) are incorporated into many image-based tasks across a variety of domains. Some of these are safety-critical tasks, such as object classification/detection and lane detection for self-driving cars. These applications have strict safety requirements and must guarantee reliable operation of the neural networks in the presence of soft errors (i.e., transient faults) in DRAM. Standard safety mechanisms (e.g., triplication of data/computation) provide high resilience but introduce intolerable overhead. We perform a detailed characterization and propose an efficient methodology for pinpointing critical weights by using an efficient proxy, the Taylor criterion. Using this characterization, we design Aspis, an efficient software protection scheme that performs selective weight hardening and offers a performance/reliability tradeoff. Aspis provides higher resilience than state-of-the-art methods and is integrated into PyTorch as a fully automated library.
  2. Graphics Processing Units (GPUs) are widely deployed and utilized across various computing domains, including cloud and high-performance computing. Given their extensive use and increasing popularity, ensuring GPU reliability is crucial. Software-based reliability evaluation methodologies, though fast, often neglect the complex hardware details of modern GPU designs. This oversight can lead to misleading measurements and misguided decisions regarding protection strategies. This paper conducts an in-depth examination of well-established vulnerability assessment methods for modern GPU architectures, from the microarchitecture all the way to the software layers. It highlights divergences between popular software-based vulnerability evaluation methods and the ground-truth cross-layer evaluation, which persist even under strong protections such as triple modular redundancy. Accurate evaluation requires considering fault distribution from hardware to software. Our comprehensive measurements offer valuable insights into the accurate assessment of GPU reliability.
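
As a rough illustration of two ideas named in the abstracts above, the following sketch ranks weights with a first-order Taylor score, |w · ∂L/∂w|, and protects only the top-ranked ones. It is not the Aspis implementation or its PyTorch interface: the toy linear model, the function names (`taylor_scores`, `select_critical`, `majority`), the 25% protection fraction, and the choice of triplication with majority voting as the hardening mechanism are all assumptions made for the example.

```python
def taylor_scores(weights, grads):
    """First-order Taylor importance per weight: |w * dL/dw|."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def select_critical(scores, fraction):
    """Indices of the top `fraction` of weights by score (at least one)."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def majority(a, b, c):
    """Value held by at least two of three copies (falls back to b if none agree)."""
    return a if a == b or a == c else b


# Toy linear model (hypothetical): y = sum(w_i * x_i), loss = 0.5 * (y - target)^2
weights = [0.5, -2.0, 0.1, 1.5]
x = [1.0, 2.0, 3.0, 0.0]
target = 1.0

y = sum(w * xi for w, xi in zip(weights, x))
grads = [(y - target) * xi for xi in x]  # analytic dL/dw_i for this loss

scores = taylor_scores(weights, grads)
critical = select_critical(scores, fraction=0.25)

# Harden only the critical weights by keeping three copies of each.
copies = {i: [weights[i]] * 3 for i in critical}

# Simulate a soft error corrupting one stored copy, then recover by voting.
i = critical[0]
copies[i][0] = 1e9
recovered = majority(*copies[i])
```

Protecting only a fraction of the weights is what gives the overhead/resilience tradeoff: raising `fraction` covers more weights at a higher memory cost, while full triplication corresponds to `fraction=1.0`.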