Deep neural networks are lucrative targets of adversarial attacks and approximate deep neural networks (AxDNNs) are no exception. Searching manually for adversarially robust AxDNN architectures incurs outrageous time and human effort. In this paper, we propose XAI-NAS, an explainable neural architecture search (NAS) method that leverages explainable artificial intelligence (XAI) to efficiently co-optimize the adversarial robustness and hardware efficiency of AxDNN architectures on systolic-array hardware accelerators. During the NAS process, AxDNN architectures are evolved layer-wise with heterogeneous approximate multipliers to deliver the best trade-offs between adversarial robustness, energy consumption, latency, and memory footprint. The most suitable approximate multipliers are automatically selected from an open-source Evoapprox8b library. Our extensive evaluations provide a set of Pareto optimal hardware efficient and adversarially robust solutions. For example, a Pareto-optimal DNN AxDNN for the MNIST and CIFAR-10 datasets exhibits up to 1.5× higher adversarial robustness, 2.1× less energy consumption, 4.39× reduced latency, and 2.37× low memory footprint when compared to the state-of-the-art NAS approaches.
more »
« less
This content will become publicly available on April 23, 2026
Explainable AI-Guided Efficient Approximate DNN Generation for Multi-Pod Systolic Arrays
Approximate deep neural networks (AxDNNs) are promising for enhancing energy efficiency in real-world devices. One of the key contributors behind this enhanced energy efficiency in AxDNN is the use of approximate multipliers. Unfortunately, the simulation of approximate multipliers does not usually scale well on CPUs and GPUs. As a consequence, this slows down the overall simulation of AxDNNs aimed at identifying the appropriate approximate multipliers to achieve high energy efficiency with a minimum accuracy loss. To address this problem, we present a novel XAI-Gen methodology, which leverages the analytical model of the emerging hardware accelerator (e.g., Google TPU v4) and explainable artificial intelligence (XAI) to precisely identify the non-critical layers for approximation and quickly discover the appropriate approximate multipliers for AxDNN layers. Our results show that XAI-Gen achieves up to 7× lower energy consumption with only 1-2% accuracy loss. We also showcase the effectiveness of the XAI-Gen approach through a neural architecture search (XAI-NAS) case study. Interestingly, XAI-NAS achieves 40% higher energy efficiency with up to 5× less execution time when compared to the state-of-the-art NAS methods for generating AxDNNs.
more »
« less
- Award ID(s):
- 2323819
- PAR ID:
- 10618281
- Publisher / Repository:
- IEEE
- Date Published:
- ISBN:
- 979-8-3315-0942-2
- Page Range / eLocation ID:
- 1 to 8
- Format(s):
- Medium: X
- Location:
- San Francisco, CA, USA
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Low-latency and low-power edge AI is crucial for Virtual Reality and Augmented Reality applications. Recent advances demonstrate that hybrid models, combining convolution layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can present system challenges for latency and energy efficiency due to their diverse nature in dataflow and memory access patterns. In this work, we leverage architecture heterogeneity from Neural Processing Units (NPU) and Compute-In-Memory (CIM) and explore diverse execution schemas to efficiently execute these hybrid models. We introduce H4H-NAS, a two-stage Neural Architecture Search (NAS) framework to automate the design of efficient hybrid CNN/ViT models for heterogeneous edge systems featuring both NPU and CIM. We propose a two-phase incremental supernet training in our NAS framework to resolve gradient conflicts between sampled subnets caused by different types of blocks in a hybrid model search space. Our H4H-NAS approach is also powered by a performance estimator built with NPU performance results measured on real silicon, and CIM performance based on industry IPs. H4H-NAS searches hybrid CNN-ViT models with fine granularity and achieves significant (up to 1.34%) top-1 accuracy improvement on ImageNet. Moreover, results from our algorithm/hardware co-design reveal up to 56.08% overall latency and 41.72% energy improvements by introducing heterogeneous computing over baseline solutions. Overall, our framework guides the design of hybrid network architectures and system architectures for NPU+CIM heterogeneous systems.more » « less
-
In this work, we employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing (IMC) architectures. Initially, we design three fundamental components inspired by the convolutional layers found in VGG and ResNet models. Subsequently, we utilize Bayesian optimization to construct a convolutional neural network (CNN) model with adaptable depths, employing these components. Through the Bayesian search algorithm, we explore a vast search space comprising over 640 million network configurations to identify the optimal solution, considering various multi-objective cost functions like accuracy/latency and accuracy/energy. Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets, demonstrating the effectiveness of our method in achieving a balanced solution characterized by high accuracy and reduced latency and energy consumption.more » « less
-
The ability to automatically generate a neural network architecture and the corresponding hardware implementation to optimize both accuracy and performance characteristics (latency, power) simultaneously for edge-based Artificial Intelligence (AI) applications is becoming prevalent. As both neural architecture search (NAS) and hardware implementation have ample design space, it is very challenging to integrate with resource-constrained edge computing hardware since the current co-search frameworks take several hundreds of GPU hours to converge. In this paper, we propose SCORCH, a novel neural architecture search and hardware accelerator co-design framework with reinforcement learning to maximize accuracy, and increase energy efficiency and throughput while converging faster. By predicting hyperparameters of neural networks together with hardware resources, we use a reinforcement-based multi-phased controller to explore neural architecture to achieve higher accuracy and hardware performance simultaneously by applying customized dataflows, voltage/frequency scaling, and tunable Network-on-Chip (NoC) hardware parameters. Our simulation results on the CIFAR-10/100 and ImageNet datasets show that SCORCH achieves identical neural network accuracy while achieving 2.6% higher accuracy, and 35.6%, 26.2%, and 65.8% reductions in latency, energy, and area compared with state-of-art co-search frameworks such as DANCE, NANDS, and NASAIC.more » « less
-
Approximate computing is a promising way to improve the power efficiency of deep learning. While recent work proposes new arithmetic circuits (adders and multipliers) that consume substantially less power at the cost of computation errors, these approximate circuits decrease the end-to-end accuracy of common models. We present AutoApprox, a framework to automatically generate approximate low-power deep learning accelerators without any accuracy loss. AutoApprox generates a wide range of approximate ASIC accelerators with a TPUv3 systolic-array template. AutoApprox uses a learned router to assign each DNN layer to an approximate systolic array from a bank of arrays with varying approximation levels. By tailoring this routing for a specific neural network architecture, we discover circuit designs without the accuracy penalty from prior methods. Moreover, AutoApprox optimizes for the end-to-end performance, power and area of the the whole chip and PE mapping rather than simply measuring the performance of the arithmetic units in iso-lation. To our knowledge, our work is the first to demonstrate the effectiveness of custom-tailored approximate circuits in delivering significant chip-level energy savings with zero accuracy loss on a large-scale dataset such as ImageNet. AutoApprox synthesizes a novel approximate accelerator based on the TPU that reduces end-to-end power consumption by 3.2% and area by 5.2% at a sub-10nm process with no degradation in ImageNet validation top-1 and top-5 accuracy.more » « less
An official website of the United States government
