skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: FLASH: F ast Neura l A rchitecture S earch with H ardware Optimization
Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs). As the performance requirements of ML applications grow continuously, the hardware accelerators start playing a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric to quantify the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to do training-free NAS within one second and build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process, while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to the state-of-the-art NAS approaches, our proposed hierarchical SHGO-based algorithm enables more than four orders of magnitude speedup (specifically, the execution time of the proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, thus enabling us to do NAS on a Raspberry Pi-3B processor in less than 3 seconds.  more » « less
Award ID(s):
2007284
PAR ID:
10380968
Author(s) / Creator(s):
; ; ;
Date Published:
Journal Name:
ACM Transactions on Embedded Computing Systems
Volume:
20
Issue:
5s
ISSN:
1539-9087
Page Range / eLocation ID:
1 to 26
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Sparse deep neural networks (DNNs) have the potential to deliver compelling performance and energy efficiency without significant accuracy loss. However, their benefits can quickly diminish if their training is oblivious to the target hardware. For example, fewer critical connections can have a significant overhead if they translate into long-distance communication on the target hardware. Therefore, hardware-aware sparse training is needed to leverage the full potential of sparse DNNs. To this end, we propose a novel and comprehensive communication-aware sparse DNN optimization framework for tile-based in-memory computing (IMC) architectures. The proposed technique, CANNON first maps the DNN layers onto the tiles of the target architecture. Then, it replaces the fully connected and convolutional layers with communication-aware sparse connections. After that, CANNON optimizes the communication cost with minimal impact on the DNN accuracy. Extensive experimental evaluations with a wide range of DNNs and datasets show up to 3.0× lower communication energy, 3.1× lower communication latency, and 6.8× lower energy-delay product compared to state-of-the-art pruning approaches with a negligible impact on the classification accuracy on IMC-based machine learning accelerators. 
    more » « less
  2. Deep neural networks are lucrative targets of adversarial attacks and approximate deep neural networks (AxDNNs) are no exception. Searching manually for adversarially robust AxDNN architectures incurs outrageous time and human effort. In this paper, we propose XAI-NAS, an explainable neural architecture search (NAS) method that leverages explainable artificial intelligence (XAI) to efficiently co-optimize the adversarial robustness and hardware efficiency of AxDNN architectures on systolic-array hardware accelerators. During the NAS process, AxDNN architectures are evolved layer-wise with heterogeneous approximate multipliers to deliver the best trade-offs between adversarial robustness, energy consumption, latency, and memory footprint. The most suitable approximate multipliers are automatically selected from an open-source Evoapprox8b library. Our extensive evaluations provide a set of Pareto optimal hardware efficient and adversarially robust solutions. For example, a Pareto-optimal DNN AxDNN for the MNIST and CIFAR-10 datasets exhibits up to 1.5× higher adversarial robustness, 2.1× less energy consumption, 4.39× reduced latency, and 2.37× low memory footprint when compared to the state-of-the-art NAS approaches. 
    more » « less
  3. Low-latency and low-power edge AI is crucial for Virtual Reality and Augmented Reality applications. Recent advances demonstrate that hybrid models, combining convolution layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can present system challenges for latency and energy efficiency due to their diverse nature in dataflow and memory access patterns. In this work, we leverage architecture heterogeneity from Neural Processing Units (NPU) and Compute-In-Memory (CIM) and explore diverse execution schemas to efficiently execute these hybrid models. We introduce H4H-NAS, a two-stage Neural Architecture Search (NAS) framework to automate the design of efficient hybrid CNN/ViT models for heterogeneous edge systems featuring both NPU and CIM. We propose a two-phase incremental supernet training in our NAS framework to resolve gradient conflicts between sampled subnets caused by different types of blocks in a hybrid model search space. Our H4H-NAS approach is also powered by a performance estimator built with NPU performance results measured on real silicon, and CIM performance based on industry IPs. H4H-NAS searches hybrid CNN-ViT models with fine granularity and achieves significant (up to 1.34%) top-1 accuracy improvement on ImageNet. Moreover, results from our algorithm/hardware co-design reveal up to 56.08% overall latency and 41.72% energy improvements by introducing heterogeneous computing over baseline solutions. Overall, our framework guides the design of hybrid network architectures and system architectures for NPU+CIM heterogeneous systems. 
    more » « less
  4. In this work, we employ neural architecture search (NAS) to enhance the efficiency of deploying diverse machine learning (ML) tasks on in-memory computing (IMC) architectures. Initially, we design three fundamental components inspired by the convolutional layers found in VGG and ResNet models. Subsequently, we utilize Bayesian optimization to construct a convolutional neural network (CNN) model with adaptable depths, employing these components. Through the Bayesian search algorithm, we explore a vast search space comprising over 640 million network configurations to identify the optimal solution, considering various multi-objective cost functions like accuracy/latency and accuracy/energy. Our evaluation of this NAS approach for IMC architecture deployment spans three distinct image classification datasets, demonstrating the effectiveness of our method in achieving a balanced solution characterized by high accuracy and reduced latency and energy consumption. 
    more » « less
  5. The success of DL can be attributed to hours of parameter and architecture tuning by human experts. Neural Architecture Search (NAS) techniques aim to solve this problem by automating the search procedure for DNN architectures making it possible for non-experts to work with DNNs. Specifically, One-shot NAS techniques have recently gained popularity as they are known to reduce the search time for NAS techniques. One-Shot NAS works by training a large template network through parameter sharing which includes all the candidate NNs. This is followed by applying a procedure to rank its components through evaluating the possible candidate architectures chosen randomly. However, as these search models become increasingly powerful and diverse, they become harder to understand. Consequently, even though the search results work well, it is hard to identify search biases and control the search progression, hence a need for explainability and human-in-the-loop (HIL) One-Shot NAS. To alleviate these problems, we present NAS-Navigator, a visual analytics (VA) system aiming to solve three problems with One-Shot NAS; explainability, HIL design, and performance improvements compared to existing state-of-the-art (SOTA) techniques. NAS-Navigator gives full control of NAS back in the hands of the users while still keeping the perks of automated search, thus assisting non-expert users. Analysts can use their domain knowledge aided by cues from the interface to guide the search. Evaluation results confirm the performance of our improved One-Shot NAS algorithm is comparable to other SOTA techniques. While adding Visual Analytics (VA) using NAS-Navigator shows further improvements in search time and performance. We designed our interface in collaboration with several deep learning researchers and evaluated NAS-Navigator through a control experiment and expert interviews. 
    more » « less