H4H: Hybrid Convolution-Transformer Architecture Search for NPU-CIM Heterogeneous Systems for AR/VR Applications

Zhao, Yiwei; Li, Ziyun; Khwa, Win-San; Sun, Xiaoyu; Zhang, Sai Qian; Sarwar, Syed Shakib; Stangherlin, Kleber Hugo; Lu, Yi-Lun; Gomez, Jorge Tomas; Seo, Jae-sun; Gibbons, Phillip B; De Salvo, Barbara; Liu, Chiao

This content will become publicly available on January 20, 2026

Low-latency, low-power edge AI is crucial for virtual reality (VR) and augmented reality (AR) applications. Recent advances demonstrate that hybrid models, which combine convolutional layers (CNN) and transformers (ViT), often achieve a superior accuracy/performance tradeoff on various computer vision and machine learning (ML) tasks. However, hybrid ML models can pose system challenges for latency and energy efficiency because their dataflow and memory access patterns are diverse. In this work, we leverage the architectural heterogeneity of Neural Processing Units (NPU) and Compute-In-Memory (CIM) and explore diverse execution schemas to execute these hybrid models efficiently. We introduce H4H-NAS, a two-stage Neural Architecture Search (NAS) framework that automates the design of efficient hybrid CNN/ViT models for heterogeneous edge systems featuring both NPU and CIM. Within the NAS framework, we propose a two-phase incremental supernet training scheme to resolve the gradient conflicts between sampled subnets that arise from the different block types in a hybrid model search space. H4H-NAS is also powered by a performance estimator built from NPU performance results measured on real silicon and CIM performance based on industry IPs. H4H-NAS searches hybrid CNN-ViT models at fine granularity and improves top-1 accuracy on ImageNet by up to 1.34%. Moreover, results from our algorithm/hardware co-design show that introducing heterogeneous computing improves overall latency by up to 56.08% and energy by up to 41.72% over baseline solutions. Overall, our framework guides the design of both hybrid network architectures and system architectures for NPU+CIM heterogeneous systems.

Award ID(s): 1919223; 2211882

PAR ID: 10568732

Author(s) / Creator(s): Zhao, Yiwei; Li, Ziyun; Khwa, Win-San; Sun, Xiaoyu; Zhang, Sai Qian; Sarwar, Syed Shakib; Stangherlin, Kleber Hugo; Lu, Yi-Lun; Gomez, Jorge Tomas; Seo, Jae-sun; Gibbons, Phillip B; De Salvo, Barbara; Liu, Chiao

Publisher / Repository: 30th Asia and South Pacific Design Automation Conference (ASP-DAC'25)

Date Published: 2025-01-20

Journal Name: Proceedings of the ASPDAC Asia and South Pacific Design Automation Conference

ISSN: 2153-6961

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Conference Paper:
The DOI is not currently available.
