skip to main content


Title: NNReArch: A Tensor Program Scheduling Framework Against Neural Network Architecture Reverse Engineering
Architecture reverse engineering has become an emerging attack against deep neural network (DNN) implemen- tations. Several prior works have utilized side-channel leakage to recover the model architecture while the an DNN is executing on a hardware acceleration platform. In this work, we target an open- source deep-learning accelerator, Versatile Tensor Accelerator (VTA), and utilize electromagnetic (EM) side-channel leakage to comprehensively learn the association between DNN architecture configurations and EM emanations. We also consider the holistic system – including the low-level tensor program code of the VTA accelerator on a Xilinx FPGA, and explore the effect of such low- level configurations on the EM leakage. Our study demonstrates that both the optimization and configuration of tensor programs will affect the EM side-channel leakage. Gaining knowledge of the association between low-level tensor program and the EM emanations, we propose NNReArch, a lightweight tensor program scheduling framework against side- channel-based DNN model architecture reverse engineering. Specifically, NNReArch targets reshaping the EM traces of different DNN operators, through scheduling the tensor program execution of the DNN model so as to confuse the adversary. NNReArch is a comprehensive protection framework supporting two modes, a balanced mode that strikes a balance between the DNN model confidentiality and execution performance, and a secure mode where the most secure setting is chosen. We imple- ment and evaluate the proposed framework on the open-source VTA with state-of-the-art DNN architectures. The experimental results demonstrate that NNReArch can efficiently enhance the model architecture security with a small performance overhead. In addition, the proposed obfuscation technique makes reverse engineering of the DNN architecture significantly harder.  more » « less
Award ID(s):
1929300 2043183 2153690
NSF-PAR ID:
10357578
Author(s) / Creator(s):
; ; ; ;
Date Published:
Journal Name:
2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Volume:
2022
Page Range / eLocation ID:
1 to 9
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Deep neural network (DNN) accelerators as an example of domain-specific architecture have demonstrated great success in DNN inference. However, the architecture acceleration for equally important DNN training has not yet been fully studied. With data forward, error backward and gradient calculation, DNN training is a more complicated process with higher computation and communication intensity. Because the recent research demonstrates a diminishing specialization return, namely, “accelerator wall”, we believe that a promising approach is to explore coarse-grained parallelism among multiple performance-bounded accelerators to support DNN training. Distributing computations on multiple heterogeneous accelerators to achieve high throughput and balanced execution, however, remaining challenging. We present ACCPAR, a principled and systematic method of determining the tensor partition among heterogeneous accelerator arrays. Compared to prior empirical or unsystematic methods, ACCPAR considers the complete tensor partition space and can reveal previously unknown new parallelism configurations. ACCPAR optimizes the performance based on a cost model that takes into account both computation and communication costs of a heterogeneous execution environment. Hence, our method can avoid the drawbacks of existing approaches that use communication as a proxy of the performance. The enhanced flexibility of tensor partitioning in ACCPAR allows the flexible ratio of computations to be distributed among accelerators with different performances. The proposed search algorithm is also applicable to the emerging multi-path patterns in modern DNNs such as ResNet. We simulate ACCPAR on a heterogeneous accelerator array composed of both TPU-v2 and TPU-v3 accelerators for the training of large-scale DNN models such as Alexnet, Vgg series and Resnet series. The average performance improvements of the state-of-the-art “one weird trick” (OWT) and HYPAR, and ACCPAR, normalized to the baseline data parallelism scheme where each accelerator replicates the model and processes different input data in parallel, are 2.98×, 3.78×, and 6.30×, respectively. 
    more » « less
  2. Typical cybersecurity solutions emphasize on achieving defense functionalities. However, execution efficiency and scalability are equally important, especially for real-world deployment. Straightforward mappings of cybersecurity applications onto HPC platforms may significantly underutilize the HPC devices’ capacities. On the other hand, the sophisticated implementations are quite difficult: they require both in-depth understandings of cybersecurity domain-specific characteristics and HPC architecture and system model. In our work, we investigate three sub-areas in cybersecurity, including mobile software security, network security, and system security. They have the following performance issues, respectively: 1) The flow- and context-sensitive static analysis for the large and complex Android APKs are incredibly time-consuming. Existing CPU-only frameworks/tools have to set a timeout threshold to cease the program analysis to trade the precision for performance. 2) Network intrusion detection systems (NIDS) use automata processing as its searching core and requires line-speed processing. However, achieving high-speed automata processing is exceptionally difficult in both algorithm and implementation aspects. 3) It is unclear how the cache configurations impact time-driven cache side-channel attacks’ performance. This question remains open because it is difficult to conduct comparative measurement to study the impacts. In this dissertation, we demonstrate how application-specific characteristics can be leveraged to optimize implementations on various types of HPC for faster and more scalable cybersecurity executions. For example, we present a new GPU-assisted framework and a collection of optimization strategies for fast Android static data-flow analysis that achieve up to 128X speedups against the plain GPU implementation. For network intrusion detection systems (IDS), we design and implement an algorithm capable of eliminating the state explosion in out-of-order packet situations, which reduces up to 400X of the memory overhead. We also present tools for improving the usability of Micron’s Automata Processor. To study the cache configurations’ impact on time-driven cache side-channel attacks’ performance, we design an approach to conducting comparative measurement. We propose a quantifiable success rate metric to measure the performance of time-driven cache attacks and utilize the GEM5 platform to emulate the configurable cache. 
    more » « less
  3. null (Ed.)
    Security of deep neural network (DNN) inference engines, i.e., trained DNN models on various platforms, has become one of the biggest challenges in deploying artificial intelligence in domains where privacy, safety, and reliability are of paramount importance, such as in medical applications. In addition to classic software attacks such as model inversion and evasion attacks, recently a new attack surface---implementation attacks which include both passive side-channel attacks and active fault injection and adversarial attacks---is arising, targeting implementation peculiarities of DNN to breach their confidentiality and integrity. This paper presents several novel passive and active attacks on DNN we have developed and tested over medical datasets. Our new attacks reveal a largely under-explored attack surface of DNN inference engines. Insights gained during attack exploration will provide valuable guidance for effectively protecting DNN execution against reverse-engineering and integrity violations. 
    more » « less
  4. Analog compute‐in‐memory (CIM) systems are promising candidates for deep neural network (DNN) inference acceleration. However, as the use of DNNs expands, protecting user input privacy has become increasingly important. Herein, a potential security vulnerability is identified wherein an adversary can reconstruct the user's private input data from a power side‐channel attack even without knowledge of the stored DNN model. An attack approach using a generative adversarial network is developed to achieve high‐quality data reconstruction from power leakage measurements. The analyses show that the attack methodology is effective in reconstructing user input data from power leakage of the analog CIM accelerator, even at large noise levels and after countermeasures. To demonstrate the efficacy of the proposed approach, an example of CIM inference of U‐Net for brain tumor detection is attacked, and the original magnetic resonance imaging medical images can be successfully reconstructed even at a noise level of 20% standard deviation of the maximum power signal value. This study highlights a potential security vulnerability in emerging analog CIM accelerators and raises awareness of needed safety features to protect user privacy in such systems.

     
    more » « less
  5. The tremendous growth in the number of Internet of Things (IoT) devices has increased focus on the energy efficiency and security of an IoT device. In this paper, we will present a design level, non-volatile adiabatic architecture for low-energy and Correlation Power Analysis (CPA) resistant IoT devices. IoT devices constructed with CMOS integrated circuits suffer from high dynamic energy and leakage power. To solve this, we look at both adiabatic logic and STT-MTJs (Spin Transfer Torque Magnetic Tunnel Junctions) to reduce both dynamic energy and leakage power. Furthermore, CMOS integrated circuits suffer from side-channel leakage making them insecure against power analysis attacks. We again look to adiabatic logic to design secure circuits with uniform power consumption, thus, defending against power analysis attacks. We have developed a hybrid adiabatic-MTJ architecture using two-phase adiabatic logic. We show that hybrid adiabatic-MTJ circuits are both low energy and secure when compared with CMOS circuits. As a case study, we have constructed one round of PRESENT and have shown energy savings of 64.29% at a frequency of 25 MHz. Furthermore, we have performed a correlation power analysis attack on our proposed design and determined that the key was kept hidden. 
    more » « less