Deep neural networks (DNNs) are increasingly used in time-critical, learning-enabled cyber-physical applications such as autonomous driving and robotics. Despite the growing use of various deep learning models, protecting DNN inference from adversarial threats while preserving model privacy and confidentiality remains a key concern for resource- and timing-constrained autonomous cyber-physical systems. One potential solution, primarily used in general-purpose systems, is to execute the DNN workloads within trusted enclaves available on current off-the-shelf processors. However, ensuring temporal guarantees when running DNN inference within these enclaves poses significant challenges in real-time applications due to (a) the large computational and memory demands of DNN models and (b) the overhead introduced by frequent context switches between “normal” and “trusted” execution modes. This paper introduces new time-aware schemes for dynamic (EDF) and fixed-priority (RM) schedulers to preserve the confidentiality of DNN tasks by running them inside trusted enclaves. We first propose a technique that slices each DNN layer and runs the slices sequentially in the enclave. However, due to the extra context-switch overhead of individual layer slices, we further introduce a novel layer fusion technique. Layer fusion improves real-time guarantees by grouping multiple layers of DNN workload from multiple tasks, thus allowing them to fit and run concurrently within the enclaves while maintaining timing constraints. We implemented and tested our ideas on the Raspberry Pi platform running a DNN-enabled trusted operating system (OP-TEE with DarkNet-TZ) and three DNN architectures (AlexNet-squeezed, Tiny Darknet, YOLOv3-tiny). Compared to the layer-wise partitioning approach, layer fusion can (a) schedule up to 3x more tasksets for EDF and 5x for RM and (b) reduce context switches by up to 11.12x for EDF and by up to 11.06x for RM.
DeepTrust^RT: Confidential Deep Neural Inference Meets Real-Time!
Deep Neural Networks (DNNs) are becoming common in "learning-enabled" time-critical applications such as autonomous driving and robotics. One approach to protect DNN inference from adversarial actions and preserve model privacy/confidentiality is to execute it within trusted enclaves available in modern processors. However, running DNN inference inside limited-capacity enclaves while ensuring timing guarantees is challenging due to (a) the large size of DNN workloads and (b) the extra switching between "normal" and "trusted" execution modes. This paper introduces new time-aware scheduling schemes, DeepTrust^RT, to securely execute deep neural inferences for learning-enabled real-time systems. We first propose a variant of EDF (called DeepTrust^RT-LW) that slices each DNN layer and runs the slices sequentially in the enclave. However, due to the extra context-switch overhead of individual layer slices, we further introduce a novel layer fusion technique (named DeepTrust^RT-FUSION). The proposed scheme provides hard real-time guarantees by fusing multiple layers of DNN workload from multiple tasks, thus allowing them to fit and run concurrently within the enclaves. We implemented and tested the DeepTrust^RT ideas on the Raspberry Pi platform running OP-TEE+DarkNet-TZ DNN APIs and three DNN workloads (AlexNet-squeezed, Tiny Darknet, YOLOv3-tiny). Compared to the layer-wise partitioning approach (DeepTrust^RT-LW), DeepTrust^RT-FUSION can schedule up to 3x more tasksets and reduce context switches by up to 11.12x. We further demonstrate the efficacy of DeepTrust^RT using a flight controller (ArduPilot) case study and find that DeepTrust^RT-FUSION retains real-time guarantees where DeepTrust^RT-LW becomes unschedulable.
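To make the slicing-versus-fusion idea concrete, the sketch below models the context-switch accounting that drives the reported savings. It is a minimal, illustrative Python model, not the paper's implementation: the enclave capacity, per-layer footprints, and the greedy fusion policy are assumptions introduced here for exposition.

```python
# Minimal sketch: compare enclave context switches for layer-wise execution
# vs. greedy layer fusion. All sizes and the capacity are illustrative.
from typing import Dict, List

ENCLAVE_CAPACITY_KB = 512  # assumed secure-memory budget, not from the paper

def layerwise_switches(tasks: Dict[str, List[int]]) -> int:
    """One enclave entry/exit pair per layer slice (layer-wise, LW-style)."""
    return sum(len(layers) for layers in tasks.values())

def fused_switches(tasks: Dict[str, List[int]], capacity_kb: int) -> int:
    """Greedily fuse consecutive layers (possibly across tasks) into groups
    that fit in the enclave; one entry/exit pair per fused group."""
    groups, current = 0, 0
    for layers in tasks.values():
        for size_kb in layers:
            if size_kb > capacity_kb:
                raise ValueError("layer larger than enclave; must be sliced further")
            if current + size_kb > capacity_kb:   # close the current group
                groups += 1
                current = 0
            current += size_kb
    return groups + (1 if current else 0)

if __name__ == "__main__":
    # Hypothetical per-layer secure-memory footprints (KB) for three DNN tasks.
    tasks = {
        "alexnet_squeezed": [120, 200, 90, 60],
        "tiny_darknet":     [80, 150, 70],
        "yolov3_tiny":      [180, 220, 100, 40],
    }
    print("layer-wise switches:", layerwise_switches(tasks))
    print("fused switches:     ", fused_switches(tasks, ENCLAVE_CAPACITY_KB))
```

In the actual system, fusion decisions would also have to respect per-task deadlines under EDF/RM; this sketch only captures why grouping layers reduces the number of normal/trusted world transitions.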
- Award ID(s):
- 2312006
- PAR ID:
- 10596398
- Editor(s):
- Pellizzoni, Rodolfo
- Publisher / Repository:
- Schloss Dagstuhl – Leibniz-Zentrum für Informatik
- Date Published:
- Volume:
- 298
- ISSN:
- 1868-8969
- ISBN:
- 978-3-95977-324-9
- Page Range / eLocation ID:
- 13:1-13:24
- Subject(s) / Keyword(s):
- DNN; TrustZone; Real-Time Systems; Computer systems organization → Real-time systems; Security and privacy → Systems security
- Format(s):
- Medium: X; Size: 24 pages, 1539574 bytes; Other: application/pdf
- Size(s):
- 24 pages; 1539574 bytes
- Right(s):
- Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
- Sponsoring Org:
- National Science Foundation
More Like this
-
We propose Zygarde, an energy- and accuracy-aware soft real-time task scheduling framework for batteryless systems that flexibly executes deep learning tasks suitable for running on microcontrollers. The sporadic nature of harvested energy, resource constraints of the embedded platform, and the computational demand of deep neural networks (DNNs) pose a unique and challenging real-time scheduling problem for which no solutions have been proposed in the literature. We empirically study the problem and model the energy harvesting pattern as well as the trade-off between the accuracy and execution of a DNN. We develop an imprecise-computing-based scheduling algorithm that improves the timeliness of DNN tasks on intermittently powered systems. We evaluate Zygarde using four standard datasets as well as by deploying it in six real-life applications involving audio and camera sensor systems. Results show that Zygarde decreases the execution time by up to 26% and schedules 9%–34% more tasks with up to 21% higher inference accuracy, compared to traditional schedulers such as earliest deadline first (EDF).
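As a rough illustration of the imprecise-computing idea described above (not Zygarde's actual algorithm), the following Python sketch runs a mandatory portion of a DNN and then adds optional stages only while the estimated energy and deadline budgets allow; the stage costs and accuracy gains are invented numbers.

```python
# Illustrative imprecise-computing admission of optional DNN stages.
# Costs, energy figures, and accuracy gains are invented for exposition.

def plan_inference(stages, energy_budget_mj, time_budget_ms, mandatory=1):
    """Pick how many stages to run: all mandatory ones, then optional stages
    greedily while energy and deadline headroom remain."""
    chosen, energy, time, accuracy = [], 0.0, 0.0, 0.0
    for i, (e_mj, t_ms, acc_gain) in enumerate(stages):
        is_mandatory = i < mandatory
        if not is_mandatory and (energy + e_mj > energy_budget_mj
                                 or time + t_ms > time_budget_ms):
            break  # stop adding optional stages; return an imprecise result
        chosen.append(i)
        energy += e_mj
        time += t_ms
        accuracy += acc_gain
    return chosen, accuracy

if __name__ == "__main__":
    # (energy mJ, latency ms, accuracy gain) per stage of a hypothetical DNN
    stages = [(5.0, 12.0, 0.60), (4.0, 10.0, 0.15),
              (4.0, 10.0, 0.10), (6.0, 15.0, 0.05)]
    print(plan_inference(stages, energy_budget_mj=10.0, time_budget_ms=30.0))
```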
-
It is challenging to deploy 3D Convolutional Neural Networks (3D CNNs) on mobile devices, specifically if both real-time execution and high inference accuracy are in demand, because the increasingly large model size and complex model structure of 3D CNNs usually require tremendous computation and memory resources. Weight pruning is proposed to mitigate this challenge. However, existing pruning is either not compatible with modern parallel architectures, resulting in long inference latency, or subject to significant accuracy degradation. This paper proposes an end-to-end 3D CNN acceleration framework based on pruning/compilation co-design, called Mobile-3DCNN, that consists of two parts: a novel, fine-grained structured pruning enhanced by a prune/Winograd adaptive selection (that is mobile-hardware-friendly and can achieve high pruning accuracy), and a set of compiler optimization and code generation techniques enabled by our pruning (to fully transform the pruning benefit into real performance gains). The evaluation demonstrates that Mobile-3DCNN outperforms state-of-the-art end-to-end DNN acceleration frameworks that support 3D CNN execution on mobile devices, Alibaba Mobile Neural Networks and Pytorch-Mobile, with speedups of up to 34× with minor accuracy degradation, proving it is possible to execute high-accuracy large 3D CNNs on mobile devices in real time (or even ultra-real time).
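To give a flavor of the fine-grained structured (block-based) pruning such frameworks build on, here is a minimal NumPy sketch that zeroes whole small blocks of a weight matrix by magnitude; the block shape and sparsity target are arbitrary assumptions, and this is not Mobile-3DCNN's actual scheme.

```python
# Minimal block-structured magnitude pruning sketch (illustrative only).
import numpy as np

def block_prune(weights: np.ndarray, block=(4, 4), sparsity=0.5) -> np.ndarray:
    """Zero out the lowest-magnitude (block[0] x block[1]) blocks so that
    roughly `sparsity` of all blocks are removed."""
    h, w = weights.shape
    bh, bw = block
    assert h % bh == 0 and w % bw == 0, "pad weights to a multiple of the block size"
    # Score each block by its L1 norm.
    blocks = weights.reshape(h // bh, bh, w // bw, bw)
    scores = np.abs(blocks).sum(axis=(1, 3))             # shape (h//bh, w//bw)
    k = int(scores.size * sparsity)                       # number of blocks to drop
    if k == 0:
        return weights.copy()
    threshold = np.partition(scores.ravel(), k - 1)[k - 1]
    keep = (scores > threshold)[:, None, :, None]         # mask broadcast over blocks
    return (blocks * keep).reshape(h, w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(8, 8)).astype(np.float32)
    pruned = block_prune(w, block=(4, 4), sparsity=0.5)
    print("zeroed fraction:", float((pruned == 0).mean()))
```

Keeping the zeros in regular blocks (rather than scattered) is what lets a mobile compiler skip whole tiles of computation and actually realize the speedup.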
-
Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices. However, prior pruning schemes have limited application scenarios due to accuracy degradation, difficulty in leveraging hardware acceleration, and/or restriction to certain types of DNN layers. In this article, we propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations that are applicable to any type of DNN layer while achieving high accuracy and hardware inference performance. With the flexibility of applying different pruning schemes to different layers enabled by our compiler optimizations, we further probe into the new problem of determining the best-suited pruning scheme for each layer, considering the different acceleration and accuracy performance of various pruning schemes. Two pruning scheme mapping methods, one search-based and the other rule-based, are proposed to automatically derive the best-suited pruning regularity and block size for each layer of any given DNN. Experimental results demonstrate that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework with up to 2.48× and 1.73× DNN inference acceleration on CIFAR-10 and ImageNet datasets without accuracy loss.
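The sketch below hints at what a rule-based mapping of pruning schemes to layers might look like; the rules, layer descriptors, and scheme names are hypothetical and are not taken from the article.

```python
# Hypothetical rule-based selection of a pruning scheme per DNN layer.
# Layer descriptors, thresholds, and scheme names are invented for illustration.

def choose_scheme(layer: dict) -> str:
    """Map a layer to a pruning regularity using simple hand-written rules."""
    if layer["type"] == "conv" and layer["kernel"] >= 3:
        # Spatial convolutions: prune whole small blocks of the weight tensor.
        return "block-4x4"
    if layer["type"] == "conv":            # 1x1 / pointwise convolutions
        return "pattern-based"
    if layer["type"] == "fc" and layer["params"] > 1_000_000:
        return "block-8x8"                 # large dense layers tolerate coarser blocks
    return "unstructured"                  # small layers: fine-grained pruning

if __name__ == "__main__":
    layers = [
        {"name": "conv1", "type": "conv", "kernel": 3, "params": 9_408},
        {"name": "conv2", "type": "conv", "kernel": 1, "params": 16_384},
        {"name": "fc",    "type": "fc",   "kernel": 0, "params": 4_096_000},
    ]
    for layer in layers:
        print(layer["name"], "->", choose_scheme(layer))
```

A search-based mapper would instead profile candidate schemes per layer and pick the one with the best measured accuracy/latency trade-off.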
-
Hardware enclaves are designed to execute small pieces of sensitive code or to operate on sensitive data, in isolation from larger, less trusted systems. Partitioning a large, legacy application requires significant effort. Partitioning an application written in a managed language, such as Java, is more challenging because of mutable language characteristics, extensive code reachability in class libraries, and the inevitability of using a heavyweight runtime. Civet is a framework for partitioning Java applications into enclaves. Civet reduces the number of lines of code in the enclave and uses language-level defenses, including deep type checks and dynamic taint-tracking, to harden the enclave interface. Civet also contributes a partitioned Java runtime design, including a garbage collection design optimized for the peculiarities of enclaves. Civet is efficient for data-intensive workloads; partitioning a Hadoop mapper reduces the enclave overhead from 10× to 16–22% without taint-tracking or 70–80% with taint-tracking.
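As a loose illustration of the boundary-hardening idea (deep checks on data crossing into an enclave), here is a small Python sketch; the check rules and the "tainted" wrapper are invented for this example and are unrelated to Civet's actual Java implementation.

```python
# Toy illustration of hardening an enclave entry point: validate the full
# structure of incoming data and refuse values marked as tainted.
# The Tainted wrapper and the allowed shapes are assumptions for this sketch.

class Tainted:
    """Marks a value as influenced by untrusted input."""
    def __init__(self, value):
        self.value = value

def deep_check(obj, depth=0):
    """Recursively verify that only plain, untainted data reaches the enclave."""
    if depth > 8:
        raise ValueError("object graph too deep for the enclave interface")
    if isinstance(obj, Tainted):
        raise ValueError("tainted value rejected at the enclave boundary")
    if isinstance(obj, (int, float, str, bytes, bool, type(None))):
        return
    if isinstance(obj, (list, tuple)):
        for item in obj:
            deep_check(item, depth + 1)
        return
    if isinstance(obj, dict):
        for key, val in obj.items():
            deep_check(key, depth + 1)
            deep_check(val, depth + 1)
        return
    raise TypeError(f"type {type(obj).__name__} not allowed across the boundary")

def enclave_entry(request: dict) -> str:
    deep_check(request)            # gate every call into the "trusted" side
    return f"processed {len(request)} fields inside the enclave"

if __name__ == "__main__":
    print(enclave_entry({"user": "alice", "values": [1, 2, 3]}))
    try:
        enclave_entry({"user": Tainted("attacker-controlled")})
    except ValueError as err:
        print("rejected:", err)
```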