skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: The Role of Edge Offload for Hardware-Accelerated Mobile Devices
This position paper examines a spectrum of approaches to overcoming the limited computing power of mobile devices caused by their need to be small, lightweight and energy efficient. At one extreme is offloading of compute-intensive operations to a cloudlet nearby. At the other extreme is the use of fixed-function hardware accelerators on mobile devices. Between these endpoints lie various configurations of programmable hardware accelerators. We explore the strengths and weaknesses of these approaches and conclude that they are, in fact, complementary. Based on this insight, we advocate a software-hardware co-evolution path that combines their strengths.  more » « less
Award ID(s):
1815882
PAR ID:
10492635
Author(s) / Creator(s):
; ; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
HotMobile '21: Proceedings of the 22nd International Workshop on Mobile Computing Systems and Applications
ISBN:
9781450383233
Page Range / eLocation ID:
22 to 29
Format(s):
Medium: X
Location:
Virtual United Kingdom
Sponsoring Org:
National Science Foundation
More Like this
  1. Image super-resolution (SR) is widely used on mobile devices to enhance user experience. However, neural networks used for SR are computationally expensive, posing challenges for mobile devices with limited computing power. A viable solution is to use heterogeneous processors on mobile devices, especially the specialized hardware AI accelerators, for SR computations, but the reduced arithmetic precision on AI accelerators can lead to degraded perceptual quality in upscaled images. To address this limitation, in this paper we present SR For Your Eyes (FYE-SR), a novel image SR technique that enhances the perceptual quality of upscaled images when using heterogeneous processors for SR computations. FYESR strategically splits the SR model and dispatches different layers to heterogeneous processors, to meet the time constraint of SR computations while minimizing the impact of AI accelerators on image quality. Experiment results show that FYE-SR outperforms the best baselines, improving perceptual image quality by up to 2x, or reducing SR computing latency by up to 5.6x with on-par image quality. 
    more » « less
  2. null (Ed.)
    There is an increasing emphasis on securing deep learning (DL) inference pipelines for mobile and IoT applications with privacy-sensitive data. Prior works have shown that privacy-sensitive data can be secured throughout deep learning inferences on cloud-offloaded models through trusted execution environments such as Intel SGX. However, prior solutions do not address the fundamental challenges of securing the resource-intensive inference tasks on low-power, low-memory devices (e.g., mobile and IoT devices), while achieving high performance. To tackle these challenges, we propose SecDeep, a low-power DL inference framework demonstrating that both security and performance of deep learning inference on edge devices are well within our reach. Leveraging TEEs with limited resources, SecDeep guarantees full confidentiality for input and intermediate data, as well as the integrity of the deep learning model and framework. By enabling and securing neural accelerators, SecDeep is the first of its kind to provide trusted and performant DL model inferencing on IoT and mobile devices. We implement and validate SecDeep by interfacing the ARM NN DL framework with ARM TrustZone. Our evaluation shows that we can securely run inference tasks with 16× to 172× faster performance than no acceleration approaches by leveraging edge-available accelerators. 
    more » « less
  3. null (Ed.)
    Hardware accelerators are essential to the accommodation of ever-increasing Deep Neural Network (DNN) workloads on the resource-constrained embedded devices. While accelerators facilitate fast and energy-efficient DNN operations, their accuracy is threatened by faults in their on-chip and off-chip memories, where millions of DNN weights are held. The use of emerging Non-Volatile Memories (NVM) further exposes DNN accelerators to a non-negligible rate of permanent defects due to immature fabrication, limited endurance, and aging. To tolerate defects in NVM-based DNN accelerators, previous work either requires extra redundancy in hardware or performs defect-aware retraining, imposing significant overhead. In comparison, this paper proposes a set of algorithms that exploit the flexibility in setting the fault-free bits in weight memory to effectively approximate weight values, so as to mitigate defect-induced accuracy drop. These algorithms can be applied as a one-step solution when loading the weights to embedded devices. They only require trivial hardware support and impose negligible run-time overhead. Experiments on popular DNN models show that the proposed techniques successfully boost inference accuracy even in the face of elevated defect rates in the weight memory. 
    more » « less
  4. Abstract Recent remarkable progress in artificial intelligence (AI) has garnered tremendous attention from researchers, industry leaders, and the general public, who are increasingly aware of AI's growing impact on everyday life. The advancements of AI and deep learning have also significantly influenced the field of nanophotonics. On the one hand, deep learning facilitates data‐driven strategies for optimizing and solving forward and inverse problems of nanophotonic devices. On the other hand, photonic devices offer promising optical platforms for implementing deep neural networks. This review explores both AI for photonic design and photonic implementation of AI. Various deep learning models and their roles in the design of photonic devices are introduced, analyzing the strengths and challenges of these data‐driven methodologies from the perspective of computational cost. Additionally, the potential of optical hardware accelerators for neural networks is discussed by presenting a variety of photonic devices capable of performing linear and nonlinear operations, essential building blocks of neural networks. It is believed that the bidirectional interactions between nanophotonics and AI will drive the coevolution of these two research fields. 
    more » « less
  5. Over a billion mobile consumer system-on-chip (SoC) chipsets ship each year. Of these, the mobile consumer market undoubtedly involving smartphones has a significant market share. Most modern smartphones comprise of advanced SoC architectures that are made up of multiple cores, GPS, and many different programmable and fixed-function accelerators connected via a complex hierarchy of interconnects with the goal of running a dozen or more critical software usecases under strict power, thermal and energy constraints. The steadily growing complexity of a modern SoC challenges hardware computer architects on how best to do early stage ideation. Late SoC design typically relies on detailed full-system simulation once the hardware is specified and accelerator software is written or ported. However, early-stage SoC design must often select accelerators before a single line of software is written. To help frame SoC thinking and guide early stage mobile SoC design, in this paper we contribute the Gables model that refines and retargets the Roofline model-designed originally for the performance and bandwidth limits of a multicore chip-to model each accelerator on a SoC, to apportion work concurrently among different accelerators (justified by our usecase analysis), and calculate a SoC performance upper bound. We evaluate the Gables model with an existing SoC and develop several extensions that allow Gables to inform early stage mobile SoC design. 
    more » « less