
Title: Keep Clear of the Edges: An Empirical Study of Artificial Intelligence Workload Performance and Resource Footprint on Edge Devices
Recently, with the advent of the Internet of Everything and 5G networks, the amount of data generated by edge scenarios such as autonomous vehicles, smart industry, 4K/8K video, virtual reality (VR), and augmented reality (AR) has exploded. These trends impose real-time, hardware-dependence, low-power, and security requirements on edge facilities and have rapidly popularized edge computing. Meanwhile, artificial intelligence (AI) workloads have dramatically shifted the computing paradigm from cloud services to mobile applications. Unlike AI in the cloud or on mobile platforms, which is widely deployed and well studied, the performance of AI workloads and their resource impact on edge devices are not yet well understood; an in-depth analysis and comparison of their advantages, limitations, performance, and resource consumption in edge environments is still lacking. In this paper, we perform a comprehensive study of representative AI workloads on edge platforms. We first summarize modern edge hardware and popular AI workloads. We then quantitatively evaluate three categories (i.e., classification, image-to-image, and segmentation) of the most popular and widely used AI applications in realistic edge environments based on the Raspberry Pi, Nvidia TX2, and other devices. We find that the interaction between hardware and neural network models incurs non-negligible impact and overhead on AI workloads at the edge. Our experiments show that performance variation and differences in resource footprint limit the availability of certain workloads and algorithms on edge platforms, and that users need to select the appropriate workload, model, and algorithm based on the requirements and characteristics of the edge environment.
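The evaluation described in the abstract comes down to repeatedly timing inference and sampling resource usage on each board. The sketch below is not the paper's harness; it is a minimal illustration, assuming a Python runtime with numpy and psutil available, and using a hypothetical run_inference placeholder in place of the actual model runtime (TFLite, TensorRT, ONNX Runtime, etc.) under test.

```python
# Minimal sketch (not the paper's harness): time an inference callable and
# sample resident memory on an edge board. `run_inference` is a hypothetical
# stand-in for the model call being profiled.
import time
import numpy as np
import psutil

def run_inference(batch: np.ndarray) -> np.ndarray:
    # Placeholder workload; replace with the real model invocation.
    return batch @ np.random.rand(batch.shape[1], 10)

def profile(runs: int = 50, warmup: int = 5) -> dict:
    proc = psutil.Process()
    batch = np.random.rand(1, 224 * 224 * 3).astype(np.float32)
    for _ in range(warmup):            # warm-up iterations excluded from timing
        run_inference(batch)
    latencies, rss = [], []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference(batch)
        latencies.append((time.perf_counter() - start) * 1e3)   # latency in ms
        rss.append(proc.memory_info().rss / 2**20)              # resident set, MiB
    return {
        "p50_ms": float(np.percentile(latencies, 50)),
        "p99_ms": float(np.percentile(latencies, 99)),
        "peak_rss_mib": max(rss),
    }

if __name__ == "__main__":
    print(profile())
```

Reporting both a median and a tail latency, as above, is one way to surface the performance variation across runs that the paper highlights for edge platforms.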
Journal Name:
2022 IEEE International Performance, Computing, and Communications Conference (IPCCC)
Page Range or eLocation-ID:
7 to 16
Sponsoring Org:
National Science Foundation
More Like this
  1. Edge computing has emerged as a popular paradigm for supporting mobile and IoT applications with low latency or high bandwidth needs. The attractiveness of edge computing has been further enhanced due to the recent availability of special-purpose hardware to accelerate specific compute tasks, such as deep learning inference, on edge nodes. In this paper, we experimentally compare the benefits and limitations of using specialized edge systems, built using edge accelerators, to more traditional forms of edge and cloud computing. Our experimental study using edge-based AI workloads shows that today's edge accelerators can provide comparable, and in many cases better, performance, when normalized for power or cost, than traditional edge and cloud servers. They also provide latency and bandwidth benefits for split processing, across and within tiers, when using model compression or model splitting, but require dynamic methods to determine the optimal split across tiers. We find that edge accelerators can support varying degrees of concurrency for multi-tenant inference applications, but lack isolation mechanisms necessary for edge cloud multi-tenant hosting.
  2. With the growing performance and wide application of deep neural networks (DNNs), recent years have seen enormous efforts on DNN accelerator hardware design for platforms from mobile devices to data centers. The systolic array has been a popular architectural choice for many proposed DNN accelerators with hundreds to thousands of processing elements (PEs) for parallel computing. Systolic array-based DNN accelerators for datacenter applications have high power consumption and nonuniform workload distribution, which makes power delivery network (PDN) design challenging. Server-class multicore processors have benefited from distributed on-chip voltage regulation and heterogeneous voltage regulation (HVR) for improving energy efficiency while guaranteeing power delivery integrity. This paper presents the first work on HVR-based PDN architecture and control for systolic array-based DNN accelerators. We propose to employ a PDN architecture comprising heterogeneous on-chip and off-chip voltage regulators and multiple power domains. By analyzing patterns of typical DNN workloads via a modeling framework, we propose a DNN workload-aware dynamic PDN control policy to maximize system energy efficiency while ensuring power integrity. We demonstrate significant energy efficiency improvements brought by the proposed PDN architecture, dynamic control, and power gating, which lead to a more than five-fold reduction of leakage energy and PDN energy overhead for systolic array DNN accelerators.
  3. Edge Cloud (EC) is poised to support massive machine-type communication (mMTC) for 5G and IoT by providing compute and network resources at the edge. Yet the EC, being regionally domestic and smaller in scale, faces challenges in bandwidth and computational throughput. Resource management techniques are necessary to achieve efficient resource allocation. A Software Defined Network (SDN)-enabled EC architecture is emerging as a potential solution that enables dynamic bandwidth allocation and task scheduling for latency-sensitive and diverse mobile applications in the EC environment. This study proposes a novel Heuristic Reinforcement Learning (HRL) based flow-level dynamic bandwidth allocation framework and validates it through an end-to-end implementation using the OpenFlow meter feature. OpenFlow meters provide granular control and allow demand-based flow management to meet the diverse QoS requirements of IoT traffic. The proposed framework is evaluated by emulating an EC scenario based on the real NSF COSMOS testbed topology at The City College of New York. A specific heuristic reinforcement learning approach with a linear-annealing technique and a pruning principle is proposed and compared with the baseline approach (a generic sketch of the linear-annealing step appears after this list). Our proposed strategy performs consistently in both Mininet and hardware OpenFlow switch-based environments. The performance evaluation considers key metrics associated with real-time applications: throughput, end-to-end delay, packet loss rate, and overall system cost for bandwidth allocation. Furthermore, our proposed linear-annealing method achieves a faster convergence rate and better reward in terms of system cost, and the proposed pruning principle remarkably reduces control traffic in the network.
  4. The processing demands of current and emerging applications, such as image/video processing, are increasing due to the deluge of data generated by mobile and edge devices. This raises challenges for a vast range of computing systems, from smartphones to cloud and data centers. Heterogeneous computing has demonstrated its ability as an efficient computing model due to its capability to adapt to various workload requirements. Field programmable gate arrays (FPGAs) provide power and performance benefits and have been used in many application domains from embedded systems to the cloud. In this paper, we used a closely coupled CPU-FPGA heterogeneous system to accelerate a sliding-window-based image processing algorithm, the Canny edge detector. We accelerated Canny using two different implementations: code partitioned and data partitioned. In the data-partitioned implementation, we proposed a weighted round-robin based algorithm that partitions input images and distributes the load between the CPU and the FPGA based on latency (see the sketch of a latency-weighted split after this list). The paper also compares the performance of the proposed accelerators with separate CPU and FPGA implementations. Using our hybrid CPU-FPGA based algorithm, we achieved a speedup of up to 4.8× over a CPU-only implementation and up to 2.1× over an FPGA-only implementation. Moreover, the estimated total energy consumption of our algorithm is more efficient than a CPU-only implementation. Our results show a significant reduction in energy-delay product (EDP) compared to the CPU-only implementation, and comparable EDP results to the FPGA-only implementation.
  5. Mobile devices supporting the "Internet of Things" (IoT) often have limited computation, battery energy, and storage capabilities, especially for resource-intensive applications involving virtual reality (VR), augmented reality (AR), multimedia delivery, and artificial intelligence (AI), which can require broad bandwidth, low response latency, and large computational power. Edge cloud, or edge computing, is an emerging technology that can address the deficiencies of the centralized-only cloud computing model by moving computation and storage resources closer to the devices in support of the above-mentioned applications. To make this happen, efficient coordination mechanisms and "offloading" algorithms are needed to allow mobile devices and the edge cloud to work together smoothly. In this survey paper, we investigate the key issues, methods, and state-of-the-art efforts related to the offloading problem. We adopt a new characterizing model to study the whole process of offloading from mobile devices to the edge cloud. Through comprehensive discussions, we aim to draw an overall "big picture" of existing efforts and research directions. Our study also indicates that offloading algorithms for the edge cloud have demonstrated profound potential for future technology and application development.
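Item 3 above mentions a linear-annealing technique inside its heuristic reinforcement learning framework. The fragment below is a generic sketch of linearly annealed epsilon-greedy selection over candidate OpenFlow meter rates, not the authors' implementation; the candidate rates, reward handling, and tabular value estimates are illustrative assumptions.

```python
# Generic sketch: linearly annealed epsilon-greedy selection over candidate
# OpenFlow meter rates (illustrative values; not the paper's framework).
import random

METER_RATES_MBPS = [10, 25, 50, 100]            # hypothetical candidate rates
q_values = {r: 0.0 for r in METER_RATES_MBPS}   # running value estimate per rate
counts = {r: 0 for r in METER_RATES_MBPS}

def epsilon(step: int, eps_start=1.0, eps_end=0.05, anneal_steps=500) -> float:
    """Linearly anneal exploration probability from eps_start down to eps_end."""
    frac = min(step / anneal_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_rate(step: int) -> int:
    if random.random() < epsilon(step):
        return random.choice(METER_RATES_MBPS)   # explore a random meter rate
    return max(q_values, key=q_values.get)       # exploit the best rate so far

def update(rate: int, reward: float) -> None:
    # Incremental average of observed reward (e.g., negative system cost).
    counts[rate] += 1
    q_values[rate] += (reward - q_values[rate]) / counts[rate]
```

The annealing schedule is what gives the agent heavy exploration early on and mostly greedy behavior once estimates stabilize, which is the property item 3 credits for its faster convergence.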
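Item 4 above describes distributing image data between a CPU and an FPGA in proportion to their observed latencies. The sketch below shows one plausible way to compute such a latency-weighted split of image rows; the per-row latency numbers and the two-device setup are assumptions for illustration, not the paper's measured values or its exact algorithm.

```python
# Sketch of a latency-weighted split of image rows between two devices:
# a device with lower per-row latency receives proportionally more rows.
def latency_weighted_split(total_rows: int, latency_per_row: dict) -> dict:
    # Weight each device by the inverse of its per-row latency.
    weights = {dev: 1.0 / lat for dev, lat in latency_per_row.items()}
    total_w = sum(weights.values())
    shares = {dev: int(total_rows * w / total_w) for dev, w in weights.items()}
    # Hand any rounding remainder to the fastest device.
    fastest = min(latency_per_row, key=latency_per_row.get)
    shares[fastest] += total_rows - sum(shares.values())
    return shares

# Example with assumed per-row latencies in milliseconds.
print(latency_weighted_split(1080, {"cpu": 0.40, "fpga": 0.15}))
# -> the faster FPGA receives the larger share of the rows
```

Recomputing the split from recent latency measurements each round would give the round-robin scheduler the adaptive, load-balancing behavior the abstract describes.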