With the increasing demand for computationally intensive services such as deep learning tasks, emerging distributed computing platforms such as edge computing (EC) systems are becoming more popular. Edge computing systems have shown promising results in reducing latency compared to traditional cloud systems. However, their limited processing capacity imposes a trade-off between the potential latency reduction and the achieved accuracy in computationally intensive services such as deep learning-based services. In this paper, we focus on finding the optimal accuracy-time trade-off for running deep learning services in a three-tier EC platform where several deep learning models with different accuracy levels are available. Specifically, we cast the problem as an Integer Linear Program, where optimal task scheduling decisions are made to maximize overall user satisfaction in terms of the accuracy-time trade-off. We prove that our problem is NP-hard and then provide a polynomial-time greedy algorithm, called GUS, that is shown to attain near-optimal results. Finally, after vetting our algorithmic solution through numerical experiments and comparison with a set of heuristics, we deploy it on a testbed to obtain real-world measurements. The results of both the numerical analysis and the real-world implementation show that GUS can outperform the baseline heuristics by at least 50% in terms of the average percentage of satisfied users.
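The abstract names GUS but does not spell out its steps. Purely as an illustration of the accuracy-time trade-off it targets, the sketch below shows a generic greedy scheduler in which a user counts as satisfied only when both an accuracy requirement and a deadline are met. Everything here is an assumption, not GUS itself: the names `Option`, `Task`, and `greedy_schedule`, the tightest-deadline-first ordering, the slack-based score, and the integer capacity model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Option:
    """One hypothetical (tier, model) choice for serving a task."""
    server: str      # tier name, e.g. "edge" or "cloud"
    accuracy: float  # expected accuracy of the model, in [0, 1]
    latency: float   # expected completion time in seconds
    load: int        # capacity units consumed on `server`


@dataclass(frozen=True)
class Task:
    name: str
    min_accuracy: float          # accuracy the user requires
    deadline: float              # completion time the user tolerates (> 0)
    options: Tuple[Option, ...]


def greedy_schedule(tasks: List[Task],
                    capacity: Dict[str, int]) -> Dict[str, Optional[Option]]:
    """Tightest-deadline-first greedy (an assumed heuristic, not GUS):
    give each task the feasible option with the largest accuracy slack plus
    normalized time slack, or drop the task (None) if nothing fits."""
    remaining = dict(capacity)  # copy so the caller's dict is untouched
    assignment: Dict[str, Optional[Option]] = {}
    for task in sorted(tasks, key=lambda t: t.deadline):
        feasible = [
            o for o in task.options
            if o.accuracy >= task.min_accuracy
            and o.latency <= task.deadline
            and remaining.get(o.server, 0) >= o.load
        ]
        if not feasible:
            assignment[task.name] = None  # task cannot be satisfied
            continue
        best = max(feasible, key=lambda o: (o.accuracy - task.min_accuracy)
                   + (task.deadline - o.latency) / task.deadline)
        remaining[best.server] -= best.load
        assignment[task.name] = best
    return assignment


# Example: one edge slot, two tasks competing for it.
t1 = Task("t1", 0.70, 0.10, (Option("edge", 0.75, 0.05, 1),
                             Option("cloud", 0.90, 0.30, 1)))
t2 = Task("t2", 0.70, 0.20, (Option("edge", 0.75, 0.05, 1),))
result = greedy_schedule([t1, t2], {"edge": 1, "cloud": 10})
satisfied = sum(o is not None for o in result.values()) / len(result)
```

In this toy instance `t1` takes the only edge slot because the cloud option misses its deadline, so `t2` is dropped and half of the users are satisfied. An ILP solver would be needed to confirm optimality on any real instance.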
QoS-Aware Placement of Deep Learning Services on the Edge with Multiple Service Implementations
Mobile edge computing pushes computationally intensive services closer to the user to reduce delay through physical proximity. This has led many to consider deploying deep learning models on the edge, commonly known as edge intelligence (EI). EI services can have many model implementations that provide different QoS. For instance, one model can perform inference faster than another (thus reducing latency) while achieving lower accuracy. In this paper, we study joint service placement and model scheduling of EI services with the goal of maximizing Quality-of-Service (QoS) for end users, where EI services have multiple implementations to serve user requests, each with varying costs and QoS benefits. We cast the problem as an integer linear program and prove that it is NP-hard. We then prove that the objective is equivalent to maximizing a monotone increasing, submodular set function and thus can be solved greedily while maintaining a (1 - 1/e)-approximation guarantee. We then propose two greedy algorithms: one that theoretically guarantees this approximation and another that empirically matches its performance with greater efficiency. Finally, we thoroughly evaluate the proposed algorithms for making placement and scheduling decisions in both synthetic and real-world scenarios against the optimal solution and several baselines. In the real-world case, we consider real machine learning models using the ImageNet 2012 dataset for requests. Our numerical experiments empirically show that our more efficient greedy algorithm approximates the optimal solution with an average approximation ratio of 0.904, while the next closest baseline achieves an average ratio of 0.607.
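The (1 - 1/e) guarantee cited above is the classic result for greedily maximizing a monotone submodular function under a cardinality constraint. The sketch below illustrates that plain greedy under assumptions of our own: placements are hypothetical `(server, model)` pairs, at most `k` placements may be chosen (the paper's actual per-server cost constraints are not modeled), and the QoS objective is a facility-location function in which each request is credited with the best QoS among the chosen placements. That function is monotone and submodular, so the textbook guarantee applies to this simplified version. The names `qos_value` and `greedy_placement` are illustrative, not the paper's.

```python
from typing import Dict, Hashable, List, Tuple

Placement = Tuple[str, str]  # hypothetical (server, model) pair


def qos_value(selected: List[Placement],
              qos: Dict[Placement, Dict[Hashable, float]]) -> float:
    """Facility-location objective: every request is credited with the best
    QoS offered by any selected placement (0 if none serves it).
    This function is monotone and submodular."""
    requests = {r for served in qos.values() for r in served}
    return sum(max((qos[p].get(r, 0.0) for p in selected), default=0.0)
               for r in requests)


def greedy_placement(qos: Dict[Placement, Dict[Hashable, float]],
                     k: int) -> Tuple[List[Placement], float]:
    """Classic greedy: repeatedly add the placement with the largest marginal
    gain, up to k placements (a cardinality constraint)."""
    selected: List[Placement] = []
    current = 0.0
    for _ in range(k):
        best, best_gain = None, 0.0
        for p in qos:
            if p in selected:
                continue
            gain = qos_value(selected + [p], qos) - current
            if gain > best_gain:
                best, best_gain = p, gain
        if best is None:  # no remaining placement adds value
            break
        selected.append(best)
        current += best_gain
    return selected, current


# Toy instance: a fast/less accurate and a slow/more accurate model.
qos = {
    ("edge1", "small"): {"r1": 0.5, "r2": 0.5},
    ("edge1", "large"): {"r1": 0.75},
    ("edge2", "small"): {"r3": 0.5},
}
chosen, value = greedy_placement(qos, k=2)
```

Here the greedy first picks `("edge1", "small")` (gain 1.0), then prefers `("edge2", "small")` (gain 0.5) over upgrading `r1` to the large model (gain 0.25), ending with total QoS 1.5. Recomputing `qos_value` from scratch for each candidate keeps the sketch short; a faster variant (for example, lazy evaluation of marginal gains) would be the natural next step, though whether that matches the paper's efficient algorithm is not stated.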
- Award ID(s): 1948387
- PAR ID: 10317528
- Date Published:
- Journal Name: IEEE ICCCN Big Data and Machine Learning for Networking (BDMLN) Workshop • 2021
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- Emerging Edge Computing (EC) technology has shown promise for many delay-sensitive Deep Learning (DL) based applications of smart cities in terms of improved Quality-of-Service (QoS). EC requires judicious decisions that jointly consider the limited capacity of the edge servers and the QoS provided by DL-dependent services. In a smart city environment, tasks may have varying priorities in terms of when and how to serve them; thus, task priorities have to be considered when making resource management decisions. In this paper, we focus on finding optimal offloading decisions in a three-tier user-edge-cloud architecture while considering different priority classes for the DL-based services and making a trade-off between a task's completion time and the accuracy provided by the DL-based service. We cast the optimization problem as an Integer Linear Program (ILP) where the objective is to maximize a function called gain of system (GoS), defined based on the provided QoS and the priority of the tasks. We prove the problem is NP-hard. We then propose an efficient offloading algorithm, called PGUS, that is shown to achieve near-optimal results in terms of the provided GoS. Finally, we compare our proposed algorithm, PGUS, with heuristics and a state-of-the-art algorithm, called GUS, using both numerical analysis and real-world implementation. Our results show that PGUS outperforms GUS by 45% on average in terms of serving the top 25% highest-priority task classes, while still keeping the overall percentage of dropped tasks minimal and the overall gain of system maximized.
- Age of information has recently been proposed to measure information freshness, especially for a class of real-time video applications. These applications often demand timely updates with edge cloud computing to guarantee the user experience. However, the edge cloud is usually equipped with limited computation and network resources; therefore, resource contention among different video streams can make the updates stale. Aiming to minimize a penalty function of the weighted sum of the average age over multiple end users, this paper presents a greedy traffic scheduling policy in which the processor chooses the next processing request with the maximum immediate penalty reduction. In this work, we formulate the service process when requests from multiple users arrive at edge cloud servers asynchronously and show that the proposed greedy scheduling algorithm is the optimal work-conserving policy for a class of age penalty functions.
- The proliferation of innovative mobile services such as augmented reality, networked gaming, and autonomous driving has spurred a growing need for low-latency access to computing resources that cannot be met solely by existing centralized cloud systems. Mobile Edge Computing (MEC) is expected to be an effective solution to meet the demand for low-latency services by enabling the execution of computing tasks at the network periphery, in proximity to end users. While a number of recent studies have addressed the problem of determining where service tasks are executed and how user requests are routed to the corresponding edge servers, the focus has primarily been on the efficient utilization of computing resources, neglecting the fact that non-trivial amounts of data need to be stored to enable service execution, and that many emerging services exhibit asymmetric bandwidth requirements. To fill this gap, we study the joint optimization of service placement and request routing in MEC-enabled multi-cell networks with multidimensional (storage-computation-communication) constraints. We show that this problem generalizes several problems in the literature and propose an algorithm that achieves close-to-optimal performance using randomized rounding. Evaluation results demonstrate that our approach can effectively utilize the available resources to maximize the number of requests served by low-latency edge cloud servers.
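The age-of-information entry in the list above describes a greedy rule: the processor serves next the request with the maximum immediate penalty reduction. A minimal sketch of that selection step follows, under simplifying assumptions that are ours, not the paper's: each user has at most one pending update, ages are evaluated at the decision time `now` rather than at completion, and the per-user penalty is `weight * penalty(age)`. The function name `pick_next` and its parameters are hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


def pick_next(now: float,
              pending: Dict[str, float],
              last_delivered: Dict[str, float],
              weights: Dict[str, float],
              penalty: Callable[[float], float] = lambda age: age,
              ) -> Tuple[Optional[str], float]:
    """Greedy rule: serve the user whose pending update yields the largest
    immediate reduction in weighted age penalty.

    pending[u]        -- generation time of u's waiting update
    last_delivered[u] -- generation time of u's last delivered update
    Ages are evaluated at `now` (a simplifying assumption).
    """
    best_user, best_drop = None, float("-inf")
    for user, gen_time in pending.items():
        old_age = now - last_delivered[user]  # age if user is not served
        new_age = now - gen_time              # age right after serving user
        drop = weights[user] * (penalty(old_age) - penalty(new_age))
        if drop > best_drop:
            best_user, best_drop = user, drop
    return best_user, best_drop


pending = {"a": 8.0, "b": 9.0}
last_delivered = {"a": 6.0, "b": 2.0}
weights = {"a": 5.0, "b": 1.0}
linear_choice = pick_next(10.0, pending, last_delivered, weights)
quadratic_choice = pick_next(10.0, pending, last_delivered, weights,
                             penalty=lambda age: age ** 2)
```

With a linear penalty, the heavily weighted user `a` wins (reduction 10.0 versus 7.0). With a quadratic penalty, the very stale user `b` wins (63.0 versus 60.0). This illustrates why the optimality claim in that abstract is stated only for a class of age penalty functions.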

