The serverless and Functions as a Service (FaaS) paradigms are currently trending among cloud providers and are increasingly being applied at the network edge and on Internet of Things (IoT) devices. The benefits include reduced communication latency, less network traffic, and increased privacy for data processing. However, there are challenges: IoT devices have limited resources for running multiple simultaneous containerized functions, and FaaS does not typically support long-running functions. Our implementation uses Docker and CRIU to checkpoint and suspend long-running blocking functions. The results show that checkpointing is slightly slower than a regular Docker pause, but it saves memory and allows more long-running functions to run on an IoT device. Furthermore, the resulting checkpoint files are small, making them suitable for live migration and for backing up stateful functions, thereby improving the availability and reliability of the system.
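The abstract above does not include code, but the checkpoint/suspend flow it describes can be illustrated with Docker's experimental, CRIU-backed checkpoint commands. The sketch below drives those commands from Python via subprocess; the container and checkpoint names are hypothetical, and this is only an illustration of the general mechanism, not the authors' implementation.

```python
import subprocess

def checkpoint_container(container: str, checkpoint: str) -> None:
    """Suspend a running container to disk using Docker's experimental
    CRIU-backed checkpointing (requires CRIU installed and the Docker
    daemon running with experimental features enabled)."""
    subprocess.run(
        ["docker", "checkpoint", "create", container, checkpoint],
        check=True,
    )

def restore_container(container: str, checkpoint: str) -> None:
    """Resume the container from a previously created checkpoint."""
    subprocess.run(
        ["docker", "start", "--checkpoint", checkpoint, container],
        check=True,
    )

if __name__ == "__main__":
    # Hypothetical names for illustration only.
    checkpoint_container("long-running-fn", "blocked-on-io")
    restore_container("long-running-fn", "blocked-on-io")
```

Because the checkpoint is an ordinary directory of CRIU image files on disk, it can be copied to another host before restoring, which is what makes the live-migration and backup use cases mentioned above plausible.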
NanoLambda: Implementing Functions as a Service at All Resource Scales for the Internet of Things.
Internet of Things (IoT) devices are becoming increasingly prevalent in our environment, yet the process of programming these devices and processing the data they produce remains difficult. Typically, data is processed on-device, involving arduous work in low-level languages, or data is moved to the cloud, where abundant resources are available for Functions as a Service (FaaS) or other handlers. FaaS is an emerging category of flexible computing services, where developers deploy self-contained functions to be run in portable and secure containerized environments; however, at the moment, these functions are limited to running in the cloud or, in some cases, at the "edge" of the network using resource-rich, Linux-based systems. In this work, we investigate NanoLambda, a portable platform that brings FaaS, high-level language programming, and familiar cloud service APIs to non-Linux and microcontroller-based IoT devices. To enable this, NanoLambda couples a new, minimal Python runtime system that we have designed for the least capable end of the IoT device spectrum, with API compatibility for AWS Lambda and S3. NanoLambda transfers functions between IoT devices (sensors, edge, cloud), providing power and latency savings while retaining the programmer productivity benefits of high-level languages and FaaS. A key feature of NanoLambda is a scheduler that intelligently places function executions across multi-scale IoT deployments according to resource availability and power constraints. We evaluate a range of applications that use NanoLambda to run on devices as small as the ESP8266 with 64 KB of RAM and 512 KB of flash storage.
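The abstract does not describe NanoLambda's scheduler in detail; as a toy illustration of placing a function according to resource availability and power constraints across a multi-scale deployment, the Python sketch below picks the least-capable device that still fits the function. All device names, memory figures, and thresholds are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Device:
    name: str                   # e.g. "esp8266-sensor", "edge-gateway", "cloud"
    free_ram_kb: int            # currently available memory
    battery_pct: Optional[int]  # None for mains-powered devices

def place(fn_ram_kb: int, devices: list[Device], min_battery: int = 20) -> Device:
    """Pick the least-capable device that satisfies the function's memory
    requirement and the power constraint; fall back to the last (most
    capable, assumed cloud) device otherwise."""
    candidates = [
        d for d in devices
        if d.free_ram_kb >= fn_ram_kb
        and (d.battery_pct is None or d.battery_pct >= min_battery)
    ]
    # Prefer the tightest fit: keep small functions near the sensors and
    # push large ones toward the edge or cloud.
    return min(candidates, key=lambda d: d.free_ram_kb) if candidates else devices[-1]

# Hypothetical multi-scale deployment.
fleet = [
    Device("esp8266-sensor", free_ram_kb=24, battery_pct=63),
    Device("edge-gateway", free_ram_kb=512_000, battery_pct=None),
    Device("cloud", free_ram_kb=8_000_000, battery_pct=None),
]
print(place(fn_ram_kb=16, devices=fleet).name)  # -> esp8266-sensor
```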
- PAR ID: 10259914
- Date Published:
- Journal Name: ACM Symposium on Edge Computing
- Page Range / eLocation ID: 220 to 231
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Serverless computing enables a new way of building and scaling cloud applications by allowing developers to write fine-grained serverless or cloud functions. The execution duration of a cloud function is typically short, ranging from a few milliseconds to hundreds of seconds. However, due to resource contention caused by public clouds' deep consolidation, the function execution duration may get significantly prolonged and fail to accurately account for the function's true resource usage. We observe that the function duration can be highly unpredictable, with amplification of more than 50× on an open-source FaaS platform (OpenLambda). Our experiments show that the OS scheduling policy of the cloud functions' host server can have a crucial impact on performance. The default Linux scheduler, CFS (Completely Fair Scheduler), being oblivious to workloads, frequently context-switches short functions, causing a turnaround time that is much longer than their service time. We propose SFS (Smart Function Scheduler), which works entirely in user space and carefully orchestrates the existing Linux FIFO and CFS schedulers to approximate Shortest Remaining Time First (SRTF). SFS uses two-level scheduling that seamlessly combines a new FILTER policy with Linux CFS, trading increased duration of long functions for significant performance improvement for short functions. We implement SFS in the Linux user space and port it to OpenLambda. Evaluation results show that SFS significantly improves short functions' duration with a small impact on relatively longer functions, compared to CFS.
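SFS's actual FILTER policy is more involved than what the abstract states, but the two-level idea of giving a new function a brief FIFO run and demoting it to CFS if it keeps running can be sketched from user space with Python's scheduling APIs. The budget value and function names below are assumptions, and the sketch requires root or CAP_SYS_NICE on Linux.

```python
import os
import threading

FILTER_SLICE_SEC = 0.5  # hypothetical budget before a function is demoted to CFS

def demote_to_cfs(pid: int) -> None:
    """Hand a still-running function process back to the default CFS policy."""
    try:
        os.sched_setscheduler(pid, os.SCHED_OTHER, os.sched_param(0))
    except ProcessLookupError:
        pass  # the function already finished within its budget

def admit(pid: int) -> None:
    """Run a newly started function under FIFO first, so short functions
    complete without being context-switched away, then demote it if it
    outlives its budget (approximating SRTF for short jobs)."""
    os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(1))
    threading.Timer(FILTER_SLICE_SEC, demote_to_cfs, args=(pid,)).start()
```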
-
Geo-distributed Edge sites are expected to cater to the stringent demands of situation-aware applications like collaborative autonomous vehicles and drone swarms. While clients of such applications benefit from having network-proximal compute resources, an Edge site has limited resources compared to the traditional Cloud. Moreover, the load experienced by an Edge site depends on a client's mobility pattern, which may often be unpredictable. The Function-as-a-Service (FaaS) paradigm is aptly poised to handle the ephemeral nature of workload demand at Edge sites. In FaaS, applications are decomposed into containerized functions, enabling fine-grained resource management. However, spatio-temporal variations in client mobility can still lead to rapid saturation of resources beyond the capacity of an Edge site. To address this challenge, we develop FEO (Federated Edge Orchestrator), a resource allocation scheme across the geo-distributed Edge infrastructure for FaaS. FEO employs a novel federated policy to offload function invocations to peer sites with spare resource capacity without the need to frequently share knowledge about available capacities among participating sites. Detailed experiments show that FEO's approach can reduce a site's P99 latency by almost 3x, while maintaining application service level objectives at all other sites.
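FEO's federated offload policy is not spelled out in the abstract; as a rough illustration of the general idea of forwarding invocations to peers without constantly exchanging load information, the sketch below routes locally while there is headroom and otherwise picks a peer at random. The capacity, threshold, and site names are hypothetical.

```python
import random

LOCAL_CAPACITY = 64      # hypothetical number of concurrent function slots at this site
OFFLOAD_THRESHOLD = 0.9  # offload once the site is roughly 90% saturated
PEER_SITES = ["edge-east", "edge-west", "edge-north"]  # hypothetical peer sites

def route(invocation, in_flight: int):
    """Serve the invocation locally while there is headroom; otherwise
    forward it to a randomly chosen peer site, so sites do not need to
    frequently share their available capacities."""
    if in_flight / LOCAL_CAPACITY < OFFLOAD_THRESHOLD:
        return ("local", invocation)
    return (random.choice(PEER_SITES), invocation)

# Usage: a saturated site forwards the call to some peer.
print(route({"fn": "detect-objects"}, in_flight=60))
```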
-
Deep neural networks (DNNs) are being applied to various areas such as computer vision, autonomous vehicles, and healthcare. However, DNNs are notorious for their high computational complexity and cannot be executed efficiently on resource-constrained Internet of Things (IoT) devices. Various solutions have been proposed to handle the high computational complexity of DNNs. Offloading computing tasks of DNNs from IoT devices to cloud/edge servers is one of the most popular and promising solutions. While such remote DNN services provided by servers largely reduce computing tasks on IoT devices, it is challenging for IoT devices to inspect whether the quality of the service meets their service level objectives (SLOs). In this paper, we address this problem and propose a novel approach named QIS (quality inspection sampling) that can efficiently inspect the quality of remote DNN services for IoT devices. To realize QIS, we design a new ID-generation method to generate data (IDs) that can identify the serving DNN models on edge servers. QIS inserts the IDs into the input data stream and implements sampling inspection on SLO violations. The experiment results show that the QIS approach can reliably inspect, with a nearly 100% success rate, the service quality of remote DNN services when the SLA level is 99.9% or lower, at the cost of only up to 0.5% overhead.
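The abstract does not detail how IDs are generated or checked; the sketch below only illustrates the surrounding workflow of inserting identifying inputs into the request stream at a sampling rate and counting mismatches, under assumed ID inputs and expected outputs. It is not the authors' ID-generation method.

```python
import random

SAMPLING_RATE = 0.01  # hypothetical fraction of requests used for inspection

# Hypothetical ID inputs: crafted samples whose expected outputs identify
# the DNN model the provider agreed to serve.
ID_INPUTS = {"id-001": "label-A", "id-002": "label-B"}

def inspect_stream(requests, query_remote_dnn):
    """Occasionally substitute an ID input into the stream and check that
    the remote service returns the expected output; mismatches suggest the
    agreed-upon model is not the one being served."""
    violations = 0
    for req in requests:
        if random.random() < SAMPLING_RATE:
            id_key, expected = random.choice(list(ID_INPUTS.items()))
            if query_remote_dnn(id_key) != expected:
                violations += 1
        else:
            query_remote_dnn(req)
    return violations
```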
-
We introduce Canal, a programmable, topic-based publish/subscribe system that is designed for multi-tier cloud deployments (e.g., edge-cloud, multi-cloud, IoT-cloud). Canal implements a triggered computational (i.e., "serverless") programming model and provides developers with a uniform and portable programming interface. To achieve scalability and reliability, Canal combines a distributed hash table (DHT) with a replica consensus protocol to distribute and replicate functions, state, and data. Canal also decouples replica placement from the DHT topology to allow developers to optimize function placement for different objectives. We evaluate Canal using a real-world multi-tier IoT deployment, and we use Canal to compare placement strategies, end-to-end performance, and failure recovery using both benchmarks and a real-world IoT-edge application. Our results show that Canal is able to achieve both low latency and reliability in this setting.
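Canal's DHT, replication, and consensus layers are not shown in the abstract; the toy sketch below only illustrates the core topic-to-node mapping via consistent hashing and a locally triggered ("serverless") handler on publish. Node names, topics, and the handler are hypothetical, and real routing would of course cross the network and replicate state.

```python
import hashlib
from bisect import bisect_right

NODES = ["node-a", "node-b", "node-c"]  # hypothetical Canal peers

def _hash(key: str) -> int:
    return int(hashlib.sha1(key.encode()).hexdigest(), 16)

# A minimal hash ring: each node owns the arc ending at its hash value.
RING = sorted((_hash(n), n) for n in NODES)

def owner(topic: str) -> str:
    """Map a topic to the node responsible for storing its state and
    running its triggered functions."""
    h = _hash(topic)
    idx = bisect_right([p for p, _ in RING], h) % len(RING)
    return RING[idx][1]

handlers = {}  # topic -> triggered function

def publish(topic: str, event: dict) -> None:
    """Route the event to the topic's owner and fire the registered handler."""
    node = owner(topic)
    fn = handlers.get(topic)
    if fn:
        fn(node, event)

# Usage: register a triggered function and publish an event.
handlers["sensors/temp"] = lambda node, e: print(f"{node} handles {e}")
publish("sensors/temp", {"celsius": 21.5})
```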