Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing and given the recent advances in big data and emerging trends in cloud computing (e.g., serverless [18]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [6] and Amazon Web Services [10]). To address these domains’ processing needs, recent research has focused on using FPGAs to accelerate workloads, ranging from analytics and machine learning to databases and network function virtualization. In this paper, we present an ongoing effort to realize a high-performance FPGA-as-a-microservice (FaaM) architecture for the cloud. We discuss some of the technical challenges and propose several solutions for efficiently integrating FPGAs into virtualized environments. Our case study deploying a multithreaded, multi-user compression as a microservice using the FaaM architecture indicate that microservices-based FPGA acceleration can sustain high-performance compared to straightforward implementation with minimal to no communication overhead despite the hardware abstraction.
more »
« less
This content will become publicly available on November 3, 2025
SoK: Virtualization Challenges and Techniques in Serverless Computing
This systematization of knowledge (SoK) paper summarizes the discussion of virtualization challenges and the corresponding techniques specific to serverless computing. We examine virtualization solutions, including paravirtualization, containers, lightweight hypervisors and kernels, and unikernels, and their applicability to serverless. Then, we discuss several challenges, including cold-start optimization, resource co-location, benchmarking, and the research-production gap, hoping to inspire future research.
more »
« less
- PAR ID:
- 10582808
- Publisher / Repository:
- The 2nd Workshop on Hot Topics in System Infrastructure
- Date Published:
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Field-programmable gate arrays (FPGAs) have largely been used in communication and high-performance computing, and given the recent advances in big data, machine learning and emerging trends in cloud computing (e.g., serverless [1]), FPGAs are increasingly being introduced into these domains (e.g., Microsoft’s datacenters [2] and Amazon Web Services [3]). To address these domains’ processing needs, recent research has focused on using FPGAs to accelerate workloads, ranging from analytics and machine learning to databases and network function virtualization. In this paper, we present a high-performance FPGA-as-a-microservice (FaaM) architecture for the cloud. We discuss some of the technical challenges and propose several solutions for efficiently integrating FPGAs into virtualized environments. Our case study deploying a multi-threaded, multi-user compression as a microservice using FaaM indicates that microservices-based FPGA acceleration can sustain high-performance as compared to a straightforward CPU implementation with minimal to no communication overhead despite the hardware abstraction.more » « less
-
Foster, Ian; Chard, Kyle; Babuji, Yadu (Ed.)The historical motivation for serverless comes from internet-of-things, smartphone client server, and the objective of simplifying programming (no provisioning) and scale-down (pay-for-use). These applications are generally low-performance best-effort. However, the serverless model enables flexible software architectures suitable for a wide range of applications that demand high-performance and guaranteed performance. We have studied three such applications - scientific data streaming, virtual/augmented reality, and document annotation. We describe how each can be cast in a serverless software architecture and how the application performance requirements translate into high performance requirements (invocation rate, low and predictable latency) for the underlying serverless system implementation. These applications can require invocations rates as high as millions per second (40 MHz) and latency deadlines below a microsecond (300 ns), and furthermore require performance predictability. All of these capabilities are far in excess of today's commercial serverless offerings and represent interesting research challenges.more » « less
-
Serverless computing has become increasingly popular for cloud applications, due to its compelling properties of high-level abstractions, lightweight runtime, high elasticity and pay-per-use billing. In this revolutionary computing paradigm shift, challenges arise when adapting data analytics applications to the serverless environment, due to the lack of support for efficient state sharing, which attract ever-growing research attention. In this paper, we aim to exploit the advantages of task level orchestration and fine-grained resource provisioning for data analytics on serverless platforms, with the hope of fulfilling the promise of serverless deployment to the maximum extent. To this end, we present ACTS, an autonomous cost-efficient task orchestration framework for serverless analytics. ACTS judiciously schedules and coordinates function tasks to mitigate cold-start latency and state sharing overhead. In addition, ACTS explores the optimization space of fine-grained workload distribution and function resource configuration for cost efficiency. We have deployed and implemented ACTS on AWS Lambda, evaluated with various data analytics workloads. Results from extensive experiments demonstrate that ACTS achieves up to 98% monetary cost reduction while maintaining superior job completion time performance, in comparison with the state-of-the-art baselines.more » « less
-
The salient pay-per-use nature of serverless computing has driven its continuous penetration as an alternative computing paradigm for various workloads. Yet, challenges arise and remain open when shifting machine learning workloads to the serverless environment. Specifically, the restriction on the deployment size over serverless platforms combining with the complexity of neural network models makes it difficult to deploy large models in a single serverless function. In this paper, we aim to fully exploit the advantages of the serverless computing paradigm for machine learning workloads targeting at mitigating management and overall cost while meeting the response-time Service Level Objective (SLO). We design and implement AMPS-Inf, an autonomous framework customized for model inferencing in serverless computing. Driven by the cost-efficiency and timely-response, our proposed AMPS-Inf automatically generates the optimal execution and resource provisioning plans for inference workloads. The core of AMPS-Inf relies on the formulation and solution of a Mixed-Integer Quadratic Programming problem for model partitioning and resource provisioning with the objective of minimizing cost without violating response time SLO. We deploy AMPS-Inf on the AWS Lambda platform, evaluate with the state-of-the-art pre-trained models in Keras including ResNet50, Inception-V3 and Xception, and compare with Amazon SageMaker and three baselines. Experimental results demonstrate that AMPSInf achieves up to 98% cost saving without degrading response time performance.more » « less
An official website of the United States government
