Serverless computing allows developers to deploy and scale stateless functions in ephemeral workers easily. As a result, serverless computing has been widely used for many applications, such as computer vision, video processing, and HTML generation. However, we find that the stateless nature of serverless computing wastes many of the important benefits modern language runtimes have to offer. A notable example is the extensive profiling and Just-in-Time (JIT) compilation effort that runtimes implement to achieve acceptable performance of popular high-level languages, such as Java, JavaScript, and Python. Unfortunately, when modern language runtimes are naively adopted in serverless computing, all of these efforts are lost upon worker eviction. Checkpoint-restore methods alleviate the problem by resuming workers from snapshots taken after initialization. However, production-grade language runtimes can take up to thousands of invocations to fully optimize a single function, thus rendering naive checkpoint-restore policies ineffective. This paper proposes Pronghorn, a snapshot serverless orchestrator that automatically monitors the function performance and decides (1) when it is the right moment to take a snapshot and (2) which snapshot to use for new workers. Pronghorn is agnostic to the underlying platform and JIT runtime, thus easing its integration into existing runtimes and worker deployment environments (container, virtual machine, etc.). On a set of representative serverless benchmarks, Pronghorn provides end-to-end median latency improvements of 37.2% across 9 out of 13 benchmarks (20-58% latency reduction) when compared to state-of-art checkpointing policies. 
                        more » 
                        « less   
                    
                            
                            Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices
                        
                    
    
            The microservice architecture is a popular software engineering approach for building flexible, large-scale online services. Serverless functions, or function as a service (FaaS), provide a simple programming model of stateless functions which are a natural substrate for implementing the stateless RPC handlers of microservices, as an alternative to containerized RPC servers. However, current serverless platforms have millisecond-scale runtime overheads, making them unable to meet the strict sub-millisecond latency targets required by existing interactive microservices. We present Nightcore, a serverless function runtime with microsecond-scale overheads that provides container-based isolation between functions. Nightcore’s design carefully considers various factors having microsecond-scale overheads, including scheduling of function requests, communication primitives, threading models for I/O, and concurrent function executions. Nightcore currently supports serverless functions written in C/C++, Go, Node.js, and Python. Our evaluation shows that when running latency-sensitive interactive microservices, Nightcore achieves 1.36×–2.93× higher throughput and up to 69% reduction in tail latency. 
        more » 
        « less   
        
    
                            - Award ID(s):
- 2008321
- PAR ID:
- 10292912
- Date Published:
- Journal Name:
- Architectrual Support for Programming Languages and Operating Systems
- Page Range / eLocation ID:
- 152 to 166
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
- 
            
- 
            Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. 'Cold-start' latency is another deterrent. We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF's socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions 'warm' with negligible penalty. Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for 'cold-start'.more » « less
- 
            SPRIGHT: High-Performance eBPF-Based Event-Driven, Shared-Memory Processing for Serverless ComputingServerless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. ‘Cold-start’ latency is another deterrent. We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF’s socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions ‘warm’ with negligible penalty. Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for ‘cold-start’.more » « less
- 
            Application networks facilitate communication between the microservices of cloud applications. They are built today using service meshes with low-level specifications that make it difficult to express application-specific functionality (e.g., access control based on RPC fields), and they can more than double the RPC latency. We develop AppNet, a framework that makes it easy to build expressive and high-performance application networks. Developers specify rich RPC processing in a high-level language with generalized match-action rules and built-in state management. We compile the specifications to high-performance code after optimizing where (e.g., client, server) and how (e.g., RPC library, proxy) each RPC processing element runs. The optimization uses symbolic abstraction and execution to judge if different runtime configurations of possibly-stateful RPC processing elements are semantically equivalent for arbitrary RPC streams. Our experiments show that AppNet can express common application network function in only 7-28 lines of code. Its optimizations lower RPC processing latency by up to 82%.more » « less
- 
            Serverless computing is a new pay-per-use cloud service paradigm that automates resource scaling for stateless functions and can potentially facilitate bursty machine learning serving. Batching is critical for latency performance and cost-effectiveness of machine learning inference, but unfortunately it is not supported by existing serverless platforms due to their stateless design. Our experiments show that without batching, machine learning serving cannot reap the benefits of serverless computing. In this paper, we present BATCH, a framework for supporting efficient machine learning serving on serverless platforms. BATCH uses an optimizer to provide inference tail latency guarantees and cost optimization and to enable adaptive batching support. We prototype BATCH atop of AWS Lambda and popular machine learning inference systems. The evaluation verifies the accuracy of the analytic optimizer and demonstrates performance and cost advantages over the state-of-the-art method MArk and the state-of-the-practice tool SageMaker.more » « less
 An official website of the United States government
An official website of the United States government 
				
			 
					 
					
 
                                    