skip to main content

Title: PLASMA: programmable elasticity for stateful cloud computing applications
Developers are always on the lookout for simple solutions to manage their applications on cloud platforms. Major cloud providers have already been offering automatic elasticity management solutions (e.g., AWS Lambda, Azure durable function) to users. However, many cloud applications are stateful --- while executing, functions need to share their state with others. Providing elasticity for such stateful functions is much more challenging, as a deployment/elasticity decision for a stateful entity can strongly affect others in ways which are hard to predict without any application knowledge. Existing solutions either only support stateless applications (e.g., AWS Lambda) or only provide limited elasticity management (e.g., Azure durable function) to stateful applications. PLASMA (Programmable Elasticity for Stateful Cloud Computing Applications) is a programming framework for elastic stateful cloud applications. It includes (1) an elasticity programming language as a second "level" of programming (complementing the main application programming language) for describing elasticity behavior, and (2) a novel semantics-aware elasticity management runtime that tracks program execution and acts upon application features as suggested by elasticity behavior. We have implemented 10+ applications with PLASMA. Extensive evaluation on Amazon AWS shows that PLASMA significantly improves their efficiency, e.g., achieving same performance as a vanilla setup with 25% fewer resources, more » or improving performance by 40% compared to the default setup. « less
; ; ; ; ;
Award ID(s):
Publication Date:
Journal Name:
EuroSys ’20
Page Range or eLocation-ID:
1 to 15
Sponsoring Org:
National Science Foundation
More Like this
  1. Serverless computing is an emerging paradigm in which an application's resource provisioning and scaling are managed by third-party services. Examples include AWS Lambda, Azure Functions, and Google Cloud Functions. Behind these services' easy-to-use APIs are opaque, complex infrastructure and management ecosystems. Taking on the viewpoint of a serverless customer, we conduct the largest measurement study to date, launching more than 50,000 function instances across these three services, in order to characterize their architectures, performance, and resource management efficiency. We explain how the platforms isolate the functions of different accounts, using either virtual machines or containers, which has important security implications. We characterize performance in terms of scalability, coldstart latency, and resource efficiency, with highlights including that AWS Lambda adopts a bin-packing-like strategy to maximize VM memory utilization, that severe contention between functions can arise in AWS and Azure, and that Google had bugs that allow customers to use resources for free.
  2. Serverless computing is a new cloud programming and deployment paradigm that is receiving wide-spread uptake. Serverless offerings such as Amazon Web Services (AWS) Lambda, Google Functions, and Azure Functions automatically execute simple functions uploaded by developers, in response to cloud-based event triggers. The serverless abstraction greatly simplifies integration of concurrency and parallelism into cloud applications, and enables deployment of scalable distributed systems and services at very low cost. Although a significant first step, the serverless abstraction requires tools that software engineers can use to reason about, debug, and optimize their increasingly complex, asynchronous applications. Toward this end, we investigate the design and implementation of GammaRay, a cloud service that extracts causal dependencies across functions and through cloud services, without programmer intervention. We implement GammaRay for AWS Lambda and evaluate the overheads that it introduces for serverless micro-benchmarks and applications written in Python.
  3. Internet of Things (IoT) devices are becoming increasingly prevalent in our environment, yet the process of programming these devices and processing the data they produce remains difficult. Typically, data is processed on device, involving arduous work in low level languages, or data is moved to the cloud, where abundant resources are available for Functions as a Service (FaaS) or other handlers. FaaS is an emerging category of flexible computing services, where developers deploy self-contained functions to be run in portable and secure containerized environments; however, at the moment, these functions are limited to running in the cloud or in some cases at the "edge" of the network using resource rich, Linux-based systems. In this work, we investigate NanoLambda, a portable platform that brings FaaS, high-level language programming, and familiar cloud service APIs to non-Linux and microcontroller-based IoT devices. To enable this, NanoLambda couples a new, minimal Python runtime system that we have designed for the least capable end of the IoT device spectrum, with API compatibility for AWS Lambda and S3. NanoLambda transfers functions between IoT devices (sensors, edge, cloud), providing power and latency savings while retaining the programmer productivity benefits of high-level languages and FaaS. A key feature ofmore »NanoLambda is a scheduler that intelligently places function executions across multi-scale IoT deployments according to resource availability and power constraints. We evaluate a range of applications that use NanoLambda to run on devices as small as the ESP8266 with 64KB of ram and 512KB flash storage.« less
  4. —Infrastructure-as-a-Service (IaaS), and more generally the “cloud,” like Amazon Web Services (AWS) or Microsoft Azure, have changed the landscape of system operations on the Internet. Their elasticity allows operators to rapidly allocate and use resources as needed, from virtual machines, to storage, to bandwidth, and even to IP addresses, which is what made them popular and spurred innovation. In this paper, we show that the dynamic component paired with recent developments in trust-based ecosystems (e.g., SSL certificates) creates so far unknown attack vectors. Specifically, we discover a substantial number of stale DNS records that point to available IP addresses in clouds, yet, are still actively attempted to be accessed. Often, these records belong to discontinued services that were previously hosted in the cloud. We demonstrate that it is practical, and time and cost efficient for attackers to allocate IP addresses to which stale DNS records point. Considering the ubiquity of domain validation in trust ecosystems, like SSL certificates, an attacker can impersonate the service using a valid certificate trusted by all major operating systems and browsers. The attacker can then also exploit residual trust in the domain name for phishing, receiving and sending emails, or possibly distribute code to clientsmore »that load remote code from the domain (e.g., loading of native code by mobile apps, or JavaScript libraries by websites). Even worse, an aggressive attacker could execute the attack in less than 70 seconds, well below common time-to-live (TTL) for DNS records. In turn, it means an attacker could exploit normal service migrations in the cloud to obtain a valid SSL certificate for domains owned and managed by others, and, worse, that she might not actually be bound by DNS records being (temporarily) stale, but that she can exploit caching instead. We introduce a new authentication method for trust-based domain validation that mitigates staleness issues without incurring additional certificate requester effort by incorporating existing trust of a name into the validation process. Furthermore, we provide recommendations for domain name owners and cloud operators to reduce their and their clients’ exposure to DNS staleness issues and the resulting domain takeover attacks.« less
  5. null (Ed.)
    Python has become a widely used programming language for research, not only for small one-off analyses, but also for complex application pipelines running at supercomputer- scale. Modern parallel programming frameworks for Python present users with a more granular unit of management than traditional Unix processes and batch submissions: the Python function. We review the challenges involved in running native Python functions at scale, and present techniques for dynamically determining a minimal set of dependencies and for assembling a lightweight function monitor (LFM) that captures the software environment and manages resources at the granularity of single functions. We evaluate these techniques in a range of environ- ments, from campus cluster to supercomputer, and show that our advanced dependency management planning and dynamic re- source management methods provide superior performance and utilization relative to coarser-grained management approaches, achieving several-fold decrease in execution time for several large Python applications.