Cloud applications are increasingly shifting to interactive, loosely coupled microservices. Despite their advantages, microservices complicate resource management because of inter-tier dependencies. We present Sinan, a cluster manager for interactive microservices that leverages easily obtainable tracing data, instead of empirical decisions, to infer the impact of a resource allocation on end-to-end performance and allocate appropriate resources to each tier. In a preliminary evaluation of Sinan with an end-to-end social network built with microservices, we show that Sinan’s data-driven approach allows the service to always meet its QoS without sacrificing resource efficiency.
                            Sinan: Data-Driven Resource Management for Interactive Multi-tier Microservices
                        
                    
    
            Cloud applications are increasingly shifting to interactive, loosely coupled microservices. Despite their advantages, microservices complicate resource management because of inter-tier dependencies. We present Sinan, a cluster manager for interactive microservices that leverages easily obtainable tracing data, instead of empirical decisions, to infer the impact of a resource allocation on end-to-end performance and allocate appropriate resources to each tier. In a preliminary evaluation of Sinan with an end-to-end social network built with microservices, we show that the cluster manager's data-driven approach allows the service to always meet its QoS without sacrificing resource efficiency.
- Award ID(s): 1704742
- PAR ID: 10188080
- Date Published:
- Journal Name: Workshop on ML for Computer Architecture and Systems (MLArchSys)
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
- 
            Transient computing has become popular in public cloud environments for running delay-insensitive batch and data processing applications at low cost. Since transient cloud servers can be revoked at any time by the cloud provider, they are considered unsuitable for running interactive applications such as web services. In this paper, we present VM deflation as an alternative mechanism to server preemption for reclaiming resources from transient cloud servers under resource pressure. Using real traces from top-tier cloud providers, we show the feasibility of using VM deflation as a resource reclamation mechanism for interactive applications in public clouds. We show how current hypervisor mechanisms can be used to implement VM deflation and present cluster deflation policies for resource management of transient and on-demand cloud VMs. Experimental evaluation of our deflation system on a Linux cluster shows that microservice-based applications can be deflated by up to 50% with negligible performance overhead. Our cluster-level deflation policies allow overcommitment levels as high as 50%, with less than a 1% decrease in application throughput, and can enable cloud platforms to increase revenue by 30%.
- 
            Traditional systems for allocating finite cluster resources among competing jobs have either aimed at providing fairness, relied on users to specify their resource requirements, or estimated these requirements via surrogate metrics (e.g. CPU utilization). These approaches do not account for a job’s real-world performance (e.g. P95 latency). Existing performance-aware systems use offline profiled data and/or are designed for specific allocation objectives. In this work, we argue that resource allocation systems should directly account for real-world performance and the varied allocation objectives of users. In this pursuit, we build Cilantro. At the core of Cilantro is an online learning mechanism which forms feedback loops with the jobs to estimate the resource-to-performance mappings and load shifts. This relieves users from the onerous task of job profiling and collects reliable real-time feedback, which is then used to achieve a variety of user-specified scheduling objectives. Cilantro handles the uncertainty in the learned models by adapting the underlying policy to work with confidence bounds. We demonstrate this in two settings. First, in a multi-tenant 1000 CPU cluster with 20 independent jobs, three of Cilantro’s policies outperform 9 other baselines on three different performance-aware scheduling objectives, improving user utilities by up to 1.2x to 3.7x. Second, in a microservices setting, where 160 CPUs must be distributed between 19 inter-dependent microservices, Cilantro outperforms 3 other baselines, reducing the end-to-end P99 latency to 0.57x that of the next best baseline.
- 
            Increasing application complexity has caused applications to be refactored into smaller components known as microservices that communicate with each other using RPCs. Distributed tracing has emerged as an important debugging tool for such microservice-based applications. Distributed tracing follows the journey of a user request from its starting point at the application's front-end, through RPC calls made by the front-end to different microservices recursively, all the way until a response is constructed and sent back to the user. To reduce storage costs, distributed tracing systems sample traces before collecting them for subsequent querying, affecting the accuracy of queries on the collected traces. We propose an alternative system, Snicket, that tightly integrates querying and collection of traces. Snicket takes as input a database-style streaming query that expresses the analysis the developer wants to perform on the trace data. This query is compiled into a distributed collection of microservice extensions that run as "bumps-in-the-wire," intercepting RPC requests and responses as they flow into and out of microservices. This collection of extensions implements the query, performing early filtering and computation on the traces to reduce the amount of stored data in a query-specific manner. We show that Snicket is expressive in the queries it can support and can update queries fast enough for interactive use.
- 
            Increasingly, the heterogeneity of devices and software that comprise the Internet of Things (IoT) is impeding innovation. IoT deployments amalgamate compute, storage, and networking capabilities provisioned at multiple resource scales, from low-cost, resource-constrained microcontrollers to resource-rich public cloud servers. To support these different resource scales and capabilities, the operating systems (OSs) that manage them have also diverged significantly. Because the OS is the “API” for the hardware, this proliferation is causing a lack of portability across devices and systems, complicating development, deployment, management, and optimization of IoT applications. To address these impediments, we investigate a new, “clean slate” OS design and implementation that hides this heterogeneity via a new set of abstractions specifically for supporting microservices as a universal application programming model in IoT contexts. The operating system, called Ambience, supports IoT applications structured as microservices and facilitates their portability, isolation, and deployment-time optimization. We discuss the design and implementation of Ambience, evaluate its performance, and demonstrate its portability using both microbenchmarks and end-to-end IoT deployments. Our results show that Ambience can scale down to 64MHz microcontrollers and up to modern x86_64 servers, while providing similar or better performance than comparable commodity operating systems on the same range of hardware platforms.