

This content will become publicly available on May 4, 2027

Title: Slowpoke: End-to-end Throughput Optimization Modeling for Microservice Applications
SLOWPOKE is a new system to accurately quantify the effects of hypothetical optimizations on end-to-end throughput for microservice applications, without relying on tracing or a priori knowledge of the call graph. Microservice operators can use SLOWPOKE to ask what-if performance analysis questions of the form "What throughput could my retail application sustain if I optimized the shopping cart service from 10K req/s to 20K req/s?". Given a target service and its hypothetical optimization, SLOWPOKE employs a performance model that determines how to selectively slow down non-target services to preserve the relative effect of the optimization. It then performs profiling experiments to predict the end-to-end throughput, as if the optimization had been implemented. Applied to four real-world microservice applications, SLOWPOKE accurately quantifies optimization effects with a root mean squared error of only 2.07%. It is also effective in more complex scenarios, e.g., predicting throughput after scaling optimizations or when bottlenecks arise from mutex contention. Evaluated in large-scale deployments of 45 nodes and 108 synthetic benchmarks, SLOWPOKE further demonstrates its scalability and coverage of a wide range of microservice characteristics.
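The relative-slowdown idea can be illustrated with a small sketch (all helper names here are hypothetical, and SLOWPOKE's actual performance model is selective about which services to slow and by how much, rather than applying a uniform factor). To emulate a k× speedup of the target without changing its code, the experiment slows the other services by k, measures end-to-end throughput, and rescales the result:

```python
def predict_optimized_throughput(target, services, baseline_rate, optimized_rate,
                                 set_service_slowdown, measure_throughput):
    """What-if experiment via relative slowdown (hypothetical helpers).

    Emulate speeding up `target` from `baseline_rate` to `optimized_rate`
    by slowing the *other* services down by the same factor, measuring
    end-to-end throughput, and rescaling the measurement.
    """
    k = optimized_rate / baseline_rate            # e.g. 20K / 10K = 2.0
    for svc in services:
        if svc != target:
            set_service_slowdown(svc, factor=k)   # svc now runs k x slower
    slowed_tput = measure_throughput()            # profiling experiment
    return slowed_tput * k                        # undo the uniform slowdown
```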
Award ID(s):
2440334 2237193
PAR ID:
10659063
Publisher / Repository:
USENIX NSDI 2026
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1.
    Systems for ML inference are widely deployed today, but they typically optimize ML inference workloads using techniques designed for conventional data serving workloads and miss critical opportunities to leverage the statistical nature of ML. In this paper, we present WILLUMP, an optimizer for ML inference that introduces two statistically-motivated optimizations targeting ML applications whose performance bottleneck is feature computation. First, WILLUMP automatically cascades feature computation for classification queries: WILLUMP classifies most data inputs using only high-value, low-cost features selected through empirical observations of ML model performance, improving query performance by up to 5× without statistically significant accuracy loss. Second, WILLUMP accurately approximates ML top-K queries, discarding low-scoring inputs with an automatically constructed approximate model and then ranking the remainder with a more powerful model, improving query performance by up to 10× with minimal accuracy loss. WILLUMP automatically tunes these optimizations’ parameters to maximize query performance while meeting an accuracy target. Moreover, WILLUMP complements these statistical optimizations with compiler optimizations to automatically generate fast inference code for ML applications. We show that WILLUMP improves the end-to-end performance of real-world ML inference pipelines curated from major data science competitions by up to 16× without statistically significant loss of accuracy. 
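The feature-computation cascade described above can be sketched as follows (illustrative names and an sklearn-style model API assumed; WILLUMP selects the low-cost features and tunes the confidence threshold automatically from empirical observations of model performance):

```python
def cascaded_classify(x, cheap_features, full_features,
                      approx_model, full_model, threshold=0.9):
    """Classify most inputs with high-value, low-cost features; fall back
    to the full feature pipeline only for uncertain inputs."""
    cheap = cheap_features(x)                          # low-cost features only
    confidence = max(approx_model.predict_proba([cheap])[0])
    if confidence >= threshold:                        # confident: stop early
        return approx_model.predict([cheap])[0]
    full = full_features(x)                            # pay for expensive features
    return full_model.predict([full])[0]
```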
  2. The microservice architecture style has gained popularity due to its fault isolation, ease of scaling applications, and developer agility. However, writing applications in the microservice design style has its challenges. Because services are loosely coupled, they communicate with one another through standard communication APIs, which incurs significant overhead from communication protocols and data transformation. Inefficient service communication in the application logic can further overwhelm the application. We perform a grey literature review showing that unnecessary data transfer is a real challenge in the industry. To the best of our knowledge, no effective tool is currently available to accurately identify the origins of unnecessary microservice communications that lead to significant performance overhead and to provide guidance for optimization. To bridge this gap, we propose MicroProf, a dynamic program analysis tool to detect unnecessary data transfer in Java-based microservice applications. At the implementation level, MicroProf introduces novel techniques such as remote object sampling and hardware debug registers to monitor remote object usage. MicroProf reports unnecessary data transfer at the application source code level and pinpoints opportunities for communication API optimization. MicroProf is evaluated on four well-known applications, comprising two real-world applications and two benchmarks, identifying five inefficient remote invocations. Guided by MicroProf, API optimization achieves an 87.5% reduction in the number of fields within REST API responses. The empirical evaluation further reveals that the optimized services experience a speedup of up to 4.59×.
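The underlying idea, detecting fields that are transferred but never read, can be sketched in a few lines (illustrative only; MicroProf targets Java applications and uses remote object sampling with hardware debug registers rather than a wrapper object like this):

```python
class FieldUsageTracker:
    """Wrap a deserialized REST response and record which fields the
    caller actually reads; the rest is wasted transfer."""

    def __init__(self, response: dict):
        self._data = response
        self._read = set()

    def __getitem__(self, field):
        self._read.add(field)            # record every field access
        return self._data[field]

    def unused_fields(self):
        # Fields that were serialized, transferred, and deserialized
        # but never consumed: candidates for removal from the API.
        return set(self._data) - self._read

resp = FieldUsageTracker({"id": 7, "name": "cart", "history": [], "meta": {}})
_ = resp["id"], resp["name"]
print("unnecessary transfer:", resp.unused_fields())   # {'history', 'meta'}
```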
  3. While serverless platforms substantially simplify the provisioning, configuration, and management of cloud applications, implementing correct services on top of these platforms can present significant challenges to programmers. For example, serverless infrastructures introduce a host of failure modes that are not present in traditional deployments. Individual serverless instances can fail while others continue to make progress, correct but slow instances can be killed by the cloud provider as part of resource management, and providers will often respond to such failures by re-executing requests. For functions with side-effects, these scenarios can create behaviors that are not observable in serverful deployments. In this paper, we propose mu2sls, a framework for implementing microservice applications on serverless using standard Python code with two extra primitives: transactions and asynchronous calls. Our framework orchestrates user-written services to address several challenges, such as failures and re-executions, and provides formal guarantees that the generated serverless implementations are correct. To that end, we present a novel service specification abstraction and formalization of serverless implementations that facilitate reasoning about the correctness of a given application’s serverless implementation. This formalization forms the basis of the mu2sls prototype, which we then use to develop a few real-world microservice applications and show that the performance of the generated serverless implementations achieves significant scalability (3-5× the throughput of a sequential implementation) while providing correctness guarantees in the context of faults, re-execution, and concurrency. 
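A minimal sketch of a service written against the two primitives, assuming hypothetical signatures for begin_tx and async_call (the real framework orchestrates these so that failures and provider-initiated re-executions cannot duplicate or partially apply side effects):

```python
def checkout(order_id, inventory_svc, payment_svc, begin_tx, async_call):
    tx = begin_tx()                          # primitive 1: transaction
    try:
        inventory_svc.reserve(order_id, tx=tx)
        payment_svc.charge(order_id, tx=tx)
        tx.commit()                          # effects become visible atomically
    except Exception:
        tx.abort()                           # a re-execution sees no partial state
        raise
    # primitive 2: asynchronous call, off the critical request path
    async_call("email_service.send_receipt", order_id)
```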
  4. A microservice-based application is composed of multiple self-contained components called microservices, and controlling inter-service communication is important for enforcing safety properties. Presently, inter-service communication is configured using microservice deployment tools. However, such tools only support a limited class of single-hop policies, which can be overly permissive because they ignore the rich service tree structure of microservice calls. Policies that can express the service tree structure can offer development and security teams more fine-grained control over communication patterns. To this end, we design an expressive policy language to specify service tree structures, and we develop a visibly pushdown automata-based dynamic enforcement mechanism to enforce service tree policies. Our technique is non-invasive: it does not require any changes to service implementations, and does not require access to microservice code. To realize our method, we build a runtime monitor on top of a service mesh, an emerging network infrastructure layer that can control inter-service communication during deployment. In particular, we employ the programmable network traffic filtering capabilities of Istio, a popular service mesh implementation, to implement an online and distributed monitor. Our experiments show that our monitor can enforce rich safety properties while adding minimal latency overhead on the order of milliseconds. 
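A centralized, single-process sketch conveys the pushdown flavor of the monitor (illustrative only; the paper's enforcement is online and distributed across Istio's programmable traffic filters):

```python
class ServiceTreeMonitor:
    """Pushdown-style monitor sketch for service-tree policies."""

    def __init__(self, policy):
        self.policy = policy   # policy(stack, callee) -> bool
        self.stack = []        # the branch of the service call tree we are on

    def on_call(self, callee):
        if not self.policy(self.stack, callee):
            path = " -> ".join(self.stack + [callee])
            raise PermissionError(f"service-tree policy violation: {path}")
        self.stack.append(callee)   # push on call (a VPA call symbol)

    def on_return(self):
        self.stack.pop()            # pop on return (a VPA return symbol)

# 'db' may only be reached along frontend -> cart; a single-hop rule
# ("cart may call db") could not rule out other paths into 'db'.
def policy(stack, callee):
    if callee == "db":
        return stack[-2:] == ["frontend", "cart"]
    return True

m = ServiceTreeMonitor(policy)
m.on_call("frontend"); m.on_call("cart"); m.on_call("db")   # permitted path
m.on_return(); m.on_return(); m.on_return()
```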
  5.
    Microservice Architecture (MSA) is becoming the predominant direction of new cloud-based applications. There are many advantages to using microservices, but also downsides that come with a more complex architecture than a typical monolithic enterprise application. Beyond the ordinary poor coding practices and code smells of any application, microservice-specific code smells are difficult to discover within a distributed application setup. There are many static code analysis tools for monolithic applications, but tools that offer code-smell detection for microservice-based applications are lacking. This paper proposes a new approach to detect code smells in distributed applications based on microservices. We develop MSANose, a tool that detects eleven different microservice-specific code smells, and share it as open source. We demonstrate our tool through a case study on two robust benchmark microservice applications and verify its accuracy. Our results show that it is possible to detect code smells within microservice applications using bytecode and/or source code analysis throughout the development process, or even before deployment to production.
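As a flavor of what such a checker does, here is a sketch of one classic microservice smell check, hardcoded endpoints (an illustrative regex over source text only; MSANose itself analyzes Java bytecode and source and covers eleven smells):

```python
import re

# Literal IP:port endpoints baked into code defeat service discovery
# and break when services are rescheduled.
HARDCODED_ENDPOINT = re.compile(r'https?://\d{1,3}(\.\d{1,3}){3}(:\d+)?')

def find_hardcoded_endpoints(source: str):
    """Return (line number, endpoint) pairs for hardcoded endpoints."""
    return [(i + 1, m.group(0))
            for i, line in enumerate(source.splitlines())
            for m in HARDCODED_ENDPOINT.finditer(line)]

java_src = 'String url = "http://10.0.0.5:8080/cart";'
print(find_hardcoded_endpoints(java_src))   # [(1, 'http://10.0.0.5:8080')]
```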