

Search for: All records

Award ID contains: 2440334

Note: Clicking a Digital Object Identifier (DOI) number takes you to an external site maintained by the publisher. Some full-text articles may not be available without charge during the embargo period.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. SLOWPOKE is a new system to accurately quantify the effects of hypothetical optimizations on end-to-end throughput for microservice applications, without relying on tracing or a priori knowledge of the call graph. Microservice operators can use SLOWPOKE to ask what-if performance analysis questions of the form "What throughput could my retail application sustain if I optimized the shopping cart service from 10K req/s to 20K req/s?". Given a target service and its hypothetical optimization, SLOWPOKE employs a performance model that determines how to selectively slow down non-target services to preserve the relative effect of the optimization. It then performs profiling experiments to predict the end-to-end throughput, as if the optimization had been implemented. Applied to four real-world microservice applications, SLOWPOKE accurately quantifies optimization effects with a root mean squared error of only 2.07%. It is also effective in more complex scenarios, e.g., predicting throughput after scaling optimizations or when bottlenecks arise from mutex contention. Evaluated in large-scale deployments of 45 nodes and 108 synthetic benchmarks, SLOWPOKE further demonstrates its scalability and coverage of a wide range of microservice characteristics.
    Free, publicly-accessible full text available May 4, 2027
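     The selective-slowdown idea in the abstract above can be illustrated with a minimal sketch. All names and the exact rescaling arithmetic here are assumptions for illustration, not SLOWPOKE's actual model: the intuition is that rather than speeding up the target service (which would require implementing the optimization), one can slow every non-target service by the optimization's speedup factor, profile the slowed system, and rescale the measured throughput.

     ```python
     # Hypothetical sketch of the selective-slowdown intuition. The function
     # names and the simple ratio-based rescaling are illustrative assumptions,
     # not the paper's actual performance model.

     def slowdown_factor(baseline_rate: float, optimized_rate: float) -> float:
         """Factor by which non-target services are uniformly slowed (>= 1),
         so the target's *relative* speedup is reproduced without changing it."""
         return optimized_rate / baseline_rate

     def predicted_throughput(measured_slowed_tput: float, factor: float) -> float:
         """Undo the uniform slowdown to recover the predicted end-to-end
         throughput of the hypothetically optimized system."""
         return measured_slowed_tput * factor

     # Example from the abstract: shopping cart optimized from 10K to 20K req/s.
     f = slowdown_factor(10_000, 20_000)   # non-target services run at half speed
     print(predicted_throughput(5_000, f)) # profiled 5K req/s under slowdown
     ```

     In practice the paper's model accounts for effects a uniform ratio cannot, such as scaling optimizations and mutex-contention bottlenecks; this sketch only captures the core "preserve the relative effect" idea.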
  2. We present Apache Flink 2.0, an evolution of the popular stream processing system's architecture that decouples computation from state management. Flink 2.0 relies on a remote distributed file system (DFS) for primary state storage and uses local disks as a secondary cache, with state updates streamed continuously and directly to the DFS. To address the latency implications of remote storage, Flink 2.0 incorporates an asynchronous runtime execution model. Furthermore, Flink 2.0 introduces ForSt, a novel state store featuring a unified file system that enables faster and lightweight checkpointing, recovery, and reconfiguration with minimal intrusion to the existing Flink runtime architecture. Using a comprehensive set of Nexmark benchmarks and a large-scale stateful production workload, we evaluate Flink 2.0's large-state processing, checkpointing, and recovery mechanisms. Our results show significant performance improvements and reduced resource utilization compared to the baseline Flink 1.20 implementation. Specifically, we observe up to 94% reduction in checkpoint duration, up to 49× faster recovery after failures or a rescaling operation, and up to 50% cost savings. 
    Free, publicly-accessible full text available August 1, 2026
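     The storage split described in the abstract above (remote DFS as primary state store, local disk as a secondary cache, updates streamed asynchronously to the DFS) can be sketched in miniature. This is an illustrative toy, not Flink's or ForSt's actual API: a dict stands in for the DFS, an LRU-bounded dict for the local cache, and a pending queue for the asynchronous upload stream.

     ```python
     # Toy sketch of a remote-primary / local-cache state store. All class and
     # method names are assumptions for illustration; Flink 2.0 / ForSt's real
     # interfaces differ.
     from collections import OrderedDict

     class CachedStateStore:
         def __init__(self, remote: dict, cache_capacity: int = 2):
             self.remote = remote           # stands in for the remote DFS
             self.cache = OrderedDict()     # local secondary cache (LRU)
             self.capacity = cache_capacity
             self.pending = []              # updates awaiting async upload

         def put(self, key, value):
             self.cache[key] = value
             self.cache.move_to_end(key)
             if len(self.cache) > self.capacity:
                 self.cache.popitem(last=False)   # evict least-recently used
             self.pending.append((key, value))    # streamed to DFS later

         def get(self, key):
             if key in self.cache:                # fast path: local cache hit
                 self.cache.move_to_end(key)
                 return self.cache[key]
             return self.remote.get(key)          # fall back to remote DFS

         def flush(self):
             """Drain pending updates to the remote store (the 'async' upload;
             a real system would overlap this with computation)."""
             for k, v in self.pending:
                 self.remote[k] = v
             self.pending.clear()
     ```

     Because the remote store is primary, a checkpoint in such a design can amount to recording which DFS files constitute the state, which is one plausible reading of why the paper reports much cheaper checkpointing and faster recovery.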
  3. Free, publicly-accessible full text available June 22, 2026