

Search for: All records

Award ID contains: 1846046


  1. Free, publicly-accessible full text available June 17, 2024
  2. Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely coupled microservices. Despite their benefits, microservices are prone to cascading performance issues, which can lead to prolonged periods of degraded performance. We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices that is both accurate and practical. We show that Sage correctly identifies the root causes of performance issues across a diverse set of microservices and takes action to address them, leading to more predictable, performant, and efficient cloud systems. (An illustrative root-cause-localization sketch appears after this list.)
  3. Many cloud services have Quality-of-Service (QoS) requirements; most requests have to complete within a given latency constraint. Recently, researchers have begun to investigate whether it is possible to meet QoS while attempting to save power on a per-request basis. Existing work shows that one can indeed hand-tune a request latency predictor offline for a particular cloud application, and consult it at runtime to modulate CPU voltage and frequency, resulting in substantial power savings. In this paper, we propose ReTail, an automated and general solution for request-level power management of latency-critical services with QoS constraints. We present a systematic process to select the features of any given application that best correlate with its request latency. ReTail uses these features to predict latency and adjust the CPU’s power consumption. ReTail’s predictor is trained fully at runtime. We show that, unlike previous findings, simple techniques perform better than complex machine learning models when using the right input features. For a web search engine, ReTail outperforms prior mechanisms based on complex hand-tuned predictors for that application domain. Furthermore, ReTail’s systematic approach also yields superior power savings across a diverse set of cloud applications. (An illustrative sketch of this scheme appears after this list.)
  4. The slowdown of Moore’s Law, combined with advances in 3D stacking of logic and memory, has pushed architects to revisit the concept of processing-in-memory (PIM) to overcome the memory wall bottleneck. This PIM renaissance finds itself in a very different computing landscape from the one twenty years ago, as more and more computation shifts to the cloud. Most PIM architecture papers still focus on best-effort applications, while PIM’s impact on latency-critical cloud applications is not well understood. This paper explores how datacenters can exploit PIM architectures in the context of latency-critical applications. We adopt a general-purpose cloud server with HBM-based, 3D-stacked logic+memory modules, and study the impact of PIM on six diverse interactive cloud applications. We reveal the previously neglected opportunity that PIM presents to these services, and show the importance of properly managing PIM-related resources to meet the QoS targets of interactive services and maximize resource efficiency. Then, we present PIMCloud, a QoS-aware resource manager designed for cloud systems with PIM, allowing colocation of multiple latency-critical and best-effort applications. We show that PIMCloud efficiently manages PIM resources: it (1) improves effective machine utilization by up to 70% and 85% (average 24% and 33%) under 2-app and 3-app mixes, compared to the best state-of-the-art manager; (2) helps latency-critical applications meet QoS; and (3) adapts to varying load patterns. (An illustrative placement sketch appears after this list.)
  5. Infrastructure-as-a-Service cloud providers sell virtual machines that are specified only in terms of the number of CPU cores, amount of memory, and I/O throughput. Performance-critical aspects such as cache sizes and memory latency are missing or reported in ways that make them hard to compare across cloud providers. This makes it difficult for users to adapt their application’s behavior to the available resources. In this work, we aim to increase the visibility that cloud users have into shared resources on public clouds. Specifically, we present CacheInspector, a lightweight runtime that determines the performance and allocated capacity of shared caches on multi-tenant public clouds. We validate CacheInspector’s accuracy in a controlled environment, and use it to study the characteristics and variability of cache resources in the cloud, across time, instances, availability regions, and cloud providers. We show that CacheInspector’s output allows cloud users to tailor their application’s behavior, including their output quality, to avoid suboptimal performance when resources are scarce. (An illustrative probing sketch appears after this list.)
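
For item 2 (Sage), the abstract does not specify Sage’s actual model, so the following is only a minimal sketch of one way to localize root causes in a microservice dependency graph: flag services whose latency deviates from a historical baseline, then keep those whose own dependencies look healthy. The graph, latency samples, and threshold are all hypothetical.

```python
# Minimal sketch (not Sage's actual method): find root-cause candidates in a
# microservice call graph by flagging anomalously slow services and keeping
# only the most upstream ones, whose own dependencies look healthy.
from statistics import mean, stdev

def anomalous(samples, current, k=3.0):
    """Flag a service whose current latency exceeds mean + k * stddev."""
    return current > mean(samples) + k * stdev(samples)

def root_causes(deps, baseline, current):
    """deps: service -> downstream services it calls.
    baseline: service -> historical latency samples (ms).
    current: service -> latest latency (ms)."""
    flagged = {s for s in current if anomalous(baseline[s], current[s])}
    # A flagged service is a root-cause candidate only if none of its own
    # dependencies are also flagged (its slowness is not inherited).
    return [s for s in flagged if not any(d in flagged for d in deps.get(s, []))]

deps = {"frontend": ["cart", "catalog"], "cart": ["redis"], "catalog": [], "redis": []}
baseline = {s: [10, 11, 9, 10, 12] for s in deps}
current = {"frontend": 80, "cart": 75, "catalog": 10, "redis": 70}
print(root_causes(deps, baseline, current))  # -> ['redis']
```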
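For item 3 (ReTail), a minimal sketch of request-level power management under two stated assumptions: a linear predictor trained fully at runtime (consistent with the abstract’s claim that simple techniques suffice with the right features), and service time scaling roughly inversely with frequency. The feature values, frequency list, and QoS budget are illustrative, not ReTail’s actual design.

```python
# Sketch: predict each request's latency from its features, then run it at the
# lowest CPU frequency whose scaled prediction still meets the QoS target.
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4, 2.8]   # available frequency steps (made up)
F_MAX = FREQS_GHZ[-1]
QOS_MS = 20.0                            # per-request latency budget (made up)

class OnlineLinearPredictor:
    """Least-mean-squares linear regression, trained fully at runtime."""
    def __init__(self, n_features, lr=1e-4):
        self.w = [0.0] * (n_features + 1)   # bias + one weight per feature
        self.lr = lr
    def predict(self, x):
        return self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))
    def update(self, x, observed):
        err = observed - self.predict(x)
        self.w[0] += self.lr * err
        for i, xi in enumerate(x):
            self.w[i + 1] += self.lr * err * xi

def choose_frequency(pred_ms_at_fmax):
    # Assume service time scales ~inversely with frequency; a real system
    # would profile this relationship instead of assuming it.
    for f in FREQS_GHZ:
        if pred_ms_at_fmax * (F_MAX / f) <= QOS_MS:
            return f
    return F_MAX

pred = OnlineLinearPredictor(n_features=2)
x = [512.0, 3.0]                  # e.g., request size, query terms (made up)
f = choose_frequency(max(pred.predict(x), 0.1))
# ... serve the request at frequency f, measure its latency, then:
pred.update(x, observed=12.5)     # keep training the predictor at runtime
```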
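For item 4 (PIMCloud), the abstract does not give the manager’s algorithm, so this is only a toy two-pass placement that captures the stated goal: latency-critical apps claim whichever pool (host cores or near-memory PIM cores) meets their QoS first, and best-effort apps fill whatever remains. All profiles and capacities are made up.

```python
# Toy placement sketch (not PIMCloud's actual algorithm): latency-critical (LC)
# apps are placed first on a pool that meets their QoS; best-effort (BE) apps
# then soak up leftover capacity in either pool.
HOST_CORES, PIM_CORES = 16, 8

apps = [
    # (name, class, cores_needed, meets_qos_on_host, meets_qos_on_pim)
    ("search",    "LC", 6, True,  True),
    ("memcached", "LC", 4, False, True),   # memory-bound: only PIM meets QoS
    ("analytics", "BE", 8, True,  True),
]

def place(apps, host=HOST_CORES, pim=PIM_CORES):
    plan = {}
    # Pass 1: LC apps take whichever pool satisfies their QoS, preferring the
    # host so the scarcer PIM cores stay free for apps that truly need them.
    for name, cls, need, ok_host, ok_pim in apps:
        if cls != "LC":
            continue
        if ok_host and host >= need:
            plan[name], host = "host", host - need
        elif ok_pim and pim >= need:
            plan[name], pim = "pim", pim - need
        else:
            plan[name] = "reject"   # admission control: no placement meets QoS
    # Pass 2: BE apps fill the remaining capacity in either pool.
    for name, cls, need, *_ in apps:
        if cls != "BE":
            continue
        if host >= need:
            plan[name], host = "host", host - need
        elif pim >= need:
            plan[name], pim = "pim", pim - need
        else:
            plan[name] = "defer"
    return plan

print(place(apps))  # {'search': 'host', 'memcached': 'pim', 'analytics': 'host'}
```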
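For item 5 (CacheInspector), a toy version of the probing idea: chase pointers through working sets of growing size and look for jumps in per-access time, which mark cache-capacity boundaries. A real tool would use C, pinned memory, and hardware timers; CPython’s interpreter overhead hides the smaller cache levels, so treat this purely as an illustration of the measurement technique.

```python
# Toy cache probe: time a dependent pointer chase over working sets of
# increasing size. Step increases in ns/access suggest a cache level was
# exceeded. Interpreter overhead dominates small working sets in CPython.
import random, time

def chase_ns(n_slots, iters=1_000_000):
    """Average time per access over a random cyclic permutation of n_slots."""
    perm = list(range(n_slots))
    random.shuffle(perm)
    nxt = [0] * n_slots
    # Link the shuffled slots into one cycle so every access depends on the
    # previous one, defeating out-of-order overlap and prefetching.
    for a, b in zip(perm, perm[1:] + perm[:1]):
        nxt[a] = b
    i, t0 = 0, time.perf_counter()
    for _ in range(iters):
        i = nxt[i]
    return (time.perf_counter() - t0) / iters * 1e9

for kb in [16, 64, 256, 1024, 4096, 16384]:
    slots = kb * 1024 // 8   # ~8 bytes per list slot (a rough approximation)
    print(f"{kb:>6} KiB working set: {chase_ns(slots):6.1f} ns/access")
```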