

Search for: All records

Award ID contains: 1846046


  1. Free, publicly-accessible full text available June 17, 2024
  2. Cloud applications are increasingly shifting from large monolithic services to complex graphs of loosely coupled microservices. Despite their benefits, microservices are prone to cascading performance issues, which can lead to prolonged periods of degraded performance. We present Sage, a machine learning-driven root cause analysis system for interactive cloud microservices that is both accurate and practical. We show that Sage correctly identifies the root causes of performance issues across a diverse set of microservices and takes action to address them, leading to more predictable, performant, and efficient cloud systems. (An illustrative root-cause-localization sketch appears after this list.)
  3. Many cloud services have Quality-of-Service (QoS) requirements; most requests have to complete within a given latency constraint. Recently, researchers have begun to investigate whether it is possible to meet QoS while attempting to save power on a per-request basis. Existing work shows that one can indeed hand-tune a request latency predictor offline for a particular cloud application, and consult it at runtime to modulate CPU voltage and frequency, resulting in substantial power savings. In this paper, we propose ReTail, an automated and general solution for request-level power management of latency-critical services with QoS constraints. We present a systematic process to select the features of any given application that best correlate with its request latency. ReTail uses these features to predict latency and adjust the CPU’s power consumption. ReTail’s predictor is trained fully at runtime. We show that, unlike previous findings, simple techniques perform better than complex machine learning models when using the right input features. For a web search engine, ReTail outperforms prior mechanisms based on complex hand-tuned predictors for that application domain. Furthermore, ReTail’s systematic approach also yields superior power savings across a diverse set of cloud applications. (An illustrative sketch of this scheme appears after this list.)
  4. The slowdown of Moore’s Law, combined with advances in 3D stacking of logic and memory, has pushed architects to revisit the concept of processing-in-memory (PIM) to overcome the memory wall bottleneck. This PIM renaissance finds itself in a very different computing landscape from the one twenty years ago, as more and more computation shifts to the cloud. Most PIM architecture papers still focus on best-effort applications, while PIM’s impact on latency-critical cloud applications is not well understood. This paper explores how datacenters can exploit PIM architectures in the context of latency-critical applications. We adopt a general-purpose cloud server with HBM-based, 3D-stacked logic+memory modules, and study the impact of PIM on six diverse interactive cloud applications. We reveal the previously neglected opportunity that PIM presents to these services, and show the importance of properly managing PIM-related resources to meet the QoS targets of interactive services and maximize resource efficiency. Then, we present PIMCloud, a QoS-aware resource manager designed for cloud systems with PIM, allowing colocation of multiple latency-critical and best-effort applications. We show that PIMCloud efficiently manages PIM resources: it (1) improves effective machine utilization by up to 70% and 85% (average 24% and 33%) under 2-app and 3-app mixes, compared to the best state-of-the-art manager; (2) helps latency-critical applications meet QoS; and (3) adapts to varying load patterns. (An illustrative placement sketch appears after this list.)
  5. Infrastructure-as-a-Service cloud providers sell virtual machines that are specified only in terms of the number of CPU cores, amount of memory, and I/O throughput. Performance-critical aspects such as cache sizes and memory latency are missing or reported in ways that make them hard to compare across cloud providers. This makes it difficult for users to adapt their application’s behavior to the available resources. In this work, we aim to increase the visibility that cloud users have into shared resources on public clouds. Specifically, we present CacheInspector, a lightweight runtime that determines the performance and allocated capacity of shared caches on multi-tenant public clouds. We validate CacheInspector’s accuracy in a controlled environment, and use it to study the characteristics and variability of cache resources in the cloud, across time, instances, availability regions, and cloud providers. We show that CacheInspector’s output allows cloud users to tailor their application’s behavior, including their output quality, to avoid suboptimal performance when resources are scarce. (An illustrative probing sketch appears after this list.)
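
For item 2 (Sage), the abstract does not specify Sage’s actual model, so the following is only a minimal sketch of one way to localize root causes in a microservice dependency graph: flag services whose latency deviates from a historical baseline, then keep those whose own dependencies look healthy. The graph, latency samples, and threshold are all hypothetical.

```python
# Minimal sketch (not Sage's actual method): find root-cause candidates in a
# microservice call graph by flagging anomalously slow services and keeping
# only the most upstream ones, whose own dependencies look healthy.
from statistics import mean, stdev

def anomalous(samples, current, k=3.0):
    """Flag a service whose current latency exceeds mean + k * stddev."""
    return current > mean(samples) + k * stdev(samples)

def root_causes(deps, baseline, current):
    """deps: service -> downstream services it calls.
    baseline: service -> historical latency samples (ms).
    current: service -> latest latency (ms)."""
    flagged = {s for s in current if anomalous(baseline[s], current[s])}
    # A flagged service is a root-cause candidate only if none of its own
    # dependencies are also flagged (its slowness is not inherited).
    return [s for s in flagged if not any(d in flagged for d in deps.get(s, []))]

deps = {"frontend": ["cart", "catalog"], "cart": ["redis"], "catalog": [], "redis": []}
baseline = {s: [10, 11, 9, 10, 12] for s in deps}
current = {"frontend": 80, "cart": 75, "catalog": 10, "redis": 70}
print(root_causes(deps, baseline, current))  # -> ['redis']
```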
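For item 3 (ReTail), a minimal sketch of request-level power management under two stated assumptions: a linear predictor trained fully at runtime (consistent with the abstract’s claim that simple techniques suffice with the right features), and service time scaling roughly inversely with frequency. The feature values, frequency list, and QoS budget are illustrative, not ReTail’s actual design.

```python
# Sketch: predict each request's latency from its features, then run it at the
# lowest CPU frequency whose scaled prediction still meets the QoS target.
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4, 2.8]   # available frequency steps (made up)
F_MAX = FREQS_GHZ[-1]
QOS_MS = 20.0                            # per-request latency budget (made up)

class OnlineLinearPredictor:
    """Least-mean-squares linear regression, trained fully at runtime."""
    def __init__(self, n_features, lr=1e-4):
        self.w = [0.0] * (n_features + 1)   # bias + one weight per feature
        self.lr = lr
    def predict(self, x):
        return self.w[0] + sum(wi * xi for wi, xi in zip(self.w[1:], x))
    def update(self, x, observed):
        err = observed - self.predict(x)
        self.w[0] += self.lr * err
        for i, xi in enumerate(x):
            self.w[i + 1] += self.lr * err * xi

def choose_frequency(pred_ms_at_fmax):
    # Assume service time scales ~inversely with frequency; a real system
    # would profile this relationship instead of assuming it.
    for f in FREQS_GHZ:
        if pred_ms_at_fmax * (F_MAX / f) <= QOS_MS:
            return f
    return F_MAX

pred = OnlineLinearPredictor(n_features=2)
x = [512.0, 3.0]                  # e.g., request size, query terms (made up)
f = choose_frequency(max(pred.predict(x), 0.1))
# ... serve the request at frequency f, measure its latency, then:
pred.update(x, observed=12.5)     # keep training the predictor at runtime
```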
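For item 4 (PIMCloud), the abstract does not give the manager’s algorithm, so this is only a toy two-pass placement that captures the stated goal: latency-critical apps claim whichever pool (host cores or near-memory PIM cores) meets their QoS first, and best-effort apps fill whatever remains. All profiles and capacities are made up.

```python
# Toy placement sketch (not PIMCloud's actual algorithm): latency-critical (LC)
# apps are placed first on a pool that meets their QoS; best-effort (BE) apps
# then soak up leftover capacity in either pool.
HOST_CORES, PIM_CORES = 16, 8

apps = [
    # (name, class, cores_needed, meets_qos_on_host, meets_qos_on_pim)
    ("search",    "LC", 6, True,  True),
    ("memcached", "LC", 4, False, True),   # memory-bound: only PIM meets QoS
    ("analytics", "BE", 8, True,  True),
]

def place(apps, host=HOST_CORES, pim=PIM_CORES):
    plan = {}
    # Pass 1: LC apps take whichever pool satisfies their QoS, preferring the
    # host so the scarcer PIM cores stay free for apps that truly need them.
    for name, cls, need, ok_host, ok_pim in apps:
        if cls != "LC":
            continue
        if ok_host and host >= need:
            plan[name], host = "host", host - need
        elif ok_pim and pim >= need:
            plan[name], pim = "pim", pim - need
        else:
            plan[name] = "reject"   # admission control: no placement meets QoS
    # Pass 2: BE apps fill the remaining capacity in either pool.
    for name, cls, need, *_ in apps:
        if cls != "BE":
            continue
        if host >= need:
            plan[name], host = "host", host - need
        elif pim >= need:
            plan[name], pim = "pim", pim - need
        else:
            plan[name] = "defer"
    return plan

print(place(apps))  # {'search': 'host', 'memcached': 'pim', 'analytics': 'host'}
```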
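For item 5 (CacheInspector), a toy version of the probing idea: chase pointers through working sets of growing size and look for jumps in per-access time, which mark cache-capacity boundaries. A real tool would use C, pinned memory, and hardware timers; CPython’s interpreter overhead hides the smaller cache levels, so treat this purely as an illustration of the measurement technique.

```python
# Toy cache probe: time a dependent pointer chase over working sets of
# increasing size. Step increases in ns/access suggest a cache level was
# exceeded. Interpreter overhead dominates small working sets in CPython.
import random, time

def chase_ns(n_slots, iters=1_000_000):
    """Average time per access over a random cyclic permutation of n_slots."""
    perm = list(range(n_slots))
    random.shuffle(perm)
    nxt = [0] * n_slots
    # Link the shuffled slots into one cycle so every access depends on the
    # previous one, defeating out-of-order overlap and prefetching.
    for a, b in zip(perm, perm[1:] + perm[:1]):
        nxt[a] = b
    i, t0 = 0, time.perf_counter()
    for _ in range(iters):
        i = nxt[i]
    return (time.perf_counter() - t0) / iters * 1e9

for kb in [16, 64, 256, 1024, 4096, 16384]:
    slots = kb * 1024 // 8   # ~8 bytes per list slot (a rough approximation)
    print(f"{kb:>6} KiB working set: {chase_ns(slots):6.1f} ns/access")
```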