NSF PAR Search | NSF Public Access Repository

Lumos: Efficient Performance Modeling and Estimation for Large-scale LLM Training

Liang, Mingyu; Kassa, Hiwot; Fu, Wenyin; Coutinho, Brian; Feng, Louis; Delimitrou, Christina (May 2025, MLSys)

Training LLMs in distributed environments presents significant challenges due to the complexity of model execution, deployment systems, and the vast space of configurable strategies. Although various optimization techniques exist, achieving high efficiency in practice remains difficult. Accurate performance models that effectively characterize and predict a model’s behavior are essential for guiding optimization efforts and system-level studies. We propose Lumos, a trace-driven performance modeling and estimation toolkit for large-scale LLM training, designed to accurately capture and predict the execution behaviors of modern LLMs. We evaluate Lumos on a production ML cluster with up to 512 NVIDIA H100 GPUs using various GPT-3 variants, demonstrating that it can replay execution time with an average error of just 3.3%, along with other runtime details, across different models and configurations. Additionally, we validate its ability to estimate performance for new setups from existing traces, facilitating efficient exploration of model and deployment configurations.
Free, publicly-accessible full text available May 12, 2026
The Sunk Carbon Fallacy: Rethinking Carbon Footprint Metrics for Effective Carbon-Aware Scheduling

https://doi.org/10.1145/3698038.3698542

Bashir, Noman; Gohil, Varun; Subramanya, Anagha Belavadi; Shahrad, Mohammad; Irwin, David; Olivetti, Elsa; Delimitrou, Christina (November 2024, ACM)

The rapid increase in computing demand and corresponding energy consumption have focused attention on computing's impact on the climate and sustainability. Prior work proposes metrics that quantify computing's carbon footprint across several lifecycle phases, including its supply chain, operation, and end-of-life. Industry uses these metrics to optimize the carbon footprint of manufacturing hardware and running computing applications. Unfortunately, prior work on optimizing datacenters' carbon footprint often succumbs to the sunk cost fallacy by considering embodied carbon emissions (a sunk cost) when making operational decisions (i.e., job scheduling and placement), which leads to operational decisions that do not always reduce the total carbon footprint. In this paper, we evaluate carbon-aware job scheduling and placement on a given set of servers for several carbon accounting metrics. Our analysis reveals that state-of-the-art carbon accounting metrics that include embodied carbon emissions when making operational decisions can increase the total carbon footprint of executing a set of jobs. We study the factors that affect the added carbon cost of such suboptimal decision-making. We then use a real-world case study from a datacenter to demonstrate how the sunk carbon fallacy manifests itself in practice. Finally, we discuss the implications of our findings for guiding effective carbon-aware scheduling in on-premise and cloud datacenters.
Free, publicly-accessible full text available November 20, 2025
Tales of the Tail: Past and Future

https://doi.org/10.1109/MM.2024.3413649

Delimitrou, Christina; Marty, Michael (September 2024, IEEE Micro)

Full Text Available
End-to-End Cloud Application Cloning With Ditto

https://doi.org/10.1109/MM.2024.3419067

Liang, Mingyu; Gan, Yu; Li, Yueying; Torres, Carlos; Dhanotia, Abhishek; Ketkar, Mahesh; Delimitrou, Christina (July 2024, IEEE Micro)

Full Text Available
Characterizing a Memory Allocator at Warehouse Scale

https://doi.org/10.1145/3620666.3651350

Zhou, Zhuangzhuang; Gogte, Vaibhav; Vaish, Nilay; Kennelly, Chris; Xia, Patrick; Kanev, Svilen; Moseley, Tipp; Delimitrou, Christina; Ranganathan, Parthasarathy (April 2024, Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems)

Full Text Available
LibPreemptible: Enabling Fast, Adaptive, and Hardware-Assisted User-Space Scheduling

https://doi.org/10.1109/HPCA57654.2024.00075

Li, Yueying; Lazarev, Nikita; Koufaty, David; Yin, Tenny; Anderson, Andy; Zhang, Zhiru; Suh, G Edward; Kaffes, Kostis; Delimitrou, Christina (March 2024, International Symposium on High Performance Computer Architecture)

Full Text Available
Ursa: Lightweight Resource Management for Cloud-Native Microservices

https://doi.org/10.1109/HPCA57654.2024.00077

Zhang, Yanqi; Zhou, Zhuangzhuang; Elnikety, Sameh; Delimitrou, Christina (March 2024, International Symposium on High Performance Computer Architecture)

Full Text Available
The Importance of Generalizability in Machine Learning for Systems

https://doi.org/10.1109/LCA.2024.3384449

Gohil, Varun; Dev, Sundar; Upasani, Gaurang; Lo, David; Ranganathan, Parthasarathy; Delimitrou, Christina (January 2024, IEEE Computer Architecture Letters)

Full Text Available
Mystique: Enabling Accurate and Scalable Generation of Production AI Benchmarks

https://doi.org/10.1145/3579371.3589072

Liang, Mingyu; Fu, Wenyin; Feng, Louis; Lin, Zhongyi; Panakanti, Pavani; Zheng, Shengbao; Sridharan, Srinivas; Delimitrou, Christina (June 2023, Proceedings International Symposium on Computer Architecture)

Full Text Available