NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

The case for accurate lifetime accounting in carbon metrics

https://doi.org/10.1145/3764944.3764965

Huang, Yujin; Zhu, Timothy; Gandhi, Anshul (August 2025, ACM SIGMETRICS Performance Evaluation Review)

To represent the entire carbon footprint of computing devices, carbon metrics often include both an embodied cost (i.e., carbon cost to produce the device) and an operational cost (i.e., carbon cost to run the device). The embodied carbon cost is typically high, but it is amortized over the lifetime of the device. In this vision statement, we argue that for carbon metrics to be useful, we need (i) accurate metrics for lifetime, which are challenging for SSDs, and (ii) correct reasoning about carbon costs when using such metrics.
more » « less
Free, publicly-accessible full text available August 26, 2026
AutoBurst: Autoscaling Burstable Instances for Cost-effective Latency SLOs

https://doi.org/10.1145/3698038.3698530

Hasan, Rubaba; Zhu, Timothy; Urgaonkar, Bhuvan (November 2024, ACM)

Full Text Available
Fast and Accurate DNN Performance Estimation across Diverse Hardware Platforms

https://doi.org/10.1109/MASCOTS64422.2024.10786578

Kakrannaya, Vishwas Vasudeva; Rai, Siddhartha Balakrishna; Sivasubramaniam, Anand; Zhu, Timothy (October 2024, IEEE)

Full Text Available
TraceUpscaler: Upscaling Traces to Evaluate Systems at High Load

https://doi.org/10.1145/3627703.3629581

Sajal, Sultan Mahmud; Zhu, Timothy; Urgaonkar, Bhuvan; Sen, Siddhartha (April 2024, ACM)

Full Text Available
SplitRPC: A {Control + Data} Path Splitting RPC Stack for ML Inference Serving

https://doi.org/10.1145/3589974

Kumar, Adithya; Sivasubramaniam, Anand; Zhu, Timothy (May 2023, Proceedings of the ACM Conference on Measurement and Analysis of Computing Systems)

The growing adoption of hardware accelerators driven by their intelligent compiler and runtime system counterparts has democratized ML services and precipitously reduced their execution times. This motivates us to shift our attention to efficiently serve these ML services under distributed settings and characterize the overheads imposed by the RPC mechanism ('RPC tax') when serving them on accelerators. The RPC implementations designed over the years implicitly assume the host CPU services the requests, and we focus on expanding such works towards accelerator-based services. While recent proposals calling for SmartNICs to take on this task are reasonable for simple kernels, serving complex ML models requires a more nuanced view to optimize both the data-path and the control/orchestration of these accelerators. We program today's commodity network interface cards (NICs) to split the control and data paths for effective transfer of control while efficiently transferring the payload to the accelerator. As opposed to unified approaches that bundle these paths together, limiting the flexibility in each of these paths, we design and implement SplitRPC - a control + data path optimizing RPC mechanism for ML inference serving. SplitRPC allows us to optimize the datapath to the accelerator while simultaneously allowing the CPU to maintain full orchestration capabilities. We implement SplitRPC on both commodity NICs and SmartNICs and demonstrate how GPU-based ML services running different compiler/runtime systems can benefit. For a variety of ML models served using different inference runtimes, we demonstrate that SplitRPC is effective in minimizing the RPC tax while providing significant gains in throughput and latency over existing kernel by-pass approaches, without requiring expensive SmartNIC devices.
more » « less
SplitRPC: A {Control + Data} Path Splitting RPC Stack for ML Inference Serving

https://doi.org/10.1145/3578338.3593571

Kumar, Adithya; Sivasubramaniam, Anand; Zhu, Timothy (January 2023, Proceedings of the ACM on measurement and analysis of computing systems)

The growing adoption of hardware accelerators driven by their intelligent compiler and runtime system counterparts has democratized ML services and precipitously reduced their execution times. This motivates us to shift our attention to efficiently serve these ML services under distributed settings and characterize the overheads imposed by the RPC mechanism (‘RPC tax’) when serving them on accelerators. The RPC implementations designed over the years implicitly assume the host CPU services the requests, and we focus on expanding such works towards accelerator-based services. While recent proposals calling for SmartNICs to take on this task are reasonable for simple kernels, serving complex ML models requires a more nuanced view to optimize both the data-path and the control/orchestration of these accelerators. We program today’s commodity network interface cards (NICs) to split the control and data paths for effective transfer of control while efficiently transferring the payload to the accelerator. As opposed to unified approaches that bundle these paths together, limiting the flexibility in each of these paths, we design and implement SplitRPC - a {control + data} path optimizing RPC mechanism for ML inference serving. SplitRPC allows us to optimize the datapath to the accelerator while simultaneously allowing the CPU to maintain full orchestration capabilities. We implement SplitRPC on both commodity NICs and SmartNICs and demonstrate how GPU-based ML services running different compiler/runtime systems can benefit. For a variety of ML models served using different inference runtimes, we demonstrate that SplitRPC is effective in minimizing the RPC tax while providing significant gains in throughput and latency over existing kernel by-pass approaches, without requiring expensive SmartNIC devices.
more » « less
Full Text Available
Kerveros: Efficient and Scalable Cloud Admission Control

Sajal, Sultan Mahmud; Marshall, Luke; Li, Beibin; Zhou, Shandan; Pan, Abhisek; Mellou, Konstantina; Narayanan, Deepak; Zhu, Timothy; Dion, David; Moscibroda, Thomas; et al (July 2023, 17th USENIX Symposium on Operating Systems Design and Implementation)

The infinite capacity of cloud computing is an illusion: in reality, cloud providers cannot always have enough capacity of the right type, in the right place, at the right time to meet all demand. Consequently, cloud providers need to implement admission-control policies to ensure accepted capacity requests experience high availability. However, admission control in the public cloud is hard due to dynamic changes in both supply and demand: hardware might become unavailable, and actual VM consumption could vary for a variety of reasons including tenant scale-outs and fulfillment of VM reservations made by customers ahead of time. In this paper, we design and implement Kerveros, a flexible admission-control system that has three desired properties: i) high computational scalability to handle a large inventory, ii) accurate capacity provisioning for high VM availability, and iii) good packing efficiency to optimize resource usage. To achieve this, Kerveros uses novel bookkeeping techniques to quickly estimate the capacity available for incoming VM requests. Our system has been deployed in Microsoft Azure. Results from both simulations and production confirm that Kerveros achieves more than four nines of availability while sustaining request processing latencies of a few milliseconds.
more » « less
Full Text Available
Overflowing emerging neural network inference tasks from the GPU to the CPU on heterogeneous servers

https://doi.org/10.1145/3534056.3534935

Kumar, Adithya; Sivasubramaniam, Anand; Zhu, Timothy (June 2022, Proceedings of the 15th ACM International Conference on Systems and Storage)

Full Text Available
The Fast and The Frugal: Tail Latency Aware Provisioning for Coping with Load Variations

https://doi.org/10.1145/3366423

Kumar, Adithya; Narayanan, Iyswarya; Zhu, Timothy; Sivasubramaniam, Anand (April 2020, WWW '20: Proceedings of The Web Conference 2020)

Small and medium sized enterprises use the cloud for running online, user-facing, tail latency sensitive applications with well-defined fixed monthly budgets. For these applications, adequate system capacity must be provisioned to extract maximal performance despite the challenges of uncertainties in load and request-sizes. In this paper, we address the problem of capacity provisioning under fixed budget constraints with the goal of minimizing tail latency. To tackle this problem, we propose building systems using a heterogeneous mix of low latency expensive resources and cheap resources that provide high throughput per dollar. As load changes through the day, we use more faster resources to reduce tail latency during low load periods and more cheaper resources to handle the high load periods. To achieve these tail latency benefits, we introduce novel heterogeneity-aware scheduling and autoscaling algorithms that are designed for minimizing tail latency. Using software prototypes and by running experiments on the public cloud, we show that our approach can outperform existing capacity provisioning systems by reducing the tail latency by as much as 45% under fixed-budget settings.
more » « less
Full Text Available
RobinHood: Tail Latency Aware Caching - Dynamic Reallocation from Cache-Rich to Cache-Poor

Berger, Daniel; Berg, Benjamin; Zhu, Timothy; Sen, Siddhartha; Harchol-Balter, Mor (October 2018, 13th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2018)

Full Text Available

Search for: All records