NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Many-Server Asymptotics for Join-the-Shortest-Queue in the Super-Halfin-Whitt Scaling Window

https://doi.org/10.1287/moor.2021.0133

Zhao, Zhisheng; Banerjee, Sayan; Mukherjee, Debankur (July 2025, Mathematics of Operations Research)

Join-the-shortest queue (JSQ) is a classical benchmark for the performance of parallel-server queueing systems because of its strong optimality properties. Recently, there has been significant progress in understanding its large-system asymptotic behavior. In this paper, we analyze the JSQ policy in the super-Halfin-Whitt scaling window when load per server [Formula: see text] scales with the system size N as [Formula: see text] for [Formula: see text] and [Formula: see text]. We establish that the centered and scaled total queue length process converges to a certain Bessel process with negative drift, and the associated (centered and scaled) steady-state total queue length, indexed by N, converges to a [Formula: see text] distribution. The limit laws are universal in the sense that they do not depend on the value of [Formula: see text] and exhibit fundamentally different behavior from both the Halfin–Whitt regime ([Formula: see text]) and the nondegenerate slowdown (NDS) regime ([Formula: see text]). Funding: This work was supported by the National Science Foundation to S. Banerjee [Grants CAREER DMS-2141621 and RTG DMS-2134107] and D. Mukherjee and Z. Zhao [Grants CIF-2113027 and CPS-2240982].
more » « less
Free, publicly-accessible full text available July 8, 2026
Distributed Speed Scaling in Large-Scale Service Systems

https://doi.org/10.1287/opre.2024.1012

Rutten, Daan; Zubeldia, Martin; Mukherjee, Debankur (June 2025, Operations Research)

Smart Servers, Smarter Speed Scaling: A Decentralized Algorithm for Data Center Efficiency A team of researchers from Georgia Tech and the University of Minnesota has introduced a cutting-edge algorithm designed to optimize energy use in large-scale data centers. As detailed in their paper “Distributed Rate Scaling in Large-Scale Service Systems,” the team developed a decentralized method allowing each server to adjust its processing speed autonomously without the need for communication or knowledge of system-wide traffic. The algorithm uses idle time as a local signal to guide processing speed, ensuring that all servers converge toward a globally optimal performance rate. This innovation addresses a critical issue in modern computing infrastructure: balancing energy efficiency with performance under uncertainty and scale. The authors demonstrate that their approach not only stabilizes the system but achieves asymptotic optimality as the number of servers increases. The work is poised to significantly reduce energy consumption in data centers, which are projected to account for up to 8% of U.S. electricity use by 2030.
more » « less
Free, publicly-accessible full text available June 3, 2026
Distributed Speed Scaling in Large-Scale Service Systems

https://doi.org/10.1145/3673660.3655053

Rutten, Daan; Zubeldia, Martin; Mukherjee, Debankur (June 2024, ACM SIGMETRICS Performance Evaluation Review)

We consider a large-scale parallel-server loss system with an unknown arrival rate, where each server is able to adjust its processing speed. The objective is to minimize the system cost, which consists of a power cost to maintain the servers' processing speeds and a quality of service cost depending on the tasks' processing times, among others. We draw on ideas from stochastic approximation to design a novel speed scaling algorithm and prove that the servers' processing speeds converge to the globally asymptotically optimum value. Curiously, the algorithm is fully distributed and does not require any communication between servers. Apart from the algorithm design, a key contribution of our approach lies in demonstrating how concepts from the stochastic approximation literature can be leveraged to effectively tackle learning problems in large-scale, distributed systems. En route, we also analyze the performance of a fully heterogeneous parallel-server loss system, where each server has a distinct processing speed, which might be of independent interest.
more » « less
Full Text Available
Optimal Rate-Matrix Pruning For Heterogeneous Systems

https://doi.org/10.1145/3649477.3649492

Zhao, Zhisheng; Mukherjee, Debankur (February 2024, ACM SIGMETRICS Performance Evaluation Review)

We consider large-scale load balancing systems where processing time distribution of tasks depend on both task and server types. We analyze the system in the asymptotic regime where the number of task and server types tend to infinity proportionally to each other. In such heterogeneous setting, popular policies like Join Fastest Idle Queue (JFIQ), Join Fastest Shortest Queue (JFSQ) are known to perform poorly and they even shrink the stability region. Moreover, to the best of our knowledge, in this setup, finding a scalable policy with provable performance guarantee has been an open question prior to this work. In this paper, we propose and analyze two asymptotically delay-optimal dynamic load balancing approaches: (a) one that efficiently reserves the processing capacity of each server for good tasks and route tasks under the Join Idle Queue policy; and (b) a speed-priority policy that increases the probability of servers processing tasks at a high speed. Introducing a novel analytical framework and using the mean-field method and stochastic coupling arguments, we prove that both policies above achieve asymptotic zero queueing, whereby the probability that a typical task is assigned to an idle server tends to 1 as the system scales.
more » « less
Full Text Available
Exploiting Data Locality to Improve Performance of Heterogeneous Server Clusters

https://doi.org/10.1287/stsy.2022.0040

Zhao, Zhisheng; Mukherjee, Debankur; Wu, Ruoyu (February 2024, Stochastic Systems)

We consider load balancing in large-scale heterogeneous server systems in the presence of data locality that imposes constraints on which tasks can be assigned to which servers. The constraints are naturally captured by a bipartite graph between the servers and the dispatchers handling assignments of various arrival flows. When a task arrives, the corresponding dispatcher assigns it to a server with the shortest queue among [Formula: see text] randomly selected servers obeying these constraints. Server processing speeds are heterogeneous, and they depend on the server type. For a broad class of bipartite graphs, we characterize the limit of the appropriately scaled occupancy process, both on the process level and in steady state, as the system size becomes large. Using such a characterization, we show that imposing data locality constraints can significantly improve the performance of heterogeneous systems. This is in stark contrast to either heterogeneous servers in a full flexible system or data locality constraints in systems with homogeneous servers, both of which have been observed to degrade the system performance. Extensive numerical experiments corroborate the theoretical results. Funding: This work was partially supported by the National Science Foundation [CCF. 07/2021–06/2024].
more » « less
Full Text Available
Smoothed Online Optimization with Unreliable Predictions

https://doi.org/10.1145/3578338.3593570

Rutten, Daan; Christianson, Nicolas; Mukherjee, Debankur; Wierman, Adam (June 2023, Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems)

Full Text Available
A New Approach to Capacity Scaling Augmented with Unreliable Machine Learning Predictions

https://doi.org/10.1287/moor.2023.1364

Rutten, Daan; Mukherjee, Debankur (January 2023, Mathematics of Operations Research)

Modern data centers suffer from immense power consumption. As a result, data center operators have heavily invested in capacity-scaling solutions, which dynamically deactivate servers if the demand is low and activate them again when the workload increases. We analyze a continuous-time model for capacity scaling, where the goal is to minimize the weighted sum of flow time, switching cost, and power consumption in an online fashion. We propose a novel algorithm, called adaptive balanced capacity scaling (ABCS), that has access to black-box machine learning predictions. ABCS aims to adapt to the predictions and is also robust against unpredictable surges in the workload. In particular, we prove that ABCS is [Formula: see text] competitive if the predictions are accurate, and yet, it has a uniformly bounded competitive ratio even if the predictions are completely inaccurate. Finally, we investigate the performance of this algorithm on a real-world data set and carry out extensive numerical experiments, which positively support the theoretical results. Funding: This work was partially supported by the Division of Computing and Communication Foundations [Grant 2113027]. The authors also acknowledge financial support for this project from the Algorithm and Randomness Center–Transdisciplinary Research Institute for Advancing Data Science Fellowship at Georgia Tech.
more » « less
Full Text Available
Mean-field Analysis for Load Balancing on Spatial Graphs

https://doi.org/10.1145/3578338.3593552

Rutten, Daan; Mukherjee, Debankur (January 2023, Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems)

Full Text Available
Scalable Load Balancing in Networked Systems: A Survey of Recent Advances

https://doi.org/10.1137/20M1323746

der Boor, Mark Van; Borst, Sem C.; Van Leeuwaarden, Johan S.; Mukherjee, Debankur (August 2022, SIAM Review)

Full Text Available
Load Balancing Under Strict Compatibility Constraints

https://doi.org/10.1287/moor.2022.1258

Rutten, Daan; Mukherjee, Debankur (April 2022, Mathematics of Operations Research)

Consider a system with N identical single-server queues and a number of task types, where each server is able to process only a small subset of possible task types. Arriving tasks select [Formula: see text] random compatible servers and join the shortest queue among them. The compatibility constraints are captured by a fixed bipartite graph between the servers and the task types. When the graph is complete bipartite, the mean-field approximation is accurate. However, such dense compatibility graphs are infeasible for large-scale implementation. We characterize a class of sparse compatibility graphs for which the mean-field approximation remains valid. For this, we introduce a novel notion, called proportional sparsity, and establish that systems with proportionally sparse compatibility graphs asymptotically match the performance of a fully flexible system. Furthermore, we show that proportionally sparse random compatibility graphs can be constructed, which reduce the server degree almost by a factor [Formula: see text] compared with the complete bipartite compatibility graph.
more » « less
Full Text Available

« Prev Next »

Search for: All records