
Title: Distributed Speed Scaling in Large-Scale Service Systems
We consider a large-scale parallel-server loss system with an unknown arrival rate, where each server is able to adjust its processing speed. The objective is to minimize the system cost, which consists of a power cost to maintain the servers' processing speeds and a quality of service cost depending on the tasks' processing times, among others. We draw on ideas from stochastic approximation to design a novel speed scaling algorithm and prove that the servers' processing speeds converge to the globally asymptotically optimum value. Curiously, the algorithm is fully distributed and does not require any communication between servers. Apart from the algorithm design, a key contribution of our approach lies in demonstrating how concepts from the stochastic approximation literature can be leveraged to effectively tackle learning problems in large-scale, distributed systems. En route, we also analyze the performance of a fully heterogeneous parallel-server loss system, where each server has a distinct processing speed, which might be of independent interest.
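To make the idea concrete, here is a minimal Python sketch of a distributed, stochastic-approximation-style speed update. It assumes each server tunes its speed with a Robbins-Monro step driven only by its own observed busy/idle times (the related summary below notes that idle time serves as the local signal); the cost model, gradient form, step sizes, and class/parameter names are illustrative assumptions, not the paper's actual algorithm.

```python
class SpeedScalingServer:
    """One server tuning its own processing speed from local observations only.

    Minimal illustrative sketch: the cost model, gradient signal, and step
    sizes below are assumptions, not the algorithm from the paper.
    """

    def __init__(self, speed=1.0, power_exp=2.0, qos_weight=1.0):
        self.speed = speed            # current processing speed
        self.power_exp = power_exp    # assumed power cost ~ speed ** power_exp
        self.qos_weight = qos_weight  # weight on the quality-of-service term
        self.k = 0                    # local iteration counter

    def _local_gradient(self, busy_time, idle_time):
        # Noisy derivative estimate of the per-server cost, built purely from
        # local busy/idle observations (assumed functional form).
        utilization = busy_time / (busy_time + idle_time + 1e-9)
        power_grad = self.power_exp * self.speed ** (self.power_exp - 1)
        qos_grad = -self.qos_weight * utilization / self.speed ** 2
        return power_grad + qos_grad

    def update(self, busy_time, idle_time):
        # Robbins-Monro step: decreasing step sizes, projected onto [0.1, 10].
        self.k += 1
        step = 1.0 / self.k
        self.speed -= step * self._local_gradient(busy_time, idle_time)
        self.speed = min(max(self.speed, 0.1), 10.0)
        return self.speed


# Toy usage: feed one server a few observed busy/idle cycles.
s = SpeedScalingServer()
for busy, idle in [(0.8, 0.2), (0.5, 0.5), (0.9, 0.1)]:
    print(round(s.update(busy, idle), 3))
```

Because no shared state is read or written, every server can run this update independently, which is what makes the scheme communication-free.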
Award ID(s): 2240982, 2113027
PAR ID: 10522086
Author(s) / Creator(s):
Publisher / Repository: ACM
Date Published:
Journal Name: ACM SIGMETRICS Performance Evaluation Review
Volume: 52
Issue: 1
ISSN: 0163-5999
Page Range / eLocation ID: 95 to 96
Format(s): Medium: X
Sponsoring Org: National Science Foundation
More Like this
  1. Smart Servers, Smarter Speed Scaling: A Decentralized Algorithm for Data Center Efficiency A team of researchers from Georgia Tech and the University of Minnesota has introduced a cutting-edge algorithm designed to optimize energy use in large-scale data centers. As detailed in their paper “Distributed Rate Scaling in Large-Scale Service Systems,” the team developed a decentralized method allowing each server to adjust its processing speed autonomously without the need for communication or knowledge of system-wide traffic. The algorithm uses idle time as a local signal to guide processing speed, ensuring that all servers converge toward a globally optimal performance rate. This innovation addresses a critical issue in modern computing infrastructure: balancing energy efficiency with performance under uncertainty and scale. The authors demonstrate that their approach not only stabilizes the system but achieves asymptotic optimality as the number of servers increases. The work is poised to significantly reduce energy consumption in data centers, which are projected to account for up to 8% of U.S. electricity use by 2030. 
  2. We consider large-scale load balancing systems where the processing time distribution of a task depends on both the task type and the server type. We analyze the system in the asymptotic regime where the numbers of task and server types grow proportionally to each other. In such a heterogeneous setting, popular policies like Join the Fastest Idle Queue (JFIQ) and Join the Fastest Shortest Queue (JFSQ) are known to perform poorly, and they even shrink the stability region. Moreover, to the best of our knowledge, finding a scalable policy with a provable performance guarantee in this setup was an open question prior to this work. In this paper, we propose and analyze two asymptotically delay-optimal dynamic load balancing approaches: (a) one that efficiently reserves the processing capacity of each server for good tasks and routes tasks under the Join Idle Queue policy; and (b) a speed-priority policy that increases the probability that servers process tasks at a high speed. Introducing a novel analytical framework and using the mean-field method and stochastic coupling arguments, we prove that both policies achieve asymptotically zero queueing, whereby the probability that a typical task is assigned to an idle server tends to 1 as the system scales.
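As a concrete reference point for the Join Idle Queue routing step used by policy (a) above, here is a minimal Python sketch under compatibility constraints: send the task to an idle compatible server when one exists, otherwise to a uniformly random compatible server. The data structures and names are illustrative; the capacity-reservation rule and the speed-priority policy (b) are not modeled here.

```python
import random

def join_idle_queue(task_type, servers, compatible):
    # servers: dict server_id -> current queue length
    # compatible: dict server_id -> set of task types that server can process
    candidates = [s for s, types in compatible.items() if task_type in types]
    idle = [s for s in candidates if servers[s] == 0]
    # Prefer an idle compatible server; fall back to a random compatible one.
    target = random.choice(idle) if idle else random.choice(candidates)
    servers[target] += 1   # the task joins the chosen server's queue
    return target

# Toy usage: three servers, two task types.
servers = {0: 0, 1: 2, 2: 0}
compatible = {0: {"a"}, 1: {"a", "b"}, 2: {"b"}}
print(join_idle_queue("b", servers, compatible))  # picks server 2, which is idle
```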
  3. Consider a system with N identical single-server queues and a number of task types, where each server is able to process only a small subset of possible task types. Arriving tasks select [Formula: see text] random compatible servers and join the shortest queue among them. The compatibility constraints are captured by a fixed bipartite graph between the servers and the task types. When the graph is complete bipartite, the mean-field approximation is accurate. However, such dense compatibility graphs are infeasible for large-scale implementation. We characterize a class of sparse compatibility graphs for which the mean-field approximation remains valid. For this, we introduce a novel notion, called proportional sparsity, and establish that systems with proportionally sparse compatibility graphs asymptotically match the performance of a fully flexible system. Furthermore, we show that proportionally sparse random compatibility graphs can be constructed, which reduce the server degree almost by a factor [Formula: see text] compared with the complete bipartite compatibility graph. 
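The dispatching rule in item 3 is "sample a few compatible servers at random and join the shortest queue among them." A minimal Python sketch of that step follows; the sample size d = 2, the dictionary-based queue representation, and the tie-breaking behavior are assumptions for illustration, not details from the paper.

```python
import random

def jsq_d(task_type, queue_len, compat, d=2):
    # compat: task type -> list of compatible server ids (the bipartite graph)
    # queue_len: server id -> current queue length
    neighbors = compat[task_type]
    sampled = random.sample(neighbors, min(d, len(neighbors)))
    target = min(sampled, key=lambda s: queue_len[s])  # shortest sampled queue
    queue_len[target] += 1                             # the task joins that queue
    return target

# Toy usage: four servers, one task type compatible with three of them.
queue_len = {0: 1, 1: 0, 2: 3, 3: 2}
compat = {"a": [0, 1, 2]}
print(jsq_d("a", queue_len, compat))
```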
  4. Gibbons, P.; Pekhimenko, G.; De Sa, C. (Eds.)
    Federated Learning (FL) typically involves a large-scale, distributed system with individual user devices/servers training models locally and then aggregating their model updates on a trusted central server. Existing systems for FL often use an always-on server for model aggregation, which can be inefficient in terms of resource utilization. They also may be inelastic in their resource management. This is particularly exacerbated when aggregating model updates at scale in a highly dynamic environment with varying numbers of heterogeneous user devices/servers. We present LIFL, a lightweight and elastic serverless cloud platform with fine-grained resource management for efficient FL aggregation at scale. LIFL is enhanced by a streamlined, event-driven serverless design that eliminates the individual, heavyweight message broker and replaces inefficient container-based sidecars with lightweight eBPF-based proxies. We leverage shared memory processing to achieve high-performance communication for hierarchical aggregation, which is commonly adopted to speed up FL aggregation at scale. We further introduce the locality-aware placement in LIFL to maximize the benefits of shared memory processing. LIFL precisely scales and carefully reuses the resources for hierarchical aggregation to achieve the highest degree of parallelism, while minimizing aggregation time and resource consumption. Our preliminary experimental results show that LIFL achieves significant improvement in resource efficiency and aggregation speed for supporting FL at scale, compared to existing serverful and serverless FL systems. 
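Item 4's hierarchical aggregation combines model updates in stages: each local aggregator merges its group's updates, and a root then merges the partial results. The Python sketch below shows that general pattern with FedAvg-style weighted averaging; LIFL's actual scheduling, shared-memory transport, and eBPF data path are not modeled, and all names are illustrative.

```python
def weighted_average(updates, counts):
    # FedAvg-style merge: average vectors weighted by each client's sample count.
    total = sum(counts)
    dim = len(updates[0])
    return [sum(u[i] * c for u, c in zip(updates, counts)) / total
            for i in range(dim)]

def hierarchical_aggregate(groups):
    # groups: list of (updates, counts); each local aggregator merges its own
    # group, then the root merges the partial results weighted by group sizes.
    partials = [weighted_average(u, c) for u, c in groups]
    group_sizes = [sum(c) for _, c in groups]
    return weighted_average(partials, group_sizes)

# Toy usage: two groups of clients with 2-dimensional model updates.
g1 = ([[1.0, 2.0], [3.0, 4.0]], [10, 30])
g2 = ([[5.0, 6.0]], [20])
print(hierarchical_aggregate([g1, g2]))
```

Splitting the merge across two levels keeps each aggregator's fan-in small, which is why hierarchical aggregation is commonly used to speed up FL aggregation at scale.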