Advances in virtualization technologies and edge computing have inspired a new paradigm for Internet-of-Things (IoT) application development. By breaking a monolithic application into loosely coupled microservices, great gain can be achieved in performance, flexibility and robustness. In this paper, we study the important problem of load balancing across IoT microservice instances. A key difficulty in this problem is the interdependencies among microservices: the load on a successor microservice instance directly depends on the load distributed from its predecessor microservice instances. We propose a graph-based model for describing the load dependencies among microservices. Based on the model, we first propose a basic formulation for load balancing, which can be solved optimally in polynomial time. The basic model neglects the quality-of-service (QoS) of the IoT application. We then propose a QoS-aware load balancing model, based on a novel abstraction that captures a realization of the application’s internal logic. The QoS-aware load balancing problem is NP-hard. We propose a fully polynomialtime approximation scheme for the QoS-aware problem. We show through simulation experiments that our proposed algorithm achieves enhanced QoS compared to heuristic solutions.
more »
« less
Dynamically Balancing Load with Overload Control for Microservices
The microservices architecture simplifies application development by breaking monolithic applications into manageable microservices. However, this distributed microservice “service mesh” leads to new challenges due to the more complex application topology. Particularly, each service component scales up and down independently creating load imbalance problems on shared backend services accessed by multiple components. Traditional load balancing algorithms do not port over well to a distributed microservice architecture where load balancers are deployed client-side. In this article, we propose a self-managing load balancing system, BLOC, which provides consistent response times to users without using a centralized metadata store or explicit messaging between nodes. BLOC uses overload control approaches to provide feedback to the load balancers. We show that this performs significantly better in solving the incast problem in microservice architectures. A critical component of BLOC is the dynamic capacity estimation algorithm. We show that a well-tuned capacity estimate can outperform even join-the-shortest-queue, a nearly optimal algorithm, while a reasonable dynamic estimate still outperforms Least Connection, a distributed implementation of join-the-shortest-queue. Evaluating this framework, we found that BLOC improves the response time distribution range, between the 10th and 90th percentiles, by 2 –4 times and the tail, 99th percentile, latency by 2 times.
more »
« less
- Award ID(s):
- 1837382
- PAR ID:
- 10588704
- Publisher / Repository:
- ACM
- Date Published:
- Journal Name:
- ACM Transactions on Autonomous and Adaptive Systems
- Volume:
- 19
- Issue:
- 4
- ISSN:
- 1556-4665
- Page Range / eLocation ID:
- 1 to 23
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph where left nodes correspond to job types and right nodes correspond to servers, with each edge indicating that a job type can be served by a server. Thus edges represent locality constraints, i.e., an arbitrary job can only be served at servers which contain certain data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies named Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue, where ties are broken in favor of the fastest servers. Under a "well-connected'' graph condition, we show that JFSQ and JFIQ are asymptotically optimal in the mean response time when the number of servers goes to infinity. In addition to asymptotic optimality, we also obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.more » « less
-
This paper studies how to provision edge computing and network resources for complex microservice-based applications (MSAs) in face of uncertain and dynamic geo-distributed demands. The complex inter-dependencies between distributed microservice components make load balancing for MSAs extremely challenging, and the dynamic geo-distributed demands exacerbate load imbalance and consequently congestion and performance loss. In this paper, we develop an edge resource provisioning model that accurately captures the inter-dependencies between microservices and their impact on load balancing across both computation and communication resources. We also propose a robust formulation that employs explicit risk estimation and optimization to hedge against potential worst-case load fluctuations, with controlled robustness-resource trade-off. Utilizing a data-driven approach, we provide a solution that provides risk estimation with measurement data of past load geo-distributions. Simulations with real-world datasets have validated that our solution provides the important robustness crucially needed in MSAs, and performs superiorly compared to baselines that neglect either network or inter-dependency constraints.more » « less
-
We consider large-scale load balancing systems where processing time distribution of tasks depend on both task and server types. We analyze the system in the asymptotic regime where the number of task and server types tend to infinity proportionally to each other. In such heterogeneous setting, popular policies like Join Fastest Idle Queue (JFIQ), Join Fastest Shortest Queue (JFSQ) are known to perform poorly and they even shrink the stability region. Moreover, to the best of our knowledge, in this setup, finding a scalable policy with provable performance guarantee has been an open question prior to this work. In this paper, we propose and analyze two asymptotically delay-optimal dynamic load balancing approaches: (a) one that efficiently reserves the processing capacity of each server for good tasks and route tasks under the Join Idle Queue policy; and (b) a speed-priority policy that increases the probability of servers processing tasks at a high speed. Introducing a novel analytical framework and using the mean-field method and stochastic coupling arguments, we prove that both policies above achieve asymptotic zero queueing, whereby the probability that a typical task is assigned to an idle server tends to 1 as the system scales.more » « less
-
Optimizing request routing in large microservice-based applications is difficult, especially when applications span multiple geo-distributed clusters. In this paper, inspired by ideas from network traffic engineering, we propose Service Layer Traffic Engineering (SLATE), a new framework for request routing in microservices that span multiple clusters. SLATE leverages global knowledge of cluster states and multi-hop application graphs to centrally control the flow of requests in order to optimize end-to-end application latency and cost. Realizing such a system requires tackling several technical challenges unique to service layer, such as accounting for different request traffic classes, multi-hop call trees, and application latency profiles. We identify such challenges and build a preliminary prototype that addresses some of them. Preliminary evaluations of our prototype show how SLATE outperforms the state-of-the-art global load balancing approach (used by Meta’s Service Router and Google’s Traffic Director) by up to 3.5× in average latency and reduces egress bandwidth cost by up to 11.6×.more » « less
An official website of the United States government

