skip to main content


Title: Multi‐resource allocation in cloud data centers: A trade‐off on fairness and efficiency
Summary

Fair allocation has been studied intensively in both economics and computer science. Many existing mechanisms that consider fairness of resource allocation focus on a single resource. With the advance of cloud computing that centralizes multiple types of resources under one shared platform, multi‐resource allocation has come into the spotlight. In fact, fair/efficient multi‐resource allocation has become a fundamental problem in any shared computer system. The widely used solution is to partition resources into bundles that contain fixed amounts of different resources, so that multiple resources are abstracted as a single resource. However, this abstraction cannot satisfy different demands from heterogeneous users, especially on ensuring fairness among users competing for resources with different capacity limits. A promising approach to this problem is dominant resource fairness (DRF), which tries to equalize each user's dominant share (share of a user's most highly demanded resource, that is, the largest fraction of any resource that the user has required for a task), but this method may still suffer from significant loss of efficiency (i.e., some resources are underused). This article develops a new allocation mechanism based on DRF aiming to balance fairness and efficiency. We consider fairness not only in terms of a user's dominant resource, but also in another resource dimension which is secondarily desired by this user. We call this allocation mechanism 2‐dominant resource fairness (2‐DF). Then, we design a non‐trivial on‐line algorithm to find a 2‐DF allocation and extend this concept tok‐dominant resource fairness (k‐DF).

 
more » « less
PAR ID:
10453661
Author(s) / Creator(s):
 ;  
Publisher / Repository:
Wiley Blackwell (John Wiley & Sons)
Date Published:
Journal Name:
Concurrency and Computation: Practice and Experience
Volume:
33
Issue:
6
ISSN:
1532-0626
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Apache Mesos, a two-level resource scheduler, provides resource sharing across multiple users in a multi-tenant clustered environment. Computational resources (i.e., CPU, memory, disk, etc.) are distributed according to the Dominant Resource Fairness (DRF) policy. Mesos frameworks (users) receive resources based on their current usage and are responsible for scheduling their tasks within the allocation. We have observed that multiple frameworks can cause fairness imbalance in a multi-user environment. For example, a greedy framework consuming more than its fair share of resources can deny resource fairness to others. The user with the least Dominant Share is considered first by the DRF module to get its resource allocation. However, the default DRF implementation, in Apache Mesos' Master allocation module, does not consider the overall resource demands of the tasks in the queue for each user/framework. This lack of awareness can lead to poor performance as users without any pending task may receive more resource offers, and users with a queue of pending tasks can starve due to their high dominant shares. In a multi-tenant environment, the characteristics of frameworks and workloads must be understood by cluster managers to be able to define fairness based on not only resource share but also resource demand and queue wait time. We have developed a policy driven queue manager, Tromino, for an Apache Mesos cluster where tasks for individual frameworks can be scheduled based on each framework's overall resource demands and current resource consumption. Dominant Share and demand awareness of Tromino and scheduling based on these attributes can reduce (1) the impact of unfairness due to a framework specific configuration, and (2) unfair waiting time due to higher resource demand in a pending task queue. In the best case, Tromino can significantly reduce the average waiting time of a framework by using the proposed Demand-DRF aware policy. 
    more » « less
  2. Apache Mesos, a cluster-wide resource manager, is widely deployed in massive scale at several Clouds and Data Centers. Mesos aims to provide high cluster utilization via fine grained resource co-scheduling and resource fairness among multiple users through Dominant Resource Fairness (DRF) based allocation. DRF takes into account different resource types (CPU, Memory, Disk I/O) requested by each application and determines the share of each cluster resource that could be allocated to the applications. Mesos has adopted a two-level scheduling policy: (1) DRF to allocate resources to competing frameworks and (2) task level scheduling by each framework for the resources allocated during the previous step. We have conducted experiments in a local Mesos cluster when used with frameworks such as Apache Aurora, Marathon, and our own framework Scylla, to study resource fairness and cluster utilization. Experimental results show how informed decision regarding second level scheduling policy of frameworks and attributes like offer holding period, offer refusal cycle and task arrival rate can reduce unfair resource distribution. Bin-Packing scheduling policy on Scylla with Marathon can reduce unfair allocation from 38% to 3%. By reducing unused free resources in offers we bring down the unfairness from to 90% to 28%. We also show the effect of task arrival rate to reduce the unfairness from 23% to 7%. 
    more » « less
  3. In order to meet the performance/privacy require- ments of future data-intensive mobile applications, e.g., self- driving cars, mobile data analytics, and AR/VR, service providers are expected to draw on shared storage/computation/connectivity resources at the network “edge”. To be cost-effective, a key functional requirement for such infrastructure is enabling the shar- ing of heterogeneous resources amongst tenants/service providers supporting spatially varying and dynamic user demands. This paper proposes a resource allocation criterion, namely, Share Constrained Slicing (SCS), for slices allocated predefined shares of the network’s resources, which extends traditional α−fairness criterion, by striking a balance among inter- and intra-slice fairness vs. overall efficiency. We show that SCS has several desirable properties including slice-level protection, envyfreeness, and load- driven elasticity. In practice, mobile users’ dynamics could make the cost of implementing SCS high, so we discuss the feasibility of using a simpler (dynamically) weighted max-min as a surrogate resource allocation scheme. For a setting with stochastic loads and elastic user requirements, we establish a sufficient condition for the stability of the associated coupled network system. Finally, and perhaps surprisingly, we show via extensive simulations that while SCS (and/or the surrogate weighted max-min allocation) provides inter-slice protection, they can achieve improved job delay and/or perceived throughput, as compared to other weighted max- min based allocation schemes whose intra-slice weight allocation is not share-constrained, e.g., traditional max-min or discriminatory processor sharing. 
    more » « less
  4. A single homogeneous resource needs to be fairly shared between users that dynamically arrive and depart over time. Although good allocations exist for any fixed number of users, implementing these allocations dynamically is impractical: it typically entails adjustments in the allocation of every user in the system whenever a new user arrives. We introduce a dynamic fair resource division problem in which there is a limit on the number of users that can be disrupted when a new user arrives and study the trade-off between fairness and the number of allowed disruptions, using a fairness metric: the fairness ratio. We almost completely characterize this trade-off and give an algorithm for obtaining the optimal fairness for any number of allowed disruptions. 
    more » « less
  5. We consider the problem of dividing limited resources to individuals arriving over T rounds. Each round has a random number of individuals arrive, and individuals can be characterized by their type (i.e., preferences over the different resources). A standard notion of fairness in this setting is that an allocation simultaneously satisfy envy-freeness and efficiency. The former is an individual guarantee, requiring that each agent prefers the agent’s own allocation over the allocation of any other; in contrast, efficiency is a global property, requiring that the allocations clear the available resources. For divisible resources, when the number of individuals of each type are known up front, the desiderata are simultaneously achievable for a large class of utility functions. However, in an online setting when the number of individuals of each type are only revealed round by round, no policy can guarantee these desiderata simultaneously, and hence, the best one can do is to try and allocate so as to approximately satisfy the two properties. We show that, in the online setting, the two desired properties (envy-freeness and efficiency) are in direct contention in that any algorithm achieving additive counterfactual envy-freeness up to a factor of L T necessarily suffers an efficiency loss of at least [Formula: see text]. We complement this uncertainty principle with a simple algorithm, Guarded-Hope, which allocates resources based on an adaptive threshold policy and is able to achieve any fairness–efficiency point on this frontier. Our results provide guarantees for fair online resource allocation with high probability for multiple resource and multiple type settings. In simulation results, our algorithm provides allocations close to the optimal fair solution in hindsight, motivating its use in practical applications as the algorithm is able to adapt to any desired fairness efficiency trade-off. Funding: This work was supported by the National Science Foundation [Grants ECCS-1847393, DMS-1839346, CCF-1948256, and CNS-1955997] and the Army Research Laboratory [Grant W911NF-17-1-0094]. Supplemental Material: The online appendix is available at https://doi.org/10.1287/opre.2022.2397 . 
    more » « less