Adaptive Task-Oriented Resource Allocation for Large Dynamic Workflows on Opportunistic Resources

Phung, Thanh Son; Thain, Douglas

doi:10.1109/IPDPS57955.2024.00034

Dynamic workflow management systems offer a solution to the problem of distributing a local application by packaging individual computations and their dependencies on- the-fly into tasks executable on remote workers. Such inde- pendent task execution allows workers to be launched in an opportunistic manner to maximize the current pool of resources at any given time, either through opportunistic systems (e.g., HTCondor, AWS Spot Instances), or conventional systems (e.g., SLURM, SGE) with backfilling enabled, as opposed to monolithic or message-passing applications requiring a fixed block of non- preemptible workers. However, the dynamic nature of task generation presents a significant challenge in terms of resource management as tasks must be allocated with some unknown amount of resources pre-execution but are only observable at runtime. This in turn results in potentially huge resource waste per task as (1) users lack direct knowledge about the relationship between tasks and resources, and thus cannot correctly specify the amount of resources a task needs in advance, and (2) workflows and tasks may exhibit stochastic behaviors at runtime, which complicates the process of resource management. In this paper, we (1) argue for the need of an adaptive resource allocator capable of allocating tasks at runtime and adjusting to random fluctuations and abrupt changes in a dynamic workflow without requiring any prior knowledge, and (2) introduce Greedy Bucketing and Exhaustive Bucketing: two robust, online, general- purpose, and prior-free allocation algorithms capable of producing quality estimates of a task’s resource consumption as the work- flow runs. Our results show that a resource allocator equipped with either algorithm consistently outperforms 5 alternative allocation algorithms on 7 diverse workflows and incurs at most 1.6 ms overhead per allocation in the steady state.

More Like this