Optimizing Work Stealing Communication with Structured Atomic Operations

Cartier, H.; Dinan, J.; Larkins, D. B.

doi:10.1145/3472456.3472522

Applications that rely on sparse or irregular data are often challenging to scale on modern distributed-memory systems. As a result, these systems typically require continuous load balancing in order to maintain efficiency. Work stealing is a common technique to remedy imbalance. In this work we present a strategy for work stealing that reduces the amount of communication required for a steal operation by half. We show that in exchange for a small amount of additional complexity to manage the local queue state we can combine both discovering and claiming work into a single step. Conventionally, workstealing uses a two step process of discovering work and then claiming it. Our system, SWS, provides a mechanism where both processes are performed in a singular communication without the need for multiple synchronization messages. This reduction in communication is possible with the novel application of atomic operations that manipulate a compact representation of task queue metadata. We demonstrate the effectiveness of this strategy using known benchmarks for testing dynamic load balancing systems and for performing unbalanced tree searches. Our results show the reduction in communication reduces task acquisition time and steal time, which in turn improves overall performance on sparse computations.

More Like this