Fast Fine-Grained Global Synchronization on GPUs

Wang, Kai; Fussell, Don; Lin, Calvin

doi:10.1145/3297858.3304055

This paper extends the reach of general-purpose GPU programming by presenting a software architecture that supports efficient fine-grained synchronization over global memory. The key idea is to transform global synchronization into global communication so that conflicts are serialized at the thread block level. With this structure, the threads within each thread block can synchronize using low-latency, high-bandwidth local scratchpad memory. To enable this architecture, we implement a scalable and efficient message passing library. Using NVIDIA GTX 1080 Ti GPUs, we evaluate our new software architecture by using it to solve a set of five irregular problems on a variety of workloads. We find that, on average, our solutions improve performance over carefully tuned state-of-the-art solutions by 3.6×.
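The key idea in the abstract, turning conflicting updates to global data into messages that are delivered to a single owner, can be illustrated with a minimal CPU-side sketch in Python. This is not the paper's message passing library and makes no claim about its API: the names `owner_of`, `route_updates`, `apply_inbox`, and `run`, and the contiguous partitioning scheme, are assumptions made only for illustration. Each simulated "block" owns a slice of the data, so all conflicts on an address are resolved serially inside one owner rather than through global atomics or locks.

```python
from collections import defaultdict


def owner_of(address, num_blocks, partition_size):
    """Map a global address to the block that owns it (hypothetical partitioning)."""
    return (address // partition_size) % num_blocks


def route_updates(updates, num_blocks, partition_size):
    """Communication phase: send each (address, delta) update to its owner's inbox."""
    inboxes = defaultdict(list)
    for address, delta in updates:
        owner = owner_of(address, num_blocks, partition_size)
        inboxes[owner].append((address, delta))
    return inboxes


def apply_inbox(global_data, inbox):
    """Owner phase: one block applies its messages, so conflicts are serialized here."""
    for address, delta in inbox:
        global_data[address] += delta


def run(global_data, updates, num_blocks, partition_size):
    """Route all updates, then let each owner block drain its own inbox."""
    inboxes = route_updates(updates, num_blocks, partition_size)
    for block_id in range(num_blocks):
        apply_inbox(global_data, inboxes.get(block_id, []))
    return global_data
```

In this sketch, two updates to the same address always land in the same inbox, which is the property that lets an owner resolve them locally. On a GPU, the owner would be a thread block working in scratchpad memory, and the routing step would be carried out by a message passing layer rather than a Python dictionary.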

Award ID(s): 1823546

PAR ID: 10113799

Author(s) / Creator(s): Wang, Kai; Fussell, Don; Lin, Calvin

Date Published: 2019-04-13

Journal Name: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems

Page Range / eLocation ID: 793 to 806

Format(s): Medium: X

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper:
https://doi.org/10.1145/3297858.3304055
