Influence Maximization (IM) is a crucial problem in data science. The goal is to find a fixed-size set of highly influential seed vertices that maximizes the expected spread of influence through the network under a given diffusion model.
This paper significantly improves the scalability of IM using two key techniques. The first is a
We compare
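Since the abstract in this record is truncated, here is a minimal sketch of the classical greedy baseline for the IM problem it defines, under an independent-cascade diffusion model with Monte Carlo spread estimation. This is orientation only, not the paper's method; the toy graph, the propagation probability `p`, and the function names are illustrative assumptions.

```python
import random

def simulate_spread(adj, seeds, p, trials=200):
    """Estimate the expected influence spread of `seeds` under independent cascade."""
    total = 0
    for _ in range(trials):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj.get(u, ()):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_im(adj, k, p=0.1):
    """Pick k seeds by greedily maximizing the estimated marginal spread."""
    seeds = []
    for _ in range(k):
        best = max((v for v in adj if v not in seeds),
                   key=lambda v: simulate_spread(adj, seeds + [v], p))
        seeds.append(best)
    return seeds

# Toy usage: a small directed graph given as an adjacency dict.
graph = {0: [1, 2], 1: [2, 3], 2: [3], 3: [4], 4: []}
print(greedy_im(graph, k=2))
```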
- NSF-PAR ID: 10489410
- Publisher / Repository: VLDB Endowment
- Date Published:
- Journal Name: Proceedings of the VLDB Endowment
- Volume: 17
- Issue: 3
- ISSN: 2150-8097
- Page Range / eLocation ID: 400 to 413
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Influence maximization aims to select the k most-influential vertices, or seeds, in a network, where influence is defined by a given diffusion process. Although computing an optimal seed set is NP-hard, efficient approximation algorithms exist. However, even state-of-the-art parallel implementations are limited by a sampling step that incurs large memory footprints. This in turn limits the reachable problem sizes and the approximation quality. In this work, we study the memory footprint of the sampling process that collects reverse reachability information in the IMM (Influence Maximization via Martingales) algorithm over large real-world social networks. We present a memory-efficient optimization approach (called HBMax) based on Ripples, a state-of-the-art multi-threaded parallel influence maximization solution. HBMax uses a portion of the reverse reachable (RR) sets collected by the algorithm to learn the characteristics of the graph. It then compresses the intermediate reverse reachability information with Huffman coding or bitmap coding, and answers queries on the partially decoded data, or directly on the compressed data, to preserve the memory savings obtained through compression. On a NUMA architecture, we scale our solution to 64 CPU cores and reduce the memory footprint by up to 82.1% with an average 6.3% speedup (the encoding overhead is offset by the performance gain from memory reduction), without loss of accuracy. For the largest tested graph, Twitter7 (with 1.4 billion edges), HBMax achieves a 5.9× compression ratio and a 2.2× speedup.
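As a rough illustration of the bitmap-coding idea mentioned above (a toy sketch, not the HBMax/Ripples implementation; the fixed-width bitmap layout and all names are assumptions), the snippet below packs each reverse-reachable set into a bitmap over vertex IDs and answers coverage queries directly on the packed bytes.

```python
# Illustrative sketch: store reverse-reachable (RR) sets as packed bitmaps and
# count how many RR sets a candidate seed covers without decoding to Python sets.

def encode_rr_set(vertices, num_vertices):
    """Pack a set of vertex IDs into a bitmap (one bit per vertex)."""
    bits = bytearray((num_vertices + 7) // 8)
    for v in vertices:
        bits[v >> 3] |= 1 << (v & 7)
    return bytes(bits)

def covers(bitmap, v):
    """Check membership of vertex v directly on the compressed bitmap."""
    return bool(bitmap[v >> 3] & (1 << (v & 7)))

def coverage(bitmaps, v):
    """Number of RR sets that contain v (the greedy seed-selection score)."""
    return sum(covers(b, v) for b in bitmaps)

# Toy usage with 10 vertices and three sampled RR sets.
rr_sets = [encode_rr_set(s, 10) for s in ({0, 3, 7}, {3, 4}, {1, 3, 9})]
print(coverage(rr_sets, 3))   # -> 3
```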
-
Motivation: The size of a genome graph—the space required to store the nodes, node labels and edges—affects the efficiency of operations performed on it. For example, the time complexity to align a sequence to a graph without a graph index depends on the total number of characters in the node labels and the number of edges in the graph. This raises the need for approaches to construct space-efficient genome graphs.
Results: We point out similarities in the string encoding mechanisms of genome graphs and the external pointer macro (EPM) compression model. We present a pair of linear-time algorithms that transform between genome graphs and EPM-compressed forms. The algorithms yield an upper bound on the size of the constructed genome graph in terms of an optimal EPM compression. To further reduce the size of the genome graph, we propose the source assignment problem, which optimizes over the equivalent choices made during compression, and introduce an ILP formulation that solves it optimally. As a proof of concept, we introduce RLZ-Graph, a genome graph constructed based on the relative Lempel–Ziv algorithm. Using RLZ-Graph, across all human chromosomes, we are able to reduce the disk space needed to store a genome graph by 40.7% on average compared to colored compacted de Bruijn graphs constructed by Bifrost under the default settings. RLZ-Graph scales well in running time and graph size with an increasing number of human genome sequences, compared to Bifrost and the variation graphs produced by VGtoolkit.
Availability: The RLZ-Graph software is available at: https://github.com/Kingsford-Group/rlzgraph
Supplementary information: Supplementary data are available at Bioinformatics online.
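To make the relative Lempel–Ziv / external-pointer-macro connection concrete, here is a minimal sketch (not the RLZ-Graph code; the greedy longest-match strategy and the function names are assumptions) that factorizes a target string into copies from a reference plus literals, and decodes them back.

```python
# Toy relative Lempel-Ziv (external-pointer-macro style) factorization:
# each factor is either a (ref_pos, length) copy from the reference or a literal.
# Greedy longest match via brute-force search; real tools use suffix structures.

def rlz_factorize(reference, target):
    factors = []
    i = 0
    while i < len(target):
        best_pos, best_len = -1, 0
        for j in range(len(reference)):
            l = 0
            while (i + l < len(target) and j + l < len(reference)
                   and reference[j + l] == target[i + l]):
                l += 1
            if l > best_len:
                best_pos, best_len = j, l
        if best_len >= 2:                      # copy a phrase from the reference
            factors.append(("copy", best_pos, best_len))
            i += best_len
        else:                                  # fall back to a literal character
            factors.append(("lit", target[i]))
            i += 1
    return factors

def rlz_decode(reference, factors):
    out = []
    for f in factors:
        out.append(reference[f[1]:f[1] + f[2]] if f[0] == "copy" else f[1])
    return "".join(out)

ref, tgt = "ACGTACGTGG", "ACGTGGTACG"
fs = rlz_factorize(ref, tgt)
assert rlz_decode(ref, fs) == tgt
print(fs)
```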
-
We present the first GPU-based parallel algorithm to efficiently update vertex coloring on large dynamic networks. For a single GPU, we introduce the concept of a loosely maintained vertex color update that reduces computation and memory requirements. For multiple GPUs in distributed environments, we propose a priority-based ordering of vertices to reduce communication time. We prove the correctness of our algorithms and experimentally demonstrate that, for graphs of over 16 million vertices and over 134 million edges on a single GPU, our dynamic algorithm is as much as 20x faster than the state-of-the-art algorithm on static graphs. For larger graphs with over 130 million vertices and over 260 million edges, our distributed implementation with 8 GPUs produces updated color assignments within 160 milliseconds. In all cases, the proposed parallel algorithms produce comparable or fewer colors than state-of-the-art algorithms.
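The dynamic-update idea can be illustrated sequentially; the sketch below (an assumption-laden Python toy, not the paper's GPU kernels or its loosely maintained color update) inserts a batch of edges and then recolors only the endpoints whose colors now conflict, using the smallest color absent from their neighborhoods.

```python
# Toy dynamic recoloring: after a batch of edge insertions, repair only the
# vertices whose colors now conflict, rather than recoloring the whole graph.
# Sequential illustration; the referenced work performs this in parallel on GPUs.

def smallest_free_color(adj, color, v):
    """Smallest non-negative color not used by any neighbor of v."""
    used = {color[u] for u in adj[v]}
    c = 0
    while c in used:
        c += 1
    return c

def apply_insertions(adj, color, new_edges):
    """Insert edges, then greedily recolor only the conflicting endpoints."""
    conflicted = set()
    for u, v in new_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        if color[u] == color[v]:
            conflicted.add(v)
    for w in conflicted:
        color[w] = smallest_free_color(adj, color, w)

# Toy usage: path 0-1-2 colored 0,1,0; inserting (0, 2) forces a local repair.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
color = {0: 0, 1: 1, 2: 0}
apply_insertions(adj, color, [(0, 2)])
print(color)  # e.g. {0: 0, 1: 1, 2: 2} -- still a proper coloring
```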
-
Ranjit Jhala and Isil Dillig (Eds.). Many modern programming languages are shifting toward a functional style for collection interfaces such as sets, maps, and sequences. Functional interfaces offer many advantages, including being safe for parallelism and providing simple and lightweight snapshots. However, existing high-performance functional interfaces such as PAM, which are based on balanced purely-functional trees, incur large space overheads for large-scale data analysis due to storing every element in a separate node in a tree. This paper presents PaC-trees, a purely-functional data structure supporting functional interfaces for sets, maps, and sequences that provides a significant reduction in space over existing approaches. A PaC-tree is a balanced binary search tree which blocks the leaves and compresses the blocks using arrays. We provide novel techniques for compressing and uncompressing the blocks which yield practical parallel functional algorithms for a broad set of operations on PaC-trees such as union, intersection, filter, reduction, and range queries which are both theoretically and practically efficient. Using PaC-trees we designed CPAM, a C++ library that implements the full functionality of PAM, while offering significant extra functionality for compression. CPAM consistently matches or outperforms PAM on a set of microbenchmarks on sets, maps, and sequences while using about a quarter of the space. On applications including inverted indices, 2D range queries, and 1D interval queries, CPAM is competitive with or faster than PAM, while using 2.1–7.8x less space. For static and streaming graph processing, CPAM offers 1.6x faster batch updates while using 1.3–2.6x less space than the state-of-the-art graph processing system Aspen.
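A rough analogue of the leaf-blocking idea in PaC-trees (a Python toy, not CPAM's C++ implementation; the block threshold and the tuple-based node encoding are assumptions): subtrees with at most a few elements are stored as plain sorted arrays instead of one node per element.

```python
# Toy "blocked" search tree: subtrees with at most BLOCK elements are stored as
# sorted arrays (one contiguous block) instead of one tree node per element,
# trading pointer overhead for cache-friendly, compressible leaves.

BLOCK = 4

def build(sorted_keys):
    if len(sorted_keys) <= BLOCK:
        return ("leaf", list(sorted_keys))
    mid = len(sorted_keys) // 2
    return ("node", sorted_keys[mid],
            build(sorted_keys[:mid]), build(sorted_keys[mid + 1:]))

def contains(tree, key):
    if tree[0] == "leaf":
        return key in tree[1]          # linear scan inside a small block
    _, pivot, left, right = tree
    if key == pivot:
        return True
    return contains(left if key < pivot else right, key)

t = build(list(range(0, 40, 2)))         # keys 0, 2, ..., 38
print(contains(t, 22), contains(t, 23))  # True False
```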
-
We study a noisy graph isomorphism problem, where the goal is to perfectly recover the vertex correspondence between two edge-correlated graphs, with an initial seed set of correctly matched vertex pairs revealed as side information. We show that it is possible to achieve the information-theoretic limit of graph sparsity in time polynomial in the number of vertices n. Moreover, we show that the number of seeds needed for perfect recovery in polynomial time can be as low as [...] in the sparse graph regime (with the average degree smaller than [...]) and [...] in the dense graph regime, for a small positive constant. Unlike previous work on graph matching, which used small neighborhoods or small subgraphs with a logarithmic number of vertices in order to match vertices, our algorithms match vertices if their large neighborhoods have a significant overlap in the number of seeds.
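To illustrate the seed-overlap idea in the last sentence (a toy sketch; the actual algorithm's neighborhood statistics are not reproduced, and all names here are assumptions), the snippet below matches each unmatched vertex to the candidate whose neighborhood shares the most seed pairs with it.

```python
# Toy seeded matching: match vertices across two graphs by counting how many
# known seed pairs appear in their respective neighborhoods.
# Real seeded graph-matching algorithms iterate/percolate this idea carefully.

def seed_overlap_match(adj1, adj2, seeds):
    """seeds: dict mapping vertices of graph 1 to their known matches in graph 2."""
    matching = dict(seeds)
    for u in adj1:
        if u in matching:
            continue
        best_v, best_score = None, 0
        for v in adj2:
            if v in matching.values():
                continue
            # Count seed pairs (s1 -> s2) with s1 adjacent to u and s2 adjacent to v.
            score = sum(1 for s1, s2 in seeds.items()
                        if s1 in adj1[u] and s2 in adj2[v])
            if score > best_score:
                best_v, best_score = v, score
        if best_v is not None:
            matching[u] = best_v
    return matching

# Toy usage: graph 2 relabels graph 1 by v -> v + 10; seeds reveal two pairs.
adj1 = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
adj2 = {10: {11, 12}, 11: {10, 12, 13}, 12: {10, 11}, 13: {11}}
print(seed_overlap_match(adj1, adj2, {0: 10, 1: 11}))  # -> {0: 10, 1: 11, 2: 12, 3: 13}
```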