Optimizing Word2Vec Performance on Multicore Systems

Rengasamy, Vasudevan; Fu, Tao-Yang; Lee, Wang-Chien; Madduri, Kamesh

doi:10.1145/3149704.3149768

The Skip-gram with negative sampling (SGNS) method of Word2Vec is an unsupervised approach to mapping words in a text corpus to low-dimensional real vectors. The learned vectors capture semantic relationships between co-occurring words and can be used as inputs to many natural language processing and machine learning tasks. Several high-performance implementations of the Word2Vec SGNS method exist. In this paper, we introduce a new optimization, called context combining, to further boost SGNS performance on multicore systems. When processing the One Billion Word benchmark dataset on a 16-core platform, our approach is 3.53x faster than the original multithreaded Word2Vec implementation and 1.28x faster than a recent parallel Word2Vec implementation. We also show that our accuracy on benchmark queries is comparable to that of state-of-the-art implementations.
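For readers unfamiliar with SGNS, the update that such implementations parallelize can be sketched as follows. This is a minimal, single-threaded illustration in pure Python and is not the context combining optimization introduced in the paper. The names `sgns_step`, `w_in`, and `w_out`, the toy vocabulary size, and the learning rate are assumptions chosen for illustration only.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sgns_step(w_in, w_out, target, context, negatives, lr=0.025):
    """Apply one SGD update for a (target, context) pair and its negatives.

    w_in and w_out are lists of vectors (hypothetical input and output
    embedding tables). Returns the loss measured before the update.
    """
    v = w_in[target]
    grad_v = [0.0] * len(v)
    loss = 0.0
    # The true context word has label 1; each sampled negative has label 0.
    samples = [(context, 1.0)] + [(n, 0.0) for n in negatives]
    for word, label in samples:
        u = w_out[word]
        score = sigmoid(sum(a * b for a, b in zip(v, u)))
        p = score if label == 1.0 else 1.0 - score
        loss -= math.log(max(p, 1e-12))
        g = lr * (label - score)
        for i in range(len(v)):
            grad_v[i] += g * u[i]  # accumulate using the old output vector
            u[i] += g * v[i]       # update the output vector in place
    # Apply the accumulated update to the input vector once per pair.
    for i in range(len(v)):
        v[i] += grad_v[i]
    return loss


# Toy run: 5 words, 8 dimensions, one repeated training pair.
random.seed(0)
vocab, dim = 5, 8
w_in = [[random.uniform(-0.5, 0.5) / dim for _ in range(dim)]
        for _ in range(vocab)]
w_out = [[0.0] * dim for _ in range(vocab)]
losses = [sgns_step(w_in, w_out, target=0, context=1,
                    negatives=[2, 3], lr=0.1)
          for _ in range(200)]
```

Each step touches one input vector and a handful of output vectors, so the work per training pair is dominated by short dot products and vector updates. That access pattern is why multicore implementations of SGNS tend to focus on memory behavior and on batching updates.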

Award ID(s): 1717084

PAR ID: 10065398

Author(s) / Creator(s): Rengasamy, Vasudevan; Fu, Tao-Yang; Lee, Wang-Chien; Madduri, Kamesh

Date Published: 2017-11-12

Journal Name: Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms

Page Range / eLocation ID: 1 to 9

Sponsoring Org: National Science Foundation

Free Publicly Accessible Full Text
Accepted Manuscript 1.0
Conference Paper: https://doi.org/10.1145/3149704.3149768