Clustering is a fundamental task in machine learning. One of the most successful and broadly used algorithms is DBSCAN, a density-based clustering algorithm. DBSCAN requires ϵ-nearest neighbor graphs of the input dataset, which are computed with range-search algorithms and spatial data structures like KD-trees. Despite many efforts to design scalable implementations for DBSCAN, existing work is limited to low-dimensional datasets, as constructing ϵ-nearest neighbor graphs can be expensive in high dimensions. This article introduces a modified DBSCAN, using k-nearest neighbor (kNN) graphs to improve efficiency. We outline conditions for kNN-DBSCAN to match DBSCAN’s results and present a parallel implementation using OpenMP and MPI for shared and distributed memory systems. Testing on datasets up to 32 dimensions, we achieve remarkable scalability. Our implementation clusters one billion 3D points in under one second on 28K cores at TACC’s Frontera system. In a larger run, we cluster 65 billion points in 20 dimensions in under 40 seconds using 114,688 cores. Our method is up to 37× faster than state-of-the-art parallel DBSCAN on a 20-dimensional dataset with 4 million points. Code is available at https://github.com/ut-padas/knndbscan.
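A minimal sketch of the core-point test that makes a kNN graph sufficient: with k ≥ minPts, a point is a DBSCAN core point exactly when its minPts-th nearest neighbor (counting the point itself) lies within ϵ. The sketch below uses scikit-learn's NearestNeighbors as a stand-in for the paper's parallel kNN construction; the function and parameter names are illustrative, not the authors' API.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def core_points_from_knn(X, k, eps, min_pts):
    """DBSCAN core-point test driven by a kNN graph instead of a range search."""
    assert k >= min_pts, "the kNN graph must be deep enough to decide the test"
    nbrs = NearestNeighbors(n_neighbors=k).fit(X)
    dists, idx = nbrs.kneighbors(X)  # column 0 is the point itself (distance 0)
    # core point <=> the min_pts-th neighbor (including self) is within eps
    is_core = dists[:, min_pts - 1] <= eps
    return is_core, idx

# toy usage
X = np.random.rand(1000, 3)
is_core, knn = core_points_from_knn(X, k=16, eps=0.1, min_pts=8)
```

Clusters are then grown by linking reachable core points along kNN edges, which is where the paper's matching conditions between kNN-DBSCAN and DBSCAN come in.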
Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset using brute force (n²D) would require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
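The indexing idea can be sketched in a few lines: hash each sparse item with several independent minwise hash functions, cap every bucket with a fixed-size reservoir so skewed buckets stay bounded, and rank query candidates by how many tables they collide in (a count-based similarity estimate, with no exact similarity computation). This is a single-threaded illustration under those assumptions, not FLASH's actual data structures or API.

```python
import random
from collections import defaultdict

class ReservoirLSH:
    """Toy LSH index: minwise hashing + per-bucket reservoir sampling."""
    def __init__(self, num_tables=32, reservoir_size=64, seed=0):
        self.rng = random.Random(seed)
        self.prime = 2**31 - 1
        self.hashers = [(self.rng.randrange(1, self.prime), self.rng.randrange(self.prime))
                        for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.seen = [defaultdict(int) for _ in range(num_tables)]
        self.reservoir_size = reservoir_size

    def _minhash(self, features, a, b):
        # features: iterable of nonzero feature ids of a sparse vector
        return min((a * f + b) % self.prime for f in features)

    def insert(self, item_id, features):
        for t, (a, b) in enumerate(self.hashers):
            key = self._minhash(features, a, b)
            bucket, n = self.tables[t][key], self.seen[t][key] + 1
            self.seen[t][key] = n
            if len(bucket) < self.reservoir_size:
                bucket.append(item_id)
            else:  # classic Algorithm R: each of the n items kept with prob R/n
                j = self.rng.randrange(n)
                if j < self.reservoir_size:
                    bucket[j] = item_id

    def query(self, features, top_k=10):
        votes = defaultdict(int)  # collision counts estimate similarity
        for t, (a, b) in enumerate(self.hashers):
            for item in self.tables[t].get(self._minhash(features, a, b), []):
                votes[item] += 1
        return sorted(votes, key=votes.get, reverse=True)[:top_k]
```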
- PAR ID: 10065988
- Date Published:
- Journal Name: Proceedings of the 2018 International Conference on Management of Data
- Page Range / eLocation ID: 889 to 903
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Processing large amounts of data, especially in learning algorithms, poses a challenge for current embedded computing systems. Hyperdimensional (HD) computing (HDC) is a brain-inspired computing paradigm that works with high-dimensional vectors called hypervectors. HDC replaces several complex learning computations with bitwise and simpler arithmetic operations at the expense of an increased amount of data due to mapping the data into high-dimensional space. These hypervectors, more often than not, cannot be stored in memory, resulting in long data transfers from storage. In this article, we propose Store-n-Learn, an in-storage computing solution that performs HDC classification and clustering by implementing encoding, training, retraining, and inference across the flash hierarchy. To hide the latency of training and enable efficient computation, we introduce the concept of batching in HDC. We also present on-chip acceleration for HDC encoding in flash planes. This enables us to exploit the high parallelism provided by the flash hierarchy and encode multiple data points in parallel in both batched and non-batched fashion. Store-n-Learn also implements a single top-level FPGA accelerator with novel implementations for HDC classification training, retraining, inference, and clustering on the encoded data. Our evaluation over 10 popular datasets shows that Store-n-Learn is on average 222× (543×) faster than CPU and 10.6× (7.3×) faster than the state-of-the-art in-storage computing solution, INSIDER, for HDC classification (clustering).
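The HDC pipeline this record builds on (encode, bundle into class prototypes, similarity-based inference) fits in a few lines of NumPy. This is a single-machine sketch of the math only; the dimensionality and encoding scheme are common HDC defaults, not Store-n-Learn's in-flash design.

```python
import numpy as np

D = 10_000  # hypervector dimensionality (typical HDC choice)

def encode(x, proj):
    # record-based encoding: weight each feature's random bipolar
    # hypervector by the feature value, bundle by summation, binarize
    return np.sign(x @ proj)

def train(X, y, proj, num_classes):
    # a class prototype is the bundle (sum) of its encoded samples
    protos = np.zeros((num_classes, D))
    for xi, yi in zip(X, y):
        protos[yi] += encode(xi, proj)
    return protos

def infer(x, proj, protos):
    h = encode(x, proj)
    sims = protos @ h / (np.linalg.norm(protos, axis=1) * np.linalg.norm(h) + 1e-9)
    return int(np.argmax(sims))

rng = np.random.default_rng(0)
proj = rng.choice([-1, 1], size=(20, D))  # one hypervector per input feature
X, y = rng.random((100, 20)), rng.integers(0, 3, 100)
pred = infer(X[0], proj, train(X, y, proj, num_classes=3))
```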
-
Neural architecture search (NAS) is a promising technique to design efficient and high-performance deep neural networks (DNNs). As the performance requirements of ML applications grow continuously, hardware accelerators start playing a central role in DNN design. This trend makes NAS even more complicated and time-consuming for most real applications. This paper proposes FLASH, a very fast NAS methodology that co-optimizes the DNN accuracy and performance on a real hardware platform. As the main theoretical contribution, we first propose the NN-Degree, an analytical metric to quantify the topological characteristics of DNNs with skip connections (e.g., DenseNets, ResNets, Wide-ResNets, and MobileNets). The newly proposed NN-Degree allows us to do training-free NAS within one second and build an accuracy predictor by training as few as 25 samples out of a vast search space with more than 63 billion configurations. Second, by performing inference on the target hardware, we fine-tune and validate our analytical models to estimate the latency, area, and energy consumption of various DNN architectures while executing standard ML datasets. Third, we construct a hierarchical algorithm based on simplicial homology global optimization (SHGO) to optimize the model-architecture co-design process, while considering the area, latency, and energy consumption of the target hardware. We demonstrate that, compared to the state-of-the-art NAS approaches, our proposed hierarchical SHGO-based algorithm enables more than four orders of magnitude speedup (specifically, the execution time of the proposed algorithm is about 0.1 seconds). Finally, our experimental evaluations show that FLASH is easily transferable to different hardware architectures, thus enabling us to do NAS on a Raspberry Pi-3B processor in less than 3 seconds.
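SciPy ships the SHGO optimizer that the paper's hierarchical algorithm is built around, so the co-design loop is easy to illustrate. The analytical surrogates below (error, latency, energy as functions of two architecture knobs) are hypothetical stand-ins for FLASH's fitted models, which are calibrated against the target hardware.

```python
from scipy.optimize import shgo  # simplicial homology global optimization

# Hypothetical analytical surrogates; FLASH fits its own models
# (e.g., NN-Degree-based accuracy prediction) on real hardware.
def predicted_error(w, d):
    return 1.0 / (1.0 + 0.5 * w + 0.3 * d)

def predicted_latency(w, d):
    return 0.02 * w * d

def predicted_energy(w, d):
    return 0.01 * w * d ** 1.2

def objective(x):
    w, d = x  # e.g., network width and depth knobs
    return (predicted_error(w, d)
            + 0.1 * predicted_latency(w, d)
            + 0.05 * predicted_energy(w, d))

bounds = [(1, 12), (1, 40)]       # search ranges for the two knobs
result = shgo(objective, bounds)  # global optimum over the surrogates
print(result.x, result.fun)
```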
-
We study the top-k set similarity search problem using semantic overlap. While vanilla overlap requires exact matches between set elements, semantic overlap allows elements that are syntactically different but semantically related to increase the overlap. The semantic overlap is the maximum matching score of a bipartite graph, where an edge weight between two set elements is defined by a user-defined similarity function, e.g., cosine similarity between embeddings. Common techniques like token indexes fail for semantic search since similar elements may be unrelated at the character level. Further, verifying candidates is expensive (cubic versus linear for syntactic overlap), calling for highly selective filters. We propose Koios, the first exact and efficient algorithm for semantic overlap search. Koios leverages sophisticated filters to minimize the number of required graph-matching calculations. Our experiments show that for medium to large sets, fewer than 5% of the candidate sets need verification, and more than half of those sets are further pruned without requiring the expensive graph matching. We show the efficiency of our algorithm on four real datasets and demonstrate the improved result quality of semantic over vanilla set similarity search.
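The verification step itself is a maximum-weight bipartite matching, which SciPy's rectangular assignment solver computes exactly. The sketch below is that cubic step in isolation, i.e., precisely the call Koios's filters try to avoid; treating negative cosine similarities as zero-weight edges is an assumption here, not part of the paper's definition.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def semantic_overlap(A, B):
    """Maximum-weight bipartite matching between two sets of embeddings."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    sim = np.clip(A @ B.T, 0.0, None)  # edge weights; drop negative relations
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return sim[rows, cols].sum()

# toy usage: two sets of 5 and 7 elements with 16-dim embeddings
rng = np.random.default_rng(0)
score = semantic_overlap(rng.normal(size=(5, 16)), rng.normal(size=(7, 16)))
```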
-
The emerging Ray-tracing cores on GPUs have been repurposed for non-ray-tracing tasks by researchers recently. In this paper, we explore the benefits and effectiveness of executing graph algorithms on RT cores. We re-design breadth-first search and triangle counting on the new hardware as graph algorithm representatives. Our implementations focus on how to convert the graph operations to bounding volume hierarchy construction and ray generation, which are computational paradigms specific to ray tracing. We evaluate our RT-based methods on a wide range of real-world datasets. The results do not show the advantage of the RT-based methods over CUDA-based methods. We extend the experiments to the set intersection workload on synthesized datasets, and the RT-based method shows superior performance when the skew ratio is high. By carefully comparing the RT-based and CUDA-based binary search, we discover that RT cores are more efficient at searching for elements, but this comes with a constant and non-trivial overhead of the execution pipeline. Furthermore, the overhead of BVH construction is substantially higher than sorting on CUDA cores for large datasets. Our case studies unveil several rules of adapting graph algorithms to ray-tracing cores that might benefit future evolution of the emerging hardware towards general-computing tasks.
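The mapping the paper relies on can be pictured without GPU code: the elements of one set become tiny primitives in a scene (a BVH on RT cores), and each element of the other set is a ray whose hits report common elements. The CPU analogue below swaps the BVH build for a sort and each ray cast for a binary search, i.e., the CUDA-core baseline the paper compares against; it illustrates the mapping only, not actual RT-core execution.

```python
import bisect

def set_intersection_via_search(A, B):
    scene = sorted(B)             # stands in for BVH construction
    hits = []
    for a in A:                   # one "ray" per query element
        i = bisect.bisect_left(scene, a)
        if i < len(scene) and scene[i] == a:
            hits.append(a)        # a ray-primitive hit: common element
    return hits

print(set_intersection_via_search([3, 1, 4, 1, 5], [5, 9, 2, 6, 3]))  # [3, 5]
```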