NSF-PAR ID:
10135902
Journal Name:
34th IEEE International Parallel & Distributed Processing Symposium
Abstract:
Given two datasets (or tables) A and B and a search distance $$\epsilon$$, the distance similarity join, denoted as $$A \ltimes_\epsilon B$$, finds the pairs of points $$(p_a, p_b)$$, where $$p_a \in A$$ and $$p_b \in B$$, such that the distance between $$p_a$$ and $$p_b$$ is $$\le \epsilon$$. If $$A = B$$, then the similarity join is equivalent to a similarity self-join, denoted as $$A \bowtie_\epsilon A$$. In this paper, we propose Heterogeneous Epsilon Grid Joins (HEGJoin), a heterogeneous CPU-GPU distance similarity join algorithm. Efficiently partitioning the work between the CPU and the GPU is a challenge: the work partitioning strategy needs to consider the different characteristics and computational throughput of the processors (CPU and GPU), as well as the data-dependent nature of the similarity join, which affects the overall execution time (e.g., the number of queries, their distribution, and the dimensionality). In addition to HEGJoin, we design in this paper a …
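To make the join definition in the abstract concrete, the following is a minimal CPU-only sketch in Python. It is not the HEGJoin algorithm and performs no CPU-GPU work partitioning. It assumes Euclidean distance (the abstract does not name a metric) and $$\epsilon > 0$$. The function names `brute_force_join` and `grid_join` are hypothetical, chosen for illustration. The second function shows the generic idea behind an epsilon grid: points are bucketed into cells of side length $$\epsilon$$, so each query point only needs to be compared against points in its own and adjacent cells.

```python
import math
from collections import defaultdict
from itertools import product


def brute_force_join(A, B, eps):
    """Return sorted index pairs (i, j) with dist(A[i], B[j]) <= eps.

    Assumes Euclidean distance; compares every point of A with every point of B.
    """
    return sorted(
        (i, j)
        for i, pa in enumerate(A)
        for j, pb in enumerate(B)
        if math.dist(pa, pb) <= eps
    )


def grid_join(A, B, eps):
    """Same result as brute_force_join, using a grid with cells of side eps.

    Generic epsilon-grid sketch (hypothetical helper, not the paper's code).
    Requires eps > 0.
    """
    # Bucket the points of B by the grid cell that contains them.
    grid = defaultdict(list)
    for j, pb in enumerate(B):
        cell = tuple(math.floor(x / eps) for x in pb)
        grid[cell].append(j)

    result = []
    for i, pa in enumerate(A):
        cell = tuple(math.floor(x / eps) for x in pa)
        # Two points within eps differ by at most one cell per dimension,
        # so only the 3^d cells around the query point can hold matches.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            for j in grid.get(neighbor, ()):
                if math.dist(pa, B[j]) <= eps:
                    result.append((i, j))
    return sorted(result)


A = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
B = [(0.5, 0.0), (1.0, 2.0), (9.0, 9.0)]
pairs = grid_join(A, B, eps=1.0)
```

Calling `grid_join(A, A, eps)` gives the self-join case; in this sketch every point is also paired with itself, since its distance to itself is zero. The number of neighboring cells grows as $$3^d$$ with the dimensionality $$d$$, which is one reason dimensionality affects the running time of grid-based approaches.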