<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Journal Article</dc:product_type><dc:title>Scalable KNN Graph Construction for Heterogeneous Architectures</dc:title><dc:creator>Ruys, William [Oden Institute, The University of Texas at Austin, Austin, United States] (ORCID:000000015702022X); Ghafouri, Ali [Oden Institute, The University of Texas at Austin, Austin, United States] (ORCID:0000000333454685); Chen, Chao [Department of Mathematics, North Carolina State University, Raleigh, United States] (ORCID:0000000253853651); Biros, George [Oden Institute, The University of Texas at Austin, Austin, United States] (ORCID:0000000200333994)</dc:creator><dc:corporate_author/><dc:editor/><dc:description>&lt;p&gt;Constructing k-nearest neighbor (kNN) graphs is a fundamental component in many machine learning and scientific computing applications. Despite its prevalence, efficiently building all-nearest-neighbor graphs at scale on distributed heterogeneous HPC systems remains challenging, especially for large sparse non-integer datasets. We introduce optimizations for algorithms based on forests of random projection trees. Our novel GPU kernels for batched, within leaf, exact searches achieve 1.18× speedup over sparse reference kernels with less peak memory, and up to 19× speedup over CPU for memory-intensive problems. Our library,&lt;monospace&gt;PyRKNN&lt;/monospace&gt;, implements distributed randomized projection forests for approximate kNN search. Optimizations to reduce and hide communication overhead allow us to achieve 5× speedup, in per iteration performance, relative to GOFMM (another projection tree, MPI-based kNN library), for a 64M 128d dataset on 1,024 processes. On a single-node we achieve speedup over FAISS-GPU for dense datasets and up to 10× speedup over CPU-only libraries.&lt;monospace&gt;PyRKNN&lt;/monospace&gt;uniquely supports distributed memory kNN graph construction for both dense and sparse coordinates on CPU and GPU accelerators.&lt;/p&gt;</dc:description><dc:publisher>Association for Computing Machinery</dc:publisher><dc:date>2025-09-30</dc:date><dc:nsf_par_id>10642508</dc:nsf_par_id><dc:journal_name>ACM Transactions on Parallel Computing</dc:journal_name><dc:journal_volume>12</dc:journal_volume><dc:journal_issue>3</dc:journal_issue><dc:page_range_or_elocation>1 to 35</dc:page_range_or_elocation><dc:issn>2329-4949</dc:issn><dc:isbn/><dc:doi>https://doi.org/10.1145/3733610</dc:doi><dcq:identifierAwardId>2204226</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location/><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>