<?xml version="1.0" encoding="UTF-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcq="http://purl.org/dc/terms/"><records count="1" morepages="false" start="1" end="1"><record rownumber="1"><dc:product_type>Journal Article</dc:product_type><dc:title>Scalable KNN Graph Construction for Heterogeneous Architectures</dc:title><dc:creator>Ruys, William [Oden Institute, The University of Texas at Austin, Austin, United States] (ORCID:000000015702022X); Ghafouri, Ali [Oden Institute, The University of Texas at Austin, Austin, United States] (ORCID:0000000333454685); Chen, Chao [Department of Mathematics, North Carolina State University, Raleigh, United States] (ORCID:0000000253853651); Biros, George [Oden Institute, The University of Texas at Austin, Austin, United States] (ORCID:0000000200333994)</dc:creator><dc:corporate_author/><dc:editor/><dc:description>&lt;p&gt;Constructing k-nearest neighbor (kNN) graphs is a fundamental component in many machine learning and scientific computing applications. Despite its prevalence, efficiently building all-nearest-neighbor graphs at scale on distributed heterogeneous HPC systems remains challenging, especially for large sparse non-integer datasets. We introduce optimizations for algorithms based on forests of random projection trees. Our novel GPU kernels for batched, within leaf, exact searches achieve 1.18× speedup over sparse reference kernels with less peak memory, and up to 19× speedup over CPU for memory-intensive problems. Our library,&lt;monospace&gt;PyRKNN&lt;/monospace&gt;, implements distributed randomized projection forests for approximate kNN search. Optimizations to reduce and hide communication overhead allow us to achieve 5× speedup, in per iteration performance, relative to GOFMM (another projection tree, MPI-based kNN library), for a 64M 128d dataset on 1,024 processes. On a single-node we achieve speedup over FAISS-GPU for dense datasets and up to 10× speedup over CPU-only libraries.&lt;monospace&gt;PyRKNN&lt;/monospace&gt;uniquely supports distributed memory kNN graph construction for both dense and sparse coordinates on CPU and GPU accelerators.&lt;/p&gt;</dc:description><dc:publisher>Association for Computing Machinery</dc:publisher><dc:date>2025-09-30</dc:date><dc:nsf_par_id>10642508</dc:nsf_par_id><dc:journal_name>ACM Transactions on Parallel Computing</dc:journal_name><dc:journal_volume>12</dc:journal_volume><dc:journal_issue>3</dc:journal_issue><dc:page_range_or_elocation>1 to 35</dc:page_range_or_elocation><dc:issn>2329-4949</dc:issn><dc:isbn/><dc:doi>https://doi.org/10.1145/3733610</dc:doi><dcq:identifierAwardId>2204226</dcq:identifierAwardId><dc:subject/><dc:version_number/><dc:location/><dc:rights/><dc:institution/><dc:sponsoring_org>National Science Foundation</dc:sponsoring_org></record></records></rdf:RDF>