NSF PAR Search | NSF Public Access Repository

A single-snapshot inverse solver for two-species graph model of tau pathology spreading in human Alzheimer’s disease

https://doi.org/10.1186/s40708-025-00264-z

Wen, Zheyu; Ghafouri, Ali; Biros, George (December 2025, Brain Informatics)

Free, publicly-accessible full text available December 1, 2026
Dynamically Fusing Python HPC Kernels

https://doi.org/10.1145/3728959

Al_Awar, Nader; Naeem, Muhammad Hannan; Almgren-Bell, James; Biros, George; Gligoric, Milos (June 2025, Proceedings of the ACM on Software Engineering)

Recent trends in high-performance computing show an increase in the adoption of performance portable frameworks such as Kokkos and interpreted languages such as Python. PyKokkos follows these trends and enables programmers to write performance-portable kernels in Python which greatly increases productivity. One issue that programmers still face is how to organize parallel code, as splitting code into separate kernels simplifies testing and debugging but may result in suboptimal performance. To enable programmers to organize kernels in any way they prefer while ensuring good performance, we present PyFuser, a program analysis framework for automatic fusion of performance portable PyKokkos kernels. PyFuser dynamically traces kernel calls and lazily fuses them once the result is requested by the application. PyFuser generates fused kernels that execute faster due to better reuse of data, improved compiler optimizations, and reduced kernel launch overhead, while not requiring any changes to existing PyKokkos code. We also introduce automated code transformations that further optimize the fused kernels generated by PyFuser. Our experiments show that on average PyFuser achieves speedups compared to unfused kernels of 3.8x on NVIDIA and AMD GPUs, as well as Intel and AMD CPUs.
Free, publicly-accessible full text available June 22, 2026
Speeding up the Local C++ Development Cycle with Header Substitution

https://doi.org/10.1145/3696443.3708942

Al_Awar, Nader; Yi, Zijian; Biros, George; Gligoric, Milos (March 2025, ACM)

Free, publicly-accessible full text available March 1, 2026
Scalable KNN Graph Construction for Heterogeneous Architectures

https://doi.org/10.1145/3733610

Ruys, William; Ghafouri, Ali; Chen, Chao; Biros, George (September 2025, ACM Transactions on Parallel Computing)

Constructing k-nearest neighbor (kNN) graphs is a fundamental component in many machine learning and scientific computing applications. Despite its prevalence, efficiently building all-nearest-neighbor graphs at scale on distributed heterogeneous HPC systems remains challenging, especially for large sparse non-integer datasets. We introduce optimizations for algorithms based on forests of random projection trees. Our novel GPU kernels for batched, within leaf, exact searches achieve 1.18× speedup over sparse reference kernels with less peak memory, and up to 19× speedup over CPU for memory-intensive problems. Our library, PyRKNN, implements distributed randomized projection forests for approximate kNN search. Optimizations to reduce and hide communication overhead allow us to achieve 5× speedup, in per iteration performance, relative to GOFMM (another projection tree, MPI-based kNN library), for a 64M 128d dataset on 1,024 processes. On a single-node we achieve speedup over FAISS-GPU for dense datasets and up to 10× speedup over CPU-only libraries. PyRKNN uniquely supports distributed memory kNN graph construction for both dense and sparse coordinates on CPU and GPU accelerators.
Free, publicly-accessible full text available September 30, 2026
KNN-DBSCAN: a DBSCAN in high dimensions

https://doi.org/10.1145/3701624

Chen, Youguang; Ruys, William; Biros, George (March 2025, ACM Transactions on Parallel Computing)

Clustering is a fundamental task in machine learning. One of the most successful and broadly used algorithms is DBSCAN, a density-based clustering algorithm. DBSCAN requires ϵ-nearest neighbor graphs of the input dataset, which are computed with range-search algorithms and spatial data structures like KD-trees. Despite many efforts to design scalable implementations for DBSCAN, existing work is limited to low-dimensional datasets, as constructing ϵ-nearest neighbor graphs can be expensive in high-dimensions. This article introduces a modified DBSCAN, using k-nearest neighbor (kNN) graphs to improve efficiency. We outline conditions for kNN-DBSCAN to match DBSCAN’s results and present a parallel implementation using OpenMP and MPI for shared and distributed memory systems. Testing on datasets up to 32 dimensions, we achieve remarkable scalability. Our implementation clusters one billion 3D points in under one second on 28K cores at TACC’s Frontera system. In a larger run, we cluster 65 billion points in 20 dimensions in under 40 seconds using 114,688 cores. Our method is up to 37× faster than state-of-the-art parallel DBSCAN on a 20-dimensional dataset with 4 million points. Code is available at https://github.com/ut-padas/knndbscan.
Free, publicly-accessible full text available March 31, 2026
Numerical simulation of an extensible capsule using regularized Stokes kernels and overset finite differences

https://doi.org/10.1016/j.jcp.2024.113042

Agarwal, Dhwanit; Biros, George (July 2024, Journal of Computational Physics)

Full Text Available
GrainGNN: A dynamic graph neural network for predicting 3D grain microstructure

https://doi.org/10.1016/j.jcp.2024.113061

Qin, Yigong; DeWitt, Stephen; Radhakrishnan, Balasubramaniam; Biros, George (August 2024, Journal of Computational Physics)

Full Text Available
An O(N) distributed-memory parallel direct solver for planar integral equations

https://doi.org/10.1109/IPDPS57955.2024.00046

Liang, Tianyu; Chen, Chao; Martinsson, Per-Gunnar; Biros, George (May 2024, IEEE)

Full Text Available
A Scalable Algorithm for Active Learning

https://doi.org/10.1109/SC41406.2024.00057

Chen, Youguang; Wen, Zheyu; Biros, George (November 2024, IEEE)

Free, publicly-accessible full text available November 17, 2025
A GPU Algorithm for Detecting Strongly Connected Components

https://doi.org/10.1145/3581784.3607071

Alabandi, Ghadeer; Sands, William; Biros, George; Burtscher, Martin (November 2023, ACM)

Full Text Available
