

Search for: All records

Award ID contains: 1822932



  1. The t-Distributed Stochastic Neighbor Embedding (t-SNE) is a successful method for visualizing high-dimensional data, which has made it very popular in the machine-learning and data-analysis community, especially in recent years. However, two glaring problems remain unaddressed: (a) existing GPU-accelerated implementations of t-SNE do not account for the poor data locality of the computation, so sparse matrix operations become a bottleneck during execution, especially for large data sets; and (b) the literature lacks an effective stopping criterion. In this paper, we report an improved GPU implementation that uses sparse matrix re-ordering to improve t-SNE's memory access pattern, together with a novel termination criterion that is better suited for visualization purposes. The proposed methods yield up to 4.63x end-to-end speedup and provide a practical stopping metric, potentially preventing the algorithm from terminating prematurely or running for an excessive number of iterations. These developments enable high-quality visualizations and accurate analyses of complex, large data sets containing up to 10 million data points and requiring thousands of iterations for convergence.
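The re-ordering idea in the first abstract can be illustrated with a minimal sketch, which is not the paper's GPU code: a sparse kNN affinity matrix of the kind behind t-SNE's attractive forces is permuted so that each point's neighbors end up close together in memory. Reverse Cuthill-McKee is used here purely as a representative bandwidth-reducing permutation, and the toy data, neighbor count, and bandwidth helper are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch (not the paper's implementation): re-order a sparse kNN
# affinity matrix so that nonzeros cluster near the diagonal, improving the
# memory locality of the sparse accesses in t-SNE's attractive-force pass.
import numpy as np
from scipy.sparse.csgraph import reverse_cuthill_mckee
from sklearn.neighbors import kneighbors_graph

def bandwidth(A):
    """Maximum distance of a nonzero from the diagonal (a rough locality proxy)."""
    C = A.tocoo()
    return int(np.max(np.abs(C.row - C.col))) if C.nnz else 0

# Toy data standing in for a real high-dimensional data set.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 50))

# Sparse symmetrized kNN graph, analogous to the affinity matrix used for
# t-SNE's attractive forces.
P = kneighbors_graph(X, n_neighbors=30, mode="distance")
P = ((P + P.T) * 0.5).tocsr()

# Bandwidth-reducing permutation: applying it to both rows and columns keeps
# each point's neighbors in nearby memory locations.
perm = reverse_cuthill_mckee(P, symmetric_mode=True)
P_reordered = P[perm][:, perm]

print("bandwidth before re-ordering:", bandwidth(P))
print("bandwidth after  re-ordering:", bandwidth(P_reordered))
```

Any bandwidth-reducing ordering plays the same role here; the point is only that clustering nonzeros near the diagonal makes the sparse accesses friendlier to caches and GPU memory.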
  2. The Multi-Level Fast Multipole Algorithm (MLFMA), a variant of the fast multipole method (FMM) for problems with oscillatory potentials, significantly accelerates the solution of problems based on wave physics, such as those in electromagnetics and acoustics. Existing shared-memory parallel approaches to MLFMA have adopted the bulk synchronous parallel (BSP) model. While the BSP approach has served well so far, it is prone to significant thread-synchronization overheads and, more importantly, fails to leverage communication/computation overlap opportunities because of MLFMA's complicated data dependencies. In this paper, we develop a task-parallel MLFMA implementation for shared-memory architectures and discuss optimizations to improve its performance. We then evaluate the new task-parallel implementation against a BSP implementation for a number of geometries. Our findings suggest that task parallelism is generally superior to the BSP model, and considering its potential advantages over BSP in a hybrid parallel setting, we see it as a promising approach for addressing the scalability issues of MLFMA in large-scale computations.
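The contrast between the BSP and task-parallel schedules in the second abstract can be illustrated with a minimal sketch, not the paper's implementation: in a toy upward tree traversal, each box's aggregation is scheduled as soon as its own children complete, rather than after a barrier over the whole previous level. The Node, aggregate, and upward_task names, the random sleep standing in for translation work, and the use of Python's ThreadPoolExecutor in place of a real shared-memory tasking runtime are all assumptions made for illustration.

```python
# Minimal sketch (not the paper's code): task-parallel upward pass over a toy
# tree. Each node runs as soon as its own children finish; a BSP schedule
# would instead place a barrier between consecutive tree levels.
from concurrent.futures import ThreadPoolExecutor
import random
import time

class Node:
    """A toy tree node standing in for an MLFMA octree box."""
    def __init__(self, children=None):
        self.children = children or []

def aggregate(node, child_results):
    """Placeholder for per-box work (e.g., an upward translation step)."""
    time.sleep(random.uniform(0.001, 0.005))
    return 1 + sum(child_results)

def upward_task(pool, node):
    """Schedule a box's aggregation to depend only on its own children."""
    # Children are submitted first, so they are always started before the
    # parent task begins waiting on their results.
    child_futures = [upward_task(pool, child) for child in node.children]
    return pool.submit(
        lambda: aggregate(node, [f.result() for f in child_futures])
    )

if __name__ == "__main__":
    # Small regular tree standing in for an octree.
    def build(depth, fanout=4):
        kids = [build(depth - 1, fanout) for _ in range(fanout)] if depth > 0 else []
        return Node(kids)

    root = build(depth=4)
    with ThreadPoolExecutor(max_workers=8) as pool:
        total = upward_task(pool, root).result()
    print("boxes aggregated:", total)
```

The design point of the sketch is only the scheduling pattern: work becomes eligible when its local dependencies are satisfied, so fast subtrees do not wait for slow ones the way they would behind a level-wide barrier.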