NSF PAR Search | NSF Public Access Repository

HHOTuner: Efficient Performance Tuning with Harris Hawks Optimization

Dutta, Akash; Jannesari, Ali (September 2025, Proceedings of the International Conference on Parallel Processing (ICPP))

Free, publicly-accessible full text available September 9, 2026

PCEBench: A Multi-Dimensional Benchmark for Evaluating Large Language Models in Parallel Code Generation

https://doi.org/10.1109/IPDPS64566.2025.00055

Chen, Le; Ahmed, Nesreen; Capotă, Mihai; Willke, Ted; Hasabnis, Niranjan; Jannesari, Ali (June 2025, IEEE)

Free, publicly-accessible full text available June 3, 2026

CodeRosetta: Pushing the Boundaries of Unsupervised Code Translation for Parallel Programming

Tehrani, Ali; Bhattacharjee, Arijit; Chen, Le; Ahmed, Nesreen K; Yazdanbakhsh, Amir; Jannesari, Ali (December 2024, Advances in Neural Information Processing Systems)

Full Text Available

Static Generation of Efficient OpenMP Offload Data Mappings

Marzen, Luke; Dutta, Akash; Jannesari, Ali (November 2024, Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC))

Full Text Available

MIREncoder: Multi-modal IR-based Pretrained Embeddings for Performance Optimizations

Dutta, Akash; Jannesari, Ali (October 2024, 33rd International Conference on Parallel Architectures and Compilation Techniques (PACT))

Full Text Available

PERFOGRAPH: A Numerical Aware Program Graph Representation for Performance Optimization and Program Analysis

TehraniJamsaz, Ali; Mahmud, Quazi; Chen, Le; Ahmed, Nesreen; Jannesari, Ali (May 2024, Proceedings of the 37th International Conference on Neural Information Processing Systems)

Full Text Available

ParaGraph: Weighted Graph Representation for Performance Optimization of HPC Kernels

https://doi.org/10.1109/IPDPSW63119.2024.00070

TehraniJamsaz, Ali; Mishra, Alok; Dutta, Akash; Malik, Abid M; Chapman, Barbara; Jannesari, Ali (May 2024, IEEE)

Full Text Available

GraphBinMatch: Graph-Based Similarity Learning for Cross-Language Binary and Source Code Matching

https://doi.org/10.1109/IPDPSW63119.2024.00103

TehraniJamsaz, Ali; Chen, Hanze; Jannesari, Ali (May 2024, IEEE)

Full Text Available

Cross-Feature Transfer Learning for Efficient Tensor Program Generation

https://doi.org/10.3390/app14020513

Verma, Gaurav; Raskar, Siddhisanket; Emani, Murali; Chapman, Barbara (January 2024, Applied Sciences)

Tuning tensor program generation involves navigating a vast search space to find optimal program transformations and measurements for a program on the target hardware. The complexity of this process is further amplified by the exponential combinations of transformations, especially in heterogeneous environments. This research addresses these challenges by introducing a novel approach that learns the joint neural network and hardware features space, facilitating knowledge transfer to new, unseen target hardware. A comprehensive analysis is conducted on the existing state-of-the-art dataset, TenSet, including a thorough examination of test split strategies and the proposal of methodologies for dataset pruning. Leveraging an attention-inspired technique, we tailor the tuning of tensor programs to embed both neural network and hardware-specific features. Notably, our approach substantially reduces the dataset size by up to 53% compared to the baseline without compromising Pairwise Comparison Accuracy (PCA). Furthermore, our proposed methodology demonstrates competitive or improved mean inference times with only 25–40% of the baseline tuning time across various networks and target hardware. The attention-based tuner can effectively utilize schedules learned from previous hardware program measurements to optimize tensor program tuning on previously unseen hardware, achieving a top-5 accuracy exceeding 90%. This research introduces a significant advancement in autotuning tensor program generation, addressing the complexities associated with heterogeneous environments and showcasing promising results regarding efficiency and accuracy.
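The abstract reports Pairwise Comparison Accuracy (PCA) without defining it. As a rough illustration only, the sketch below assumes the common reading of the metric: the fraction of candidate-program pairs for which a cost model's predicted ordering agrees with the measured ordering. The function name, the tie handling, and the sample numbers are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def pairwise_comparison_accuracy(predicted, measured):
    """Fraction of pairs whose predicted ordering matches the measured one.

    Assumption: lower values mean faster programs in both sequences.
    Pairs with identical measured values are skipped; a predicted tie on a
    pair with distinct measured values counts as incorrect.
    """
    if len(predicted) != len(measured):
        raise ValueError("predicted and measured must have the same length")

    correct = 0
    total = 0
    for i, j in combinations(range(len(measured)), 2):
        measured_diff = measured[i] - measured[j]
        if measured_diff == 0:
            continue  # no ground-truth ordering for this pair
        total += 1
        predicted_diff = predicted[i] - predicted[j]
        # Same sign means the model ranks the pair the same way as the hardware.
        if predicted_diff * measured_diff > 0:
            correct += 1
    return correct / total if total else 0.0


# Hypothetical latencies (ms) for four candidate schedules.
measured_ms = [1.0, 2.0, 3.0, 4.0]
predicted_ms = [0.9, 2.5, 2.1, 4.2]
pca = pairwise_comparison_accuracy(predicted_ms, measured_ms)
```

In this made-up example the model swaps only the second and third schedules, so five of the six pairs are ordered correctly.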
Full Text Available

OMPGPT: A Generative Pre-trained Transformer Model for OpenMP

https://doi.org/10.1007/978-3-031-69577-3_9

Chen, Le; Bhattacharjee, Arijit; Ahmed, Nesreen; Hasabnis, Niranjan; Oren, Gal; Vo, Vy; Jannesari, Ali (January 2024, Springer Nature Switzerland)

Full Text Available