NSF PAR Search | NSF Public Access Repository

ML4SODA: A Decision Tree Guided Design Space Exploration for Fast and High Quality MLIR-based HLS

https://doi.org/10.1145/3716368.3735223

Manjunath, Darshith; Agostini, Nicolas Bohm; Tumeo, Antonino; Zhang, Jeff; Chakrabarti, Chaitali (June 2025, ACM)

Free, publicly-accessible full text available June 29, 2026
FuseIM: Fusing probabilistic traversals for influence maximization on exascale systems

Neff, Reece; Zach, Mostafa; Minutoli, Marco; Halappanavar, Mahantesh; Tumeo, Antonino; Kalyanaraman, Ananth; Becchi, Michela (August 2024, ICS 2024)

Full Text Available
High-Level Synthesis of Irregular Applications: A Case Study on Influence Maximization

https://doi.org/10.1145/3587135.3592196

Neff, Reece; Minutoli, Marco; Tumeo, Antonino; Becchi, Michela (May 2023, CF '23: Proceedings of the 20th ACM International Conference on Computing Frontiers)

FPGAs are promising platforms for accelerating irregular applications due to their ability to implement highly specialized hardware designs for each kernel. However, the design and implementation of FPGA-accelerated kernels can take several months using hardware design languages. High Level Synthesis (HLS) tools provide fast, high quality results for regular applications, but lack the support to effectively accelerate more irregular, complex workloads. This work analyzes the challenges and benefits of using a commercial state-of-the-art HLS tool and its available optimizations to accelerate graph sampling. We evaluate the resulting designs and their effectiveness when deployed in a state-of-the-art heterogeneous framework that implements the Influence Maximization with Martingales (IMM) algorithm, a complex graph analytics algorithm. We discuss future opportunities for improvement in hardware, HLS tools, and hardware/software co-design methodology to better support complex irregular applications such as IMM.
Full Text Available
Towards scaling community detection on distributed-memory heterogeneous systems

https://doi.org/10.1016/j.parco.2022.102898

Gawande, Nitin; Ghosh, Sayan; Halappanavar, Mahantesh; Tumeo, Antonino; Kalyanaraman, Ananth (July 2022, Parallel Computing)

Full Text Available
Accelerating Random Forest Classification on GPU and FPGA

https://doi.org/10.1145/3545008.3545067

Shah, Milan; Neff, Reece; Wu, Hancheng; Minutoli, Marco; Tumeo, Antonino; Becchi, Michela (August 2022, ICPP '22: Proceedings of the 51st International Conference on Parallel Processing)

Random Forests (RFs) are a commonly used machine learning method for classification and regression tasks spanning a variety of application domains, including bioinformatics, business analytics, and software optimization. While prior work has focused primarily on improving performance of the training of RFs, many applications, such as malware identification, cancer prediction, and banking fraud detection, require fast RF classification. In this work, we accelerate RF classification on GPU and FPGA. In order to provide efficient support for large datasets, we propose a hierarchical memory layout suitable to the GPU/FPGA memory hierarchy. We design three RF classification code variants based on that layout, and we investigate GPU- and FPGA-specific considerations for these kernels. Our experimental evaluation, performed on an Nvidia Xp GPU and on a Xilinx Alveo U250 FPGA accelerator card using publicly available datasets on the scale of millions of samples and tens of features, covers various aspects. First, we evaluate the performance benefits of our hierarchical data structure over the standard compressed sparse row (CSR) format. Second, we compare our GPU implementation with cuML, a machine learning library targeting Nvidia GPUs. Third, we explore the performance/accuracy tradeoff resulting from the use of different tree depths in the RF. Finally, we perform a comparative performance analysis of our GPU and FPGA implementations. Our evaluation shows that, while reporting the best performance on GPU, our code variants outperform the CSR baseline both on GPU and FPGA. For high accuracy targets, our GPU implementation yields a 5-9× speedup over CSR, and up to a 2× speedup over Nvidia’s cuML library.
Full Text Available
HAM: Hotspot-Aware Manager for Improving Communications With 3D-Stacked Memory

https://doi.org/10.1109/TC.2021.3066982

Wang, Xi; Tumeo, Antonino; Leidel, John D.; Li, Jie; Chen, Yong (June 2021, IEEE Transactions on Computers)

Full Text Available
PREEMPT: Scalable Epidemic Interventions Using Submodular Optimization on Multi-GPU Systems

https://doi.org/10.1109/SC41405.2020.00059

Minutoli, Marco; Sambaturu, Prathyush; Halappanavar, Mahantesh; Tumeo, Antonino; Kalyanaraman, Ananth; Vullikanti, Anil (November 2020, IEEE/ACM International Conference for High Performance Computing, Networking, Storage, and Analysis (SC'20))
Full Text Available
cuRipples: influence maximization on multi-GPU systems

https://doi.org/10.1145/3392717.3392750

Minutoli, Marco; Drocco, Maurizio; Halappanavar, Mahantesh; Tumeo, Antonino; Kalyanaraman, Ananth (June 2020, ACM International Conference on Supercomputing (ICS'20), pp. 1-11, 2020)
Full Text Available
cuRipples: Influence Maximization on Multi-GPU Systems

Minutoli, Marco; Drocco, Maurizio; Halappanavar, Mahantesh; Tumeo, Antonino; Kalyanaraman, Ananth (June 2020, 2020 ACM International Conference on Supercomputing (ICS))

Full Text Available
Scaling and Quality of Modularity Optimization Methods for Graph Clustering

https://doi.org/10.1109/HPEC.2019.8916299

Ghosh, Sayan; Halappanavar, Mahantesh; Tumeo, Antonino; Kalyanaraman, Ananth (September 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC))

Full Text Available
