NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Portable, Fast, DCT-based Compressor for AI Accelerators

https://doi.org/10.1145/3625549.3658662

Shah, Milan; Yu, Xiaodong; Di, Sheng; Becchi, Michela; Cappello, Franck (June 2024, ACM)

Full Text Available
A Survey on Error-Bounded Lossy Compression for Scientific Datasets

https://doi.org/10.1145/3733104

Di, Sheng; Liu, Jinyang; Zhao, Kai; Liang, Xin; Underwood, Robert; Zhang, Zhaorui; Shah, Milan; Huang, Yafan; Huang, Jiajun; Yu, Xiaodong; et al (May 2025, ACM Computing Surveys)

Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving the reconstructed data fidelity very well. Many error-bounded lossy compressors have been developed for a wide range of parallel and distributed use cases for years. They are designed with distinct compression models and principles, such that each of them features particular pros and cons. In this paper we provide a comprehensive survey of emerging error-bounded lossy compression techniques. The key contribution is fourfold. (1) We summarize a novel taxonomy of lossy compression into 6 classic models. (2) We provide a comprehensive survey of 10 commonly used compression components/modules. (3) We summarized pros and cons of 47 state-of-the-art lossy compressors and present how state-of-the-art compressors are designed based on different compression techniques. (4) We discuss how customized compressors are designed for specific scientific applications and use-cases. We believe this survey is useful to multiple communities including scientific applications, high-performance computing, lossy compression, and big data.
more » « less
Free, publicly-accessible full text available May 2, 2026
Lightweight Huffman Coding for Efficient GPU Compression

https://doi.org/10.1145/3577193.3593736

Shah, Milan; Yu, Xiaodong; Di, Sheng; Becchi, Michela; Cappello, Franck (June 2023, ICS '23: Proceedings of the 37th International Conference on Supercomputing)

Full Text Available
GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations

https://doi.org/10.1109/IPDPS54959.2023.00081

Shah, Milan; Yu, Xiaodong; Di, Sheng; Lykov, Danylo; Alexeev, Yuri; Becchi, Michela; Cappello, Franck (May 2023, 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Quantum circuit simulations enable researchers to develop quantum algorithms without the need for a physical quantum computer. Quantum computing simulators, however, all suffer from significant memory footprint requirements, which prevents large circuits from being simulated on classical super-computers. In this paper, we explore different lossy compression strategies to substantially shrink quantum circuit tensors in the QTensor package (a state-of-the-art tensor network quantum circuit simulator) while ensuring the reconstructed data satisfy the user-needed fidelity.Our contribution is fourfold. (1) We propose a series of optimized pre- and post-processing steps to boost the compression ratio of tensors with a very limited performance overhead. (2) We characterize the impact of lossy decompressed data on quantum circuit simulation results, and leverage the analysis to ensure the fidelity of reconstructed data. (3) We propose a configurable compression framework for GPU based on cuSZ and cuSZx, two state-of-the-art GPU-accelerated lossy compressors, to address different use-cases: either prioritizing compression ratios or prioritizing compression speed. (4) We perform a comprehensive evaluation by running 9 state-of-the-art compressors on an NVIDIA A100 GPU based on QTensor-generated tensors of varying sizes. When prioritizing compression ratio, our results show that our strategies can increase the compression ratio nearly 10 times compared to using only cuSZ. When prioritizing throughput, we can perform compression at the comparable speed as cuSZx while achieving 3-4× higher compression ratios. Decompressed tensors can be used in QTensor circuit simulation to yield a final energy result within 1-5% of the true energy value.
more » « less
Full Text Available
The rhizodynamics robot: Automated imaging system for studying long-term dynamic root growth

https://doi.org/10.1371/journal.pone.0295823

Rajanala, Aradhya; Taylor, Isaiah W; McCaskey, Erin; Pierce, Christopher; Ligon, Jason; Aydin, Enes; Hunner, Carrie; Carmichael, Amanda; Eserman, Lauren; Coffee, Emily_E D; et al (December 2023, PLOS ONE)
Sinibaldi, Edoardo (Ed.)
The study of plant root growth in real time has been difficult to achieve in an automated, high-throughput, and systematic fashion. Dynamic imaging of plant roots is important in order to discover novel root growth behaviors and to deepen our understanding of how roots interact with their environments. We designed and implemented the Generating Rhizodynamic Observations Over Time (GROOT) robot, an automated, high-throughput imaging system that enables time-lapse imaging of 90 containers of plants and their roots growing in a clear gel medium over the duration of weeks to months. The system uses low-cost, widely available materials. As a proof of concept, we employed GROOT to collect images of root growth ofOryza sativa,Hudsonia montana, and multiple species of orchids includingPlatanthera integrilabiaover six months. Beyond imaging plant roots, our system is highly customizable and can be used to collect time- lapse image data of different container sizes and configurations regardless of what is being imaged, making it applicable to many fields that require longitudinal time-lapse recording.
more » « less
Full Text Available
Accelerating Random Forest Classification on GPU and FPGA

https://doi.org/10.1145/3545008.3545067

Shah, Milan; Neff, Reece; Wu, Hancheng; Minutoli, Marco; Tumeo, Antonino; Becchi, Michela (August 2022, ICPP '22: Proceedings of the 51st International Conference on Parallel Processing)

Random Forests (RFs) are a commonly used machine learning method for classification and regression tasks spanning a variety of application domains, including bioinformatics, business analytics, and software optimization. While prior work has focused primarily on improving performance of the training of RFs, many applications, such as malware identification, cancer prediction, and banking fraud detection, require fast RF classification. In this work, we accelerate RF classification on GPU and FPGA. In order to provide efficient support for large datasets, we propose a hierarchical memory layout suitable to the GPU/FPGA memory hierarchy. We design three RF classification code variants based on that layout, and we investigate GPU- and FPGA-specific considerations for these kernels. Our experimental evaluation, performed on an Nvidia Xp GPU and on a Xilinx Alveo U250 FPGA accelerator card using publicly available datasets on the scale of millions of samples and tens of features, covers various aspects. First, we evaluate the performance benefits of our hierarchical data structure over the standard compressed sparse row (CSR) format. Second, we compare our GPU implementation with cuML, a machine learning library targeting Nvidia GPUs. Third, we explore the performance/accuracy tradeoff resulting from the use of different tree depths in the RF. Finally, we perform a comparative performance analysis of our GPU and FPGA implementations. Our evaluation shows that, while reporting the best performance on GPU, our code variants outperform the CSR baseline both on GPU and FPGA. For high accuracy targets, our GPU implementation yields a 5-9 × speedup over CSR, and up to a 2 × speedup over Nvidia’s cuML library.
more » « less
Full Text Available
Network-based multi-task learning models for biomarker selection and cancer outcome prediction

https://doi.org/10.1093/bioinformatics/btz809

Wang, Zhibo; He, Zhezhi; Shah, Milan; Zhang, Teng; Fan, Deliang; Zhang, Wei; Cowen, Lenore (November 2019, Bioinformatics)

Abstract Motivation Detecting cancer gene expression and transcriptome changes with mRNA-sequencing (RNA-Seq) or array-based data are important for understanding the molecular mechanisms underlying carcinogenesis and cellular events during cancer progression. In previous studies, the differentially expressed genes were detected across patients in one cancer type. These studies ignored the role of mRNA expression changes in driving tumorigenic mechanisms that are either universal or specific in different tumor types. To address the problem, we introduce two network-based multi-task learning frameworks, NetML and NetSML, to discover common differentially expressed genes shared across different cancer types as well as differentially expressed genes specific to each cancer type. The proposed frameworks consider the common latent gene co-expression modules and gene-sample biclusters underlying the multiple cancer datasets to learn the knowledge crossing different tumor types. Results Large-scale experiments on simulations and real cancer high-throughput datasets validate that the proposed network-based multi-task learning frameworks perform better sample classification compared with the models without the knowledge sharing across different cancer types. The common and cancer specific molecular signatures detected by multi-task learning frameworks on TCGA ovarian cancer, breast cancer, and prostate cancer datasets are correlated with the known marker genes and enriched in cancer relevant KEGG pathways and Gene Ontology terms. Availability and Implementation Source code is available at: https://github.com/compbiolabucf/NetML Supplementary information Supplementary data are available at Bioinformatics
more » « less
Full Text Available

Search for: All records