Search for: All records

Creators/Authors contains: "YU, Xiaodong"

« Prev Next »

Total Resources

17

Resource Type
Conference Paper

17

Conference Proceeding

0

Dataset

0

Journal Article

0

Workshop Report

0

Availability
Full Text / Resource Available

11

Citation Only

6

Save Results
Excel (limit 2000)
CSV (limit 5000)
XML (limit 5000)

Have feedback or suggestions for a way to improve these results?
!

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

G2-AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Search on GPUs

Yuan, Lyuheng ; Ahmad, Akhlaque ; Yan, Da ; Han, Jiao ; Adhikari, Saugat ; Yu, Xiaodong ; Zhou, Yang ( April 2024 , 40th IEEE International Conference on Data Engineering)

Finding from a big graph those subgraphs that satisfy certain conditions (aka. subgraph search) is useful in many applications such as community detection and subgraph matching. These problems often generate a search-space tree with size exponential to the size of the input graph. GPUs with thousands of cores are a natural choice to speed up subgraph search, but existing GPU solutions either conduct BFS on the search-space tree which leads to memory overflow due to intermediate subgraph-size explosion, or they conduct DFS on the search-space tree which is memory-efficient but can be 2 orders of magnitude slower than a BFS solution. In this paper, we present G2-AIMD, a subgraph-centric framework for efficient subGraph Search on GPUs, which enjoys the efficiency of BFS on the search-space tree, while avoids intermediate subgraph-size explosion with novel system designs such as adaptive chunk-size adjustment and host-memory subgraph buffering, inspired by the additive-increase/multiplicative-decrease (AIMD) algorithm in TCP congestion control. G2-AIMD provides a convenient subgraph-centric programming interface to facilitate the implementation of subgraph search algorithms on top, so as to enjoy the above performance merits. G2-AIMD also supports multi-GPU execution where each GPU only needs to load a fraction of the input graph. To demonstrate the efficiency and scalability of G2-AIMD, two algorithms were implemented on top with additional optimization techniques, and they significantly outperform the existing GPU solutions.
more » « less
Lightweight Huffman Coding for Efficient GPU Compression

https://doi.org/10.1145/3577193.3593736

Shah, Milan ; Yu, Xiaodong ; Di, Sheng ; Becchi, Michela ; Cappello, Franck ( June 2023 , ICS '23: Proceedings of the 37th International Conference on Supercomputing)

Free, publicly-accessible full text available June 21, 2024
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs

https://doi.org/10.1145/3577193.3593706

Zhang, Boyuan ; Tian, Jiannan ; Di, Sheng ; Yu, Xiaodong ; Swany, Martin ; Tao, Dingwen ; Cappello, Franck ( June 2023 , ICS '23: Proceedings of the 37th International Conference on Supercomputing)

Free, publicly-accessible full text available June 21, 2024
GPULZ: Optimizing LZSS Lossless Compression for Multi-byte Data on Modern GPUs

Zhang, Boyuan ; Tian, Jiannan ; Di, Sheng ; Yu, Xiaodong ; Swany, Martin ; Tao, Dingwen ; Cappello, Franck ( June 2023 , The 37th ACM International Conference on Supercomputing (ICS 2023))

Free, publicly-accessible full text available June 21, 2024
FZ-GPU: A Fast and High-Ratio Lossy Compressor for Scientific Computing Applications on GPUs

https://doi.org/10.1145/3588195.3592994

Zhang, Boyuan ; Tian, Jiannan ; Di, Sheng ; Yu, Xiaodong ; Feng, Yunhe ; Liang, Xin ; Tao, Dingwen ; Cappello, Franck ( June 2023 , The 32nd ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2023))

Today’s large-scale scientific applications running on high-performance computing (HPC) systems generate vast data volumes. Thus, data compression is becoming a critical technique to mitigate the storage burden and data-movement cost. However, existing lossy compressors for scientific data cannot achieve a high compression ratio and throughput simultaneously, hindering their adoption in many applications requiring fast compression, such as in-memory compression. To this end, in this work, we develop a fast and high-ratio error-bounded lossy compressor on GPUs for scientific data (called FZ-GPU). Specifically, we first design a new compression pipeline that consists of fully parallelized quantization, bitshuffle, and our newly designed fast encoding. Then, we propose a series of deep architectural optimizations for each kernel in the pipeline to take full advantage of CUDA architectures. We propose a warp-level optimization to avoid data conflicts for bit-wise operations in bitshuffle, maximize shared memory utilization, and eliminate unnecessary data movements by fusing different compression kernels. Finally, we evaluate FZ-GPU on two NVIDIA GPUs (i.e., A100 and RTX A4000) using six representative scientific datasets from SDRBench. Results on the A100 GPU show that FZ-GPU achieves an average speedup of 4.2× over cuSZ and an average speedup of 37.0× over a multi-threaded CPU implementation of our algorithm under the same error bound. FZ-GPU also achieves an average speedup of 2.3× and an average compression ratio improvement of 2.0× over cuZFP under the same data distortion.
more » « less
Free, publicly-accessible full text available June 16, 2024
HEAT: A Highly Efficient and Affordable Training System for Collaborative Filtering Based Recommendation on CPUs

https://doi.org/10.1145/3577193.3593717

Zhang, Chengming ; Smith, Shaden ; Sun, Baixi ; Tian, Jiannan ; Soifer, Jonathan ; Yu, Xiaodong ; Song, Shuaiwen Leon ; He, Yuxiong ; Tao, Dingwen ( June 2023 , ACM)
GPU-Accelerated Error-Bounded Compression Framework for Quantum Circuit Simulations

https://doi.org/10.1109/IPDPS54959.2023.00081

Shah, Milan ; Yu, Xiaodong ; Di, Sheng ; Lykov, Danylo ; Alexeev, Yuri ; Becchi, Michela ; Cappello, Franck ( May 2023 , 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS))

Quantum circuit simulations enable researchers to develop quantum algorithms without the need for a physical quantum computer. Quantum computing simulators, however, all suffer from significant memory footprint requirements, which prevents large circuits from being simulated on classical super-computers. In this paper, we explore different lossy compression strategies to substantially shrink quantum circuit tensors in the QTensor package (a state-of-the-art tensor network quantum circuit simulator) while ensuring the reconstructed data satisfy the user-needed fidelity.Our contribution is fourfold. (1) We propose a series of optimized pre- and post-processing steps to boost the compression ratio of tensors with a very limited performance overhead. (2) We characterize the impact of lossy decompressed data on quantum circuit simulation results, and leverage the analysis to ensure the fidelity of reconstructed data. (3) We propose a configurable compression framework for GPU based on cuSZ and cuSZx, two state-of-the-art GPU-accelerated lossy compressors, to address different use-cases: either prioritizing compression ratios or prioritizing compression speed. (4) We perform a comprehensive evaluation by running 9 state-of-the-art compressors on an NVIDIA A100 GPU based on QTensor-generated tensors of varying sizes. When prioritizing compression ratio, our results show that our strategies can increase the compression ratio nearly 10 times compared to using only cuSZ. When prioritizing throughput, we can perform compression at the comparable speed as cuSZx while achieving 3-4× higher compression ratios. Decompressed tensors can be used in QTensor circuit simulation to yield a final energy result within 1-5% of the true energy value.
more » « less
Free, publicly-accessible full text available May 1, 2024
Ultrafast Error-Bounded Lossy Compression for Scientific Datasets

https://doi.org/10.1145/3502181.3531473

Yu, Xiaodong ; Di, Sheng ; Zhao, Kai ; Tian, Jiannan ; Tao, Dingwen ; Liang, Xin ; Cappello, Franck ( June 2022 , The 31st ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC 2022))

Today’s scientific high-performance computing applications and advanced instruments are producing vast volumes of data across a wide range of domains, which impose a serious burden on data transfer and storage. Error-bounded lossy compression has been developed and widely used in the scientific community because it not only can significantly reduce the data volumes but also can strictly control the data distortion based on the user-specified error bound. Existing lossy compressors, however, cannot offer ultrafast compression speed, which is highly demanded by numerous applications or use cases (such as in-memory compression and online instrument data compression). In this paper we propose a novel ultrafast error-bounded lossy compressor that can obtain fairly high compression performance on both CPUs and GPUs and with reasonably high compression ratios. The key contributions are threefold. (1) We propose a generic error-bounded lossy compression framework—called SZx—that achieves ultrafast performance through its novel design comprising only lightweight operations such as bitwise and addition/subtraction operations, while still keeping a high compression ratio. (2) We implement SZx on both CPUs and GPUs and optimize the performance according to their architectures. (3) We perform a comprehensive evaluation with six real-world production-level scientific datasets on both CPUs and GPUs. Experiments show that SZx is 2∼16× faster than the second-fastest existing error-bounded lossy compressor (either SZ or ZFP) on CPUs and GPUs, with respect to both compression and decompression.
more » « less
Full Text Available
Optimizing Huffman Decoding for Error-Bounded Lossy Compression on GPUs

https://doi.org/10.1109/IPDPS53621.2022.00075

Rivera, Cody ; Di, Sheng ; Tian, Jiannan ; Yu, Xiaodong ; Tao, Dingwen ; Cappello, Franck ( May 2022 , The 36th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2022))

More and more HPC applications require fast and effective compression techniques to handle large volumes of data in storage and transmission. Not only do these applications need to compress the data effectively during simulation, but they also need to perform decompression efficiently for post hoc analysis. SZ is an error-bounded lossy compressor for scientific data, and cuSZ is a version of SZ designed to take advantage of the GPU's power. At present, cuSZ's compression performance has been optimized significantly while its decompression still suffers considerably lower performance because of its sophisticated lossless compression step---a customized Huffman decoding. In this work, we aim to significantly improve the Huffman decoding performance for cuSZ, thus improving the overall decompression performance in turn. To this end, we first investigate two state-of-the-art GPU Huffman decoders in depth. Then, we propose a deep architectural optimization for both algorithms. Specifically, we take full advantage of CUDA GPU architectures by using shared memory on decoding/writing phases, online tuning the amount of shared memory to use, improving memory access patterns, and reducing warp divergence. Finally, we evaluate our optimized decoders on an Nvidia V100 GPU using eight representative scientific datasets. Our new decoding solution obtains an average speedup of 3.64X over cuSZ's Huffman decoder and improves its overall decompression performance by 2.43X on average.
more » « less
Full Text Available
cuZ-Checker: A GPU-Based Ultra-Fast Assessment System for Lossy Compressions

https://doi.org/10.1109/Cluster48925.2021.00065

Yu, Xiaodong ; Di, Sheng ; Gok, Ali Murat ; Tao, Dingwen ; Cappello, Franck ( September 2021 , 2021 IEEE International Conference on Cluster Computing (CLUSTER 2021))
null (Ed.)
Full Text Available

« Prev Next »