NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).


EvaMAE: How Helpful Are DEM Data in Enhancing Geo-Foundation Models for Earth Imagery?

Adhikari, Saugat; Yan, Da; Nimbale, Naman; Liu, Weijin; Yu, Xiaodong; Ahmad, Akhlaque; Yuan, Lyuheng; Han, Jiao; Jiang, Zhe (September 2025, ACM)

Free, publicly-accessible full text available September 17, 2026
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Parity LLM Data Valuation

Pan, Yanzhou; Lin, Huawei; Ran, Yide; Chen, Jiamin; Yu, Xiaodong; Zhao, Weijie; Zhang, Denghui; Xu, Zhaozhuo (April 2025, Association for Computational Linguistics)
Chiruzzo, Luis; Ritter, Alan; Wang, Lu (Ed.)
Large Language Models (LLMs) rely heavily on high-quality training data, making data valuation crucial for optimizing model performance, especially when working within a limited budget. In this work, we aim to offer a third-party data valuation approach that benefits both data providers and model developers. We introduce a linearized future influence kernel (LinFiK), which assesses the value of individual data samples in improving LLM performance during training. We further propose ALinFiK, a learning strategy to approximate LinFiK, enabling scalable data valuation. Our comprehensive evaluations demonstrate that this approach surpasses existing baselines in effectiveness and efficiency, with significant scalability advantages as the number of LLM parameters increases.
Free, publicly-accessible full text available April 29, 2026
GFormer: Accelerating Large Language Models with Optimized Transformers on Gaudi Processors

Zhang, Chengming; Ding, Xinheng; Sun, Baixi; Yu, Xiaodong; Zheng, Weijian; Xie, Zhen; Tao, Dingwen (December 2024, arXiv.org)

Free, publicly-accessible full text available December 19, 2025
ZCCL: Significantly Improving Collective Communication With Error-Bounded Lossy Compression

Huang, Jiajun; Di, Sheng; Yu, Xiaodong; Zhai, Yujia; Zhang, Zhaorui; Liu, Jinyang; Lu, Xiaoyi; Raffenetti, Ken; Zhou, Hui; Zhao, Kai; et al (February 2025, arXiv.org)

Free, publicly-accessible full text available February 25, 2026
COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers

https://doi.org/10.1145/3710848.3710852

Sun, Baixi; Liu, Weijin; Pauloski, J Gregory; Tian, Jiannan; Jia, Jinda; Wang, Daoce; Zhang, Boyuan; Zheng, Mingkai; Di, Sheng; Jin, Sian; et al (February 2025, ACM)

Free, publicly-accessible full text available February 28, 2026
A Survey on Error-Bounded Lossy Compression for Scientific Datasets

https://doi.org/10.1145/3733104

Di, Sheng; Liu, Jinyang; Zhao, Kai; Liang, Xin; Underwood, Robert; Zhang, Zhaorui; Shah, Milan; Huang, Yafan; Huang, Jiajun; Yu, Xiaodong; et al (May 2025, ACM Computing Surveys)

Error-bounded lossy compression has been effective in significantly reducing the data storage/transfer burden while preserving high fidelity in the reconstructed data. Many error-bounded lossy compressors have been developed over the years for a wide range of parallel and distributed use cases. They are designed with distinct compression models and principles, so each of them has particular pros and cons. In this paper we provide a comprehensive survey of emerging error-bounded lossy compression techniques. The key contributions are fourfold. (1) We summarize a novel taxonomy of lossy compression into 6 classic models. (2) We provide a comprehensive survey of 10 commonly used compression components/modules. (3) We summarize the pros and cons of 47 state-of-the-art lossy compressors and present how they are designed based on different compression techniques. (4) We discuss how customized compressors are designed for specific scientific applications and use cases. We believe this survey is useful to multiple communities, including scientific applications, high-performance computing, lossy compression, and big data.
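To make the "error-bounded" guarantee concrete, the following is a minimal Python sketch of one classic design, prediction followed by linear quantization, in the style used by SZ-like compressors. It is an illustration under simplifying assumptions, not code from the survey: the previous-value predictor, the function names `compress`/`decompress`, and the omission of any entropy or lossless coding stage are choices made here for brevity.

```python
from typing import List


def compress(data: List[float], eb: float) -> List[int]:
    """Illustrative prediction + linear quantization (not the survey's code).

    Each value is predicted from the previous *reconstructed* value, and the
    prediction residual is quantized into bins of width 2*eb, so the decoder
    can rebuild every value to within the absolute error bound eb.
    """
    codes: List[int] = []
    prev = 0.0  # reconstructed previous value, mirrored exactly by the decoder
    for x in data:
        pred = prev                           # assumed 1D previous-value predictor
        q = round((x - pred) / (2 * eb))      # quantization code (small integer)
        codes.append(q)
        prev = pred + q * 2 * eb              # same reconstruction the decoder performs
    return codes


def decompress(codes: List[int], eb: float) -> List[float]:
    """Rebuild values from quantization codes using the same predictor."""
    out: List[float] = []
    prev = 0.0
    for q in codes:
        prev = prev + q * 2 * eb
        out.append(prev)
    return out


if __name__ == "__main__":
    data = [0.0, 0.5, 1.2, 1.1, 3.7, -2.25]
    eb = 0.1
    rec = decompress(compress(data, eb), eb)
    print(max(abs(a - b) for a, b in zip(data, rec)))  # at most eb, up to float rounding
```

Because each code q satisfies |r / (2·eb) - q| ≤ 0.5 for the residual r, every reconstructed value lies within eb of the original (up to floating-point rounding). A real compressor would then pass the mostly small integer codes to an entropy or lossless coder, which is where the size reduction comes from.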
Free, publicly-accessible full text available May 2, 2026
hZCCL: Accelerating Collective Communication with Co-Designed Homomorphic Compression

https://doi.org/10.1109/SC41406.2024.00110

Huang, Jiajun; Di, Sheng; Yu, Xiaodong; Zhai, Yujia; Liu, Jinyang; Jian, Zizhe; Liang, Xin; Zhao, Kai; Lu, Xiaoyi; Chen, Zizhong; et al (November 2024, IEEE)

Free, publicly-accessible full text available November 17, 2025
A Portable, Fast, DCT-based Compressor for AI Accelerators

https://doi.org/10.1145/3625549.3658662

Shah, Milan; Yu, Xiaodong; Di, Sheng; Becchi, Michela; Cappello, Franck (June 2024, ACM)

Full Text Available
Centimani: Enabling Fast AI Accelerator Selection for DNN Training with a Novel Performance Predictor

Xie, Zhen; Emani, Murali; Yu, Xiaodong; Tao, Dingwen; He, Xin; Su, Pengfei; Zhou, Keren; Vishwanath, Venkatram (July 2024, 2024 USENIX Annual Technical Conference (USENIX ATC 24))

Full Text Available
G2-AIMD: A Memory-Efficient Subgraph-Centric Framework for Efficient Subgraph Search on GPUs

Yuan, Lyuheng; Ahmad, Akhlaque; Yan, Da; Han, Jiao; Adhikari, Saugat; Yu, Xiaodong; Zhou, Yang (April 2024, 40th IEEE International Conference on Data Engineering)

Finding the subgraphs of a big graph that satisfy certain conditions (a.k.a. subgraph search) is useful in many applications such as community detection and subgraph matching. These problems often generate a search-space tree whose size is exponential in the size of the input graph. GPUs with thousands of cores are a natural choice to speed up subgraph search, but existing GPU solutions either conduct BFS on the search-space tree, which leads to memory overflow due to intermediate subgraph-size explosion, or conduct DFS on the search-space tree, which is memory-efficient but can be 2 orders of magnitude slower than a BFS solution. In this paper, we present G2-AIMD, a subgraph-centric framework for efficient subGraph Search on GPUs, which enjoys the efficiency of BFS on the search-space tree while avoiding intermediate subgraph-size explosion through novel system designs such as adaptive chunk-size adjustment and host-memory subgraph buffering, inspired by the additive-increase/multiplicative-decrease (AIMD) algorithm in TCP congestion control. G2-AIMD provides a convenient subgraph-centric programming interface that makes it easy to implement subgraph search algorithms on top of it, so that they inherit these performance benefits. G2-AIMD also supports multi-GPU execution, where each GPU needs to load only a fraction of the input graph. To demonstrate the efficiency and scalability of G2-AIMD, we implemented two algorithms on top of it with additional optimization techniques; they significantly outperform existing GPU solutions.
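The AIMD idea behind adaptive chunk-size adjustment can be sketched in a few lines of Python. This is a hypothetical simplification, not G2-AIMD's implementation: `AIMDChunkSizer`, `expand_level`, the per-subgraph fan-out list, and the fixed output budget are names and a cost model introduced here for illustration only. The number of subgraphs expanded together grows by a constant after each batch whose output fits in the memory budget and is halved after an overflow, mirroring additive increase/multiplicative decrease in TCP congestion control.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AIMDChunkSizer:
    """Hypothetical AIMD controller for how many subgraphs to expand per batch."""
    size: int = 1
    increase: int = 1        # additive step after a batch fits in the budget
    decrease: float = 0.5    # multiplicative factor after an overflow
    min_size: int = 1

    def on_success(self) -> None:
        self.size += self.increase

    def on_overflow(self) -> None:
        self.size = max(self.min_size, int(self.size * self.decrease))


def expand_level(fanouts: List[int], budget: int, sizer: AIMDChunkSizer) -> List[int]:
    """Process one BFS level of the search-space tree in adaptively sized batches.

    fanouts[i] is the (assumed known) number of child subgraphs item i produces;
    budget is the assumed device-memory capacity for one batch's output.
    Returns the sizes of the batches that were committed.
    """
    committed: List[int] = []
    i = 0
    while i < len(fanouts):
        n = min(sizer.size, len(fanouts) - i)
        out = sum(fanouts[i:i + n])
        if out > budget and n > 1:
            sizer.on_overflow()   # too much output: shrink the batch and retry
            continue
        # A single item that still exceeds the budget is committed alone; in the
        # real framework such output would be buffered in host memory.
        committed.append(n)
        i += n
        if out <= budget:
            sizer.on_success()
    return committed


if __name__ == "__main__":
    fanouts = [2] * 10 + [50] + [1] * 10
    print(expand_level(fanouts, budget=10, sizer=AIMDChunkSizer()))
```

The additive step lets the batch size creep up toward the memory limit on easy stretches of the tree, while halving on overflow backs off quickly when a few subgraphs explode, which is the trade-off between BFS-style parallelism and bounded memory that the abstract describes.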
