NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

PIPA: Preference Alignment as Prior-Informed Statistical Estimation

Li, Junbo; Wang, Zhangyang; Liu, Qiang (July 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026
HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training

Kim, Geon-Woo; Li, Junbo; Gandham, Shashidhar; Baldonado, Omar; Gangidi, Adithya; Balaji; Pavan; Wang, Zhangyang; Akella, Aditya (July 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026
From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories, and Applications

Jaiswal, Ajay; Wang, Yifan; Yin, Lu; Liu, Shiwei; Chen; Runjin; Zhao, Jiawei; Grama, Ananth; Tian, Yuandong; Wang, Zhangyang (July 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026
Transformers Provably Learn Two-Mixture of Linear Classification via Gradient Flow

Yang, Hongru; Wang, Zhangyang; Lee, Jason D; Liang, Yingbin (April 2025, Neural Information Processing Systems (NeurIPS))

Free, publicly-accessible full text available April 30, 2026
Transformers provably learn two-mixture of linear classification via gradient flow

Yang, Hongru; Wang, Zhangyang; Lee, Jason D; Liang, Yingbin (April 2025, International Conference on Learning Representations (ICLR))

Free, publicly-accessible full text available April 24, 2026
Efficient Neuro-Symbolic Policy using In-Memory Computing

Molom-Ochir, Tergel; Saxena, Naman; Kim, Jiwoo; Chen, Yiran; Wang, Zhangyang; Pajic, Miroslav; Li, Hai (May 2025, International Conference on Neuro-symbolic Systems (NeuS))

Free, publicly-accessible full text available May 28, 2026
HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training

Kim, Geon-Woo; Li, Junbo; Gandham, Shashidhar; Baldonado, Omar; Gangidi, Adithya; Balaji, Pavan; Wang, Zhangyang; Akella, Aditya (June 2025, https://doi.org/10.48550/arXiv.2506.04531 Focus to learn more)

Training large language models (LLMs) increasingly relies on geographically distributed accelerators, causing prohibitive communication costs across regions and uneven utilization of heterogeneous hardware. We propose HALoS, a hierarchical asynchronous optimization framework that tackles these issues by introducing local parameter servers (LPSs) within each region and a global parameter server (GPS) that merges updates across regions. This hierarchical design minimizes expensive inter-region communication, reduces straggler effects, and leverages fast intra-region links. We provide a rigorous convergence analysis for HALoS under non-convex objectives, including theoretical guarantees on the role of hierarchical momentum in asynchronous training. Empirically, HALoS attains up to 7.5x faster convergence than synchronous baselines in geo-distributed LLM training and improves upon existing asynchronous methods by up to 2.1x. Crucially, HALoS preserves the model quality of fully synchronous SGD-matching or exceeding accuracy on standard language modeling and downstream benchmarks-while substantially lowering total training time. These results demonstrate that hierarchical, server-side update accumulation and global model merging are powerful tools for scalable, efficient training of new-era LLMs in heterogeneous, geo-distributed environments.
more » « less
Free, publicly-accessible full text available June 5, 2026
Random Pruning Over-parameterized Neural Networks Can Improve Generalization: A Training Dynamics Analysis

Yang, Hongru; Liang, Yingbin; Guo, Xiaojie; Wu, Lingfei; Wang, Zhangyang (April 2025, Journal of machine learning research)

Free, publicly-accessible full text available April 15, 2026
Random Pruning Over-parameterized Neural Networks Can Improve Generalization: A Training Dynamics Analysis

Yang, Hongru; Liang, Yingbin; Guo, Xiaojie; Wu, Lingfei; Wang, Zhangyang (April 2025, Journal of Machine Learning Research (JMLR))

Free, publicly-accessible full text available April 1, 2026
Meta ControlNet: Enhancing task adaptation via meta learning

Yang, Junjie; Zhao, Jinze; Wang, Peihao; Wang, Zhangyang; Liang, Yingbin (March 2025, Conference on Parsimony and Learning (CPAL))

Free, publicly-accessible full text available March 24, 2026

« Prev Next »

Search for: All records