Search Results
Search for: All records
Total Resources: 3
Author / Contributor
- Balaji, Pavan (3)
- Akella, Aditya (1)
- Baldonado, Omar (1)
- Berzins, Martin (1)
- Chandramowlishwaran, Aparna (1)
- Chu, Ching-Hsiang (1)
- Deng, Summer (1)
- Feng, Hao (1)
- Gandham, Shashidhar (1)
- Gangidi, Adithya (1)
- Geng, Tong (1)
- Hao, Yuchen (1)
- Kim, Geon-Woo (1)
- Li, Junbo (1)
- Sahasrabudhe, Damodar (1)
- Si, Min (1)
- Tao, Dingwen (1)
- Tian, Jiannan (1)
- Wang, Zhangyang (1)
- Ye, Fanjiang (1)
Results
- Training large language models (LLMs) increasingly relies on geographically distributed accelerators, causing prohibitive communication costs across regions and uneven utilization of heterogeneous hardware. We propose HALoS, a hierarchical asynchronous optimization framework that tackles these issues by introducing local parameter servers (LPSs) within each region and a global parameter server (GPS) that merges updates across regions. This hierarchical design minimizes expensive inter-region communication, reduces straggler effects, and leverages fast intra-region links. We provide a rigorous convergence analysis for HALoS under non-convex objectives, including theoretical guarantees on the role of hierarchical momentum in asynchronous training. Empirically, HALoS attains up to 7.5x faster convergence than synchronous baselines in geo-distributed LLM training and improves upon existing asynchronous methods by up to 2.1x. Crucially, HALoS preserves the model quality of fully synchronous SGD, matching or exceeding accuracy on standard language modeling and downstream benchmarks, while substantially lowering total training time. These results demonstrate that hierarchical, server-side update accumulation and global model merging are powerful tools for scalable, efficient training of new-era LLMs in heterogeneous, geo-distributed environments. (A toy sketch of the LPS/GPS update flow appears after the results list.) Free, publicly-accessible full text available June 5, 2026.
- Feng, Hao; Zhang, Boyuan; Ye, Fanjiang; Si, Min; Chu, Ching-Hsiang; Tian, Jiannan; Yin, Chunxing; Deng, Summer; Hao, Yuchen; Balaji, Pavan; et al. (IEEE). Free, publicly-accessible full text available November 17, 2025.
- Zambre, Rohit; Sahasrabudhe, Damodar; Zhou, Hui; Berzins, Martin; Chandramowlishwaran, Aparna; Balaji, Pavan (IEEE Transactions on Parallel and Distributed Systems).
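The HALoS abstract above centers on a concrete mechanism: local parameter servers absorb asynchronous worker updates inside each region, and a global parameter server occasionally merges region-level progress, so per-step traffic stays on fast intra-region links. Below is a minimal NumPy sketch of that update flow. It is an illustration, not the paper's algorithm: the class names, learning rates, momentum coefficients, flush interval, and the additive merge rule are all assumptions made for this example.

```python
# Toy sketch of hierarchical asynchronous parameter serving in the spirit
# of HALoS. Hypothetical names and update rules; the paper's actual
# accumulation, momentum, and merging schemes may differ.
import numpy as np

class LocalParameterServer:
    """Absorbs asynchronous worker gradients within one region."""
    def __init__(self, params, lr=0.1, momentum=0.9):
        self.params = params.copy()
        self.velocity = np.zeros_like(params)
        self.lr = lr
        self.momentum = momentum
        self.accumulated = np.zeros_like(params)  # progress not yet sent to the GPS

    def apply_worker_gradient(self, grad):
        # Local heavy-ball step; workers never wait on other regions.
        self.velocity = self.momentum * self.velocity - self.lr * grad
        self.params += self.velocity
        self.accumulated += self.velocity

    def flush(self):
        # Ship accumulated local progress in one message instead of
        # paying inter-region latency on every step.
        delta = self.accumulated
        self.accumulated = np.zeros_like(delta)
        return delta

class GlobalParameterServer:
    """Merges region-level deltas into the global model with momentum."""
    def __init__(self, params, momentum=0.5):
        self.params = params.copy()
        self.velocity = np.zeros_like(params)
        self.momentum = momentum

    def merge(self, delta):
        # Server-side momentum over merged regional updates.
        self.velocity = self.momentum * self.velocity + delta
        self.params += self.velocity
        return self.params.copy()  # refreshed model sent back to the region

# Demo: two regions minimize f(w) = ||w||^2, whose gradient is 2w.
rng = np.random.default_rng(0)
w0 = rng.normal(size=4)
gps = GlobalParameterServer(w0)
regions = [LocalParameterServer(w0) for _ in range(2)]
for step in range(20):
    for lps in regions:
        lps.apply_worker_gradient(2.0 * lps.params)  # cheap intra-region step
    if step % 5 == 4:                                # rare inter-region sync
        for lps in regions:
            lps.params = gps.merge(lps.flush())
print("final ||w||:", float(np.linalg.norm(gps.params)))
```

The flush interval is the key knob in a design like this: longer intervals cut inter-region traffic but add staleness to the merged updates. HALoS's convergence analysis addresses exactly that trade-off; the toy above only shows the message pattern, not the guarantees.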