Note: Clicking a Digital Object Identifier (DOI) link takes you to an external site maintained by the publisher. Some full-text articles may not yet be available free of charge during the embargo period.
- Free, publicly-accessible full text available July 13, 2026
- Free, publicly-accessible full text available July 13, 2026
- Free, publicly-accessible full text available July 13, 2026
- Free, publicly-accessible full text available April 30, 2026
- Free, publicly-accessible full text available April 24, 2026
- Free, publicly-accessible full text available May 28, 2026
- Training large language models (LLMs) increasingly relies on geographically distributed accelerators, causing prohibitive communication costs across regions and uneven utilization of heterogeneous hardware. We propose HALoS, a hierarchical asynchronous optimization framework that tackles these issues by introducing local parameter servers (LPSs) within each region and a global parameter server (GPS) that merges updates across regions (a minimal sketch of this two-level design follows the list below). This hierarchical design minimizes expensive inter-region communication, reduces straggler effects, and leverages fast intra-region links. We provide a rigorous convergence analysis for HALoS under non-convex objectives, including theoretical guarantees on the role of hierarchical momentum in asynchronous training. Empirically, HALoS attains up to 7.5x faster convergence than synchronous baselines in geo-distributed LLM training and improves upon existing asynchronous methods by up to 2.1x. Crucially, HALoS preserves the model quality of fully synchronous SGD, matching or exceeding accuracy on standard language modeling and downstream benchmarks, while substantially lowering total training time. These results demonstrate that hierarchical, server-side update accumulation and global model merging are powerful tools for scalable, efficient training of new-era LLMs in heterogeneous, geo-distributed environments.
  Free, publicly-accessible full text available June 5, 2026
- Free, publicly-accessible full text available April 15, 2026
- Free, publicly-accessible full text available April 1, 2026
- Free, publicly-accessible full text available March 24, 2026
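The HALoS entry above describes a two-level parameter-server hierarchy: workers push updates asynchronously to a local parameter server (LPS) within their region, and a global parameter server (GPS) merges accumulated region-level updates with momentum. The sketch below illustrates that idea in minimal Python; the class names (LocalParameterServer, GlobalParameterServer) and parameters (merge_interval, lr, momentum) are illustrative assumptions, not the paper's actual API or exact update rule.

```python
# Hypothetical sketch of hierarchical asynchronous update accumulation,
# loosely following the two-level design described in the abstract above.
# Names and the momentum rule are assumptions for illustration only.
import numpy as np


class LocalParameterServer:
    """Accumulates asynchronous worker updates within one region."""

    def __init__(self, dim, merge_interval=4, lr=0.1):
        self.params = np.zeros(dim)        # region-local copy of the model
        self.accum = np.zeros(dim)         # server-side accumulated update
        self.merge_interval = merge_interval
        self.lr = lr
        self.steps = 0

    def apply_worker_gradient(self, grad):
        """Apply a worker's gradient locally; return True when a global merge is due."""
        update = -self.lr * grad
        self.params += update
        self.accum += update
        self.steps += 1
        # Only communicate across regions every `merge_interval` local steps.
        return self.steps % self.merge_interval == 0

    def pop_accumulated_update(self):
        """Return (and reset) the accumulated update destined for the GPS."""
        update, self.accum = self.accum, np.zeros_like(self.accum)
        return update


class GlobalParameterServer:
    """Merges region-level updates into the global model with momentum."""

    def __init__(self, dim, momentum=0.9):
        self.params = np.zeros(dim)
        self.velocity = np.zeros(dim)
        self.momentum = momentum

    def merge(self, region_update):
        # Momentum on server-side merged updates: one plausible reading of
        # "hierarchical momentum"; the paper's exact rule may differ.
        self.velocity = self.momentum * self.velocity + region_update
        self.params += self.velocity
        return self.params.copy()   # fresh global model sent back to the region


if __name__ == "__main__":
    dim = 8
    gps = GlobalParameterServer(dim)
    regions = [LocalParameterServer(dim) for _ in range(3)]

    rng = np.random.default_rng(0)
    for step in range(20):
        # Workers in different regions push gradients asynchronously;
        # a simple round-robin over regions stands in for that here.
        lps = regions[step % len(regions)]
        grad = rng.normal(size=dim)
        if lps.apply_worker_gradient(grad):
            # Infrequent, expensive inter-region communication.
            lps.params = gps.merge(lps.pop_accumulated_update())
```

In this sketch the only cross-region step is the GPS merge, which fires once every merge_interval local steps; all other traffic stays on fast intra-region links, which is the cost structure the abstract argues for.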