NSF PAR Search | NSF Public Access Repository

HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training

Kim, Geon-Woo; Li, Junbo; Gandham, Shashidhar; Baldonado, Omar; Gangidi, Adithya; Balaji, Pavan; Wang, Zhangyang; Akella, Aditya (July 2025, International Conference on Machine Learning (ICML))

Free, publicly-accessible full text available July 13, 2026
MTP: Transport for In-Network Computing

Ji, Tao; Vardekar, Rohan; Vamanan, Balajee; Stephens, Brent E; Akella, Aditya (April 2025, USENIX)

Free, publicly-accessible full text available April 28, 2026
HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training

Kim, Geon-Woo; Li, Junbo; Gandham, Shashidhar; Baldonado, Omar; Gangidi, Adithya; Balaji, Pavan; Wang, Zhangyang; Akella, Aditya (June 2025, arXiv, https://doi.org/10.48550/arXiv.2506.04531)

Training large language models (LLMs) increasingly relies on geographically distributed accelerators, causing prohibitive communication costs across regions and uneven utilization of heterogeneous hardware. We propose HALoS, a hierarchical asynchronous optimization framework that tackles these issues by introducing local parameter servers (LPSs) within each region and a global parameter server (GPS) that merges updates across regions. This hierarchical design minimizes expensive inter-region communication, reduces straggler effects, and leverages fast intra-region links. We provide a rigorous convergence analysis for HALoS under non-convex objectives, including theoretical guarantees on the role of hierarchical momentum in asynchronous training. Empirically, HALoS attains up to 7.5x faster convergence than synchronous baselines in geo-distributed LLM training and improves upon existing asynchronous methods by up to 2.1x. Crucially, HALoS preserves the model quality of fully synchronous SGD, matching or exceeding its accuracy on standard language modeling and downstream benchmarks, while substantially lowering total training time. These results demonstrate that hierarchical, server-side update accumulation and global model merging are powerful tools for scalable, efficient training of new-era LLMs in heterogeneous, geo-distributed environments.
Free, publicly-accessible full text available June 5, 2026
CONGO: Compressive Online Gradient Optimization

Carleton, Jeremy; Vijaykumar, Prathik; Saxena, Divyanshu; Narasimha, Dheeraj; Shakkottai, Srinivas; Akella, Aditya (April 2025, International Conference on Learning Representations (ICLR) 2025)

Free, publicly-accessible full text available April 24, 2026
Copper and Wire: Bridging Expressiveness and Performance for Service Mesh Policies

https://doi.org/10.1145/3669940.3707257

Saxena, Divyanshu; Zhang, William; Pailoor, Shankara; Dillig, Isil; Akella, Aditya (March 2025, ACM)

Free, publicly-accessible full text available March 30, 2026
Enabling Portable and High-Performance SmartNIC Programs with Alkali

Lin, Jiaxin; Guo, Zhiyuan; Shah, Mihir; Ji, Tao; Zhang, Yiying; Kim, Daehyeok; Akella, Aditya (April 2025, USENIX NSDI)

Free, publicly-accessible full text available April 28, 2026
Enabling Portable and High-Performance SmartNIC Programs with Alkali

Lin, Jiaxin; Guo, Zhiyuan; Shah, Mihir; Ji, Tao; Zhang, Yiying; Kim, Daehyeok; Akella, Aditya (April 2025, USENIX)

Trends indicate that emerging SmartNICs, whether from different vendors or from different generations of the same vendor, exhibit substantial differences in hardware parallelism and memory interconnects. These variations make porting programs across NICs highly complex and time-consuming, requiring programmers to significantly refactor code for performance based on each target NIC’s hardware characteristics. We argue that an ideal SmartNIC compilation framework should allow developers to write target-independent programs, with the compiler automatically managing cross-NIC porting and performance optimization. We present such a framework, Alkali, that achieves this by (1) proposing a new intermediate representation for building flexible compiler infrastructure for multiple NIC targets and (2) developing a new iterative parallelism optimization algorithm that automatically ports and parallelizes the input programs based on the target NIC’s hardware characteristics. Experiments across a wide range of NIC applications demonstrate that Alkali enables developers to easily write portable, high-performance NIC programs. Our compiler optimization passes can automatically port these programs and make them run efficiently across all targets, achieving performance within 9.8% of hand-tuned expert implementations.
Free, publicly-accessible full text available April 28, 2026
How I learned to stop worrying and love learned OS policies

Saxena, Divyanshu; Chen, Jiayi; Yadalam, Sujay; Ro, Yeonju; Dwivedula, Rohit; Campbell, Eric; Akella, Aditya; Rossbach, Christopher J; Swift, Michael (May 2025, Workshop on Hot Topics in Operating Systems (HotOS '25))

While machine learning has been adopted across various fields, its ability to outperform traditional heuristics in operating systems is often met with justified skepticism. Concerns about unsafe decisions, opaque debugging processes, and the challenges of integrating ML into the kernel, given the kernel's stringent latency constraints and inherent complexity, make practitioners understandably cautious. This paper introduces Guardrails for the OS, a framework that allows kernel developers to declaratively specify system-level properties and define corrective actions to address property violations. The framework facilitates the compilation of these guardrails into monitors capable of running within the kernel. In this work, we establish the foundation for Guardrails, detailing its core abstractions, examining the problem space, and exploring potential solutions.
Free, publicly-accessible full text available May 14, 2026
How I learned to stop worrying and love learned OS policies

https://doi.org/10.1145/3713082.3730384

Saxena, Divyanshu; Chen, Jiayi; Yadalam, Sujay; Ro, Yeonju; Dwivedula, Rohit; Campbell, Eric H; Akella, Aditya; Rossbach, Christopher J; Swift, Michael (May 2025, ACM)

Free, publicly-accessible full text available May 14, 2026
