NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

ZipLLM: Efficient LLM Storage via Model-Aware Synergistic Data Deduplication and Compression

Wang, Zirui; Lan, Tingfeng; Su, Zhaoyuan; Yang, Juncheng; Cheng, Yue (May 2026, 23rd USENIX USENIX Symposium on Networked Systems Design and Implementation (NSDI '26))

Modern model hubs, such as Hugging Face, store tens of petabytes of LLMs, with fine-tuned variants vastly outnumbering base models and dominating storage consumption. Existing storage reduction techniques---such as deduplication and compression---are either LLM-oblivious or not compatible with each other, limiting data reduction effectiveness. Our large-scale characterization study across all publicly available Hugging Face LLM repositories reveals several key insights: (1) fine-tuned models within the same family exhibit highly structured, sparse parameter differences suitable for delta compression; (2) bitwise similarity enables LLM family clustering; and (3) tensor-level deduplication is better aligned with model storage workloads, achieving high data reduction with low metadata overhead. Building on these insights, we design BitX, an effective, fast, lossless delta compression algorithm that compresses XORed difference between fine-tuned and base LLMs. We build ZipLLM, a model storage reduction pipeline that unifies tensor-level deduplication and lossless BitX compression. By synergizing deduplication and compression around LLM family clustering, ZipLLM reduces model storage consumption by 54%, over 20% higher than state-of-the-art deduplication and compression approaches.
more » « less
Free, publicly-accessible full text available May 4, 2027
Helix: Serving Large Language Models over Heterogeneous GPUs and Network via Max-Flow

Mei, Yixuan; Zhuang, Yonghao; Miao, Xupeng; Yang, Juncheng; Jia, Zhihao; Vinayak, Rashmi (March 2025, Association for Computing Machinery)

Free, publicly-accessible full text available March 30, 2026
FIFO queues are all you need for cache eviction

https://doi.org/10.1145/3600006.3613147

Yang, Juncheng; Zhang, Yazhuo; Qiu, Ziyue; Yue, Yao; Vinayak, Rashmi (October 2023, ACM)
Kangaroo: Caching Billions of Tiny Objects on Flash

https://doi.org/10.1145/3477132.3483568

McAllister, Sara; Berg, Benjamin; Tutuncu-Macias, Julian; Yang, Juncheng; Gunasekar, Sathya; Lu, Jimmy; Berger, Daniel S.; Beckmann, Nathan; Ganger, Gregory R. (October 2021, Symposium on Operating Systems Principles)
null (Ed.)
Full Text Available
PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy

Kadekodi, Saurabh; Maturana, Francisco; Jayaram Subramanya, Suhas; Yang, Juncheng; Rashmi, K. V.; and Ganger, Gregory (November 2020, Proceedings of the 14th USENIX Symposium on Operating Systems Design and Implementation)
null (Ed.)
Full Text Available
Mithril: mining sporadic associations for cache prefetching

https://doi.org/10.1145/3127479.3131210

Yang, Juncheng; Karimi, Reza; Sæmundsson, Trausti; Wildani, Avani; Vigfusson, Ymir (January 2017, ACM Symposium on Cloud Computing)

The growing pressure on cloud application scalability has accentuated storage performance as a critical bottleneck. Although cache replacement algorithms have been extensively studied, cache prefetching - reducing latency by retrieving items before they are actually requested - remains an underexplored area. Existing approaches to history-based prefetching, in particular, provide too few benefits for real systems for the resources they cost. We propose Mithril, a prefetching layer that efficiently exploits historical patterns in cache request associations. Mithril is inspired by sporadic association rule mining and only relies on the timestamps of requests. Through evaluation of 135 block-storage traces, we show that Mithril is effective, giving an average of a 55% hit ratio increase over LRU and Probability Graph, and a 36% hit ratio gain over Amp at reasonable cost. Finally, we demonstrate the improvement comes from Mithril being able to capture mid-frequency blocks.
more » « less
Full Text Available
Enabling Space Elasticity in Storage Systems

https://doi.org/10.1145/2928275.2928291

Sigurbjarnarson, Helgi; Ragnarsson, Petur O.; Yang, Juncheng; Vigfusson, Ymir; Balakrishnan, Mahesh (June 2016, 9th ACM International Systems and Storage Conference)

Storage systems are designed to never lose data. However, modern applications increasingly use local storage to improve performance by storing soft state such as cached, prefetched or precomputed results. Required is elastic storage, where cloud providers can alter the storage footprint of applications by removing and regenerating soft state based on resource availability and access patterns. We propose a new abstraction called a motif that enables storage elasticity by allowing applications to describe how soft state can be regenerated. Carillon is a system that uses motifs to dynamically change the storage space used by applications. Carillon is implemented as a runtime and a collection of shim layers that interpose between applications and specific storage APIs; we describe shims for a filesystem (Carillon-FS) and a key-value store (Carillon-KV). We show that Carillon-FS allows us to dynamically alter the storage footprint of a VM, while Carillon-KV enables a graph database that accelerates performance based on available storage space
more » « less
Full Text Available

Search for: All records