NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Concurrency-Informed Orchestration for Serverless Functions

Liu, Qichang; Cheng, Yue; Shen, Haiying; Wang, Ao; Balaji, Bharathan (March 2025, The ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2025))

Cold start delays are a main pain point for today’s FaaS (Function-as-a-Service) platforms. A widely used mitigation strategy is keeping recently invoked function containers alive in memory to enable warm starts with minimal overhead. This paper identifies new challenges that state-of-the-art FaaS keep-alive policies neglect. These challenges are caused by concurrent function invocations, a common FaaS workload behavior. First, concurrent requests present a trade-off between reusing busy containers (delayed warm starts) versus cold-starting containers. Second, concurrent requests cause imbalanced evictions of containers that will be reused shortly thereafter. To tackle the challenges, we propose a novel serverless function container orchestration algorithm called CIDRE. CIDRE makes informed decisions to speculatively choose between a delayed warm start and a cold start under concurrency-driven function scaling. CIDRE uses both fine-grained container-level and coarse-grained concurrency information to make balanced eviction decisions. We evaluate CIDRE extensively using two production FaaS workloads. Results show that CIDRE reduces the cold start ratio and the average invocation overhead by up to 75.1% and 39.3% compared to state-of-the-art function keep-alive policies.
more » « less
Full Text Available
FedCaSe: Enhancing Federated Learning with Heterogeneity-aware Caching and Scheduling

https://doi.org/10.1145/3698038.3698559

Khan, Redwan_Ibne Seraj; Paul, Arnab K; Cheng, Yue; Jian, Xun; Butt, Ali R (November 2024, ACM)

Federated learning (FL) has emerged as a new paradigm of machine learning (ML) with the goal of collaborative learning on the vast pool of private data available across distributed edge devices. The focus of most existing works in FL systems has been on addressing the challenges of computation and communication heterogeneity inherent in training with edge devices. However, the crucial impact of I/O and the role of limited on-device storage has not been explored fully in FL context. Without policies to exploit the on-device storage for placement of client data samples, and schedule clients based on I/O benefits, FL training can lead to inefficiencies, such as increased training time and impacted accuracy convergence. In this paper, we propose FedCaSe, a framework for efficiently caching client samples in-situ on limited on-device storage and scheduling client participation. FedCaSe boosts the I/O performance by exploiting a unique characteristic--- the experience, i.e., relative impact on overall performance, of data samples and clients. FedCaSe utilizes this information in adaptive caching policies for sample placement inside the limited memory of edge clients. The framework also exploits the experience information to orchestrate the future selection of clients. Our experiments with representative workloads and policies show that compared to the state of the art, FedCaSe improves the training time by 2.06× for accuracy convergence at the scale of thousands of clients.
more » « less
Full Text Available
ALPS: An Adaptive Learning, Priority OS Scheduler for Serverless Functions

Fu, Yuqi; Shi, Ruizhe; Wang, Haoliang; Chen, Songqing; Cheng, Yue (July 2024, 2024 USENIX Annual Technical Conference (ATC 2024))

FaaS (Function-as-a-Service) workloads feature unique patterns. Serverless functions are ephemeral, highly concurrent, and bursty, with an execution duration ranging from a few milliseconds to a few seconds. The workload behaviors pose new challenges to kernel scheduling. Linux CFS (Completely Fair Scheduler) is workload-oblivious and optimizes long-term fairness via proportional sharing. CFS neglects the short-term demands of CPU time from short-lived serverless functions, severely impacting the performance of short functions. Preemptive shortest job first—shortest remaining process time (SRPT)—prioritizes shorter functions in order to satisfy their short-term demands of CPU time and, therefore, serves as a best-case baseline for optimizing the turnaround time of short functions. A significant downside of approximating SRPT, however, is that longer functions might be starved. In this paper, we propose a novel application-aware kernel scheduler, ALPS (Adaptive Learning, Priority Scheduler), based on two key insights. First, approximating SRPT can largely benefit short functions but may inevitably penalize long functions. Second, CFS provides necessary infrastructure support to implement user-defined priority scheduling. To this end, we design ALPS to have a novel, decoupled scheduler frontend and backend architecture, which unifies approximate SRPT and proportional-share scheduling. ALPS’ frontend sits in the user space and approximates SRPT-inspired priority scheduling by adaptively learning from an SRPT simulation on a recent past workload. ALPS’ backend uses eBPF functions hooked to CFS to carry out the continuously learned policies sent from the frontend to inform scheduling decisions in the kernel. This design adds workload intelligence to workload-oblivious OS scheduling while retaining the desirable properties of OS schedulers. We evaluate ALPS extensively using two production FaaS workloads (Huawei and Azure), and results show that ALPS achieves a reduction of 57.2% in average function execution duration compared to CFS.
more » « less
Full Text Available
A Closer Look into IPFS: Accessibility, Content, and Performance

https://doi.org/10.1145/3656015

Shi, Ruizhe; Cheng, Ruizhi; Han, Bo; Cheng, Yue; Chen, Songqing (May 2024, Proceedings of the ACM on Measurement and Analysis of Computing Systems)

The InterPlanetary File System (IPFS) has recently gained considerable attention. While prior research has focused on understanding its performance characterization and application support, it remains unclear: (1) what kind of files/content are stored in IPFS, (2) who are providing these files, (3) are these files always accessible, and (4) what affects the file access performance. To answer these questions, in this paper, we perform measurement and analysis on over 4 million files associated with CIDs (content IDs) that appeared in publicly available IPFS datasets. Our results reveal the following key findings: (1) Mixed file accessibility: while IPFS is not designed for a permanent storage, accessing a non-trivial portion of files, such as those of NFTs and video streams, often requires multiple retrieval attempts, potentially blocking NFT transactions and negatively affecting the user experience. (2) Dominance of NFT (non-fungible token) and video files: about 50% of stored files are NFT-related, followed by a large portion of video files, among which about half are pirated movies and adult content. (3) Centralization of content providers: a small number of peers (top-50), mostly cloud nodes hosted by tech companies, serve a large portion (95%) of files, deviating from IPFS's intended design goal. (4) High variation of downloading throughput and lookup time: large file retrievals experience lower average throughput due to more overhead for resolving file chunk CIDs, and looking up files hosted by non-cloud nodes takes longer. We hope that our findings can offer valuable insights for (1) IPFS application developers to take into consideration these characteristics when building applications on top of IPFS, and (2) IPFS system developers to improve IPFS and similar systems to be developed for Web3.
more » « less
Full Text Available
Everything You Always Wanted to Know About Storage Compressibility of Pre-Trained ML Models but Were Afraid to Ask

https://doi.org/10.14778/3659437.3659456

Su, Zhaoyuan; Ahmed, Ammar; Wang, Zirui; Anwar, Ali; Cheng, Yue (April 2024, Proceedings of the VLDB Endowment)

As the number of pre-trained machine learning (ML) models is growing exponentially, data reduction tools are not catching up. Existing data reduction techniques are not specifically designed for pre-trained model (PTM) dataset files. This is largely due to a lack of understanding of the patterns and characteristics of these datasets, especially those relevant to data reduction and compressibility. This paper presents the first, exhaustive analysis to date of PTM datasets on storage compressibility. Our analysis spans different types of data reduction and compression techniques, from hash-based data deduplication, data similarity detection, to dictionary-coding compression. Our analysis explores these techniques at three data granularity levels, from model layers, model chunks, to model parameters. We draw new observations that indicate that modern data reduction tools are not effective when handling PTM datasets. There is a pressing need for new compression methods that take into account PTMs' data characteristics for effective storage reduction. Motivated by our findings, we design Elf, a simple yet effective, error-bounded, lossy floating-point compression method. Elf transforms floating-point parameters in such a way that the common exponent field of the transformed parameters can be completely eliminated to save storage space. We develop Elves, a compression framework that integrates Elf along with several other data reduction methods. Elves uses the most effective method to compress PTMs that exhibit different patterns. Evaluation shows that Elves achieves an overall compression ratio of 1.52×, which is 1.31×, 1.32× and 1.29× higher than a general-purpose compressor (zstd), an error-bounded lossy compressor (SZ3), and the uniform model quantization, respectively, with negligible model accuracy loss.
more » « less
Full Text Available
Algorithmic Complexity Attacks on Dynamic Learned Indexes

https://doi.org/10.14778/3636218.3636232

Yang, Rui; Kornaropoulos, Evgenios M; Cheng, Yue (December 2023, Proceedings of the VLDB Endowment)

Learned Index Structures (LIS) view a sorted index as a model that learns the data distribution, takes a data element key as input, and outputs the predicted position of the key. The original LIS can only handle lookup operations with no support for updates, rendering it impractical to use for typical workloads. To address this limitation, recent studies have focused on designing efficient dynamic learned indexes. ALEX, as the first and one of the representative dynamic learned index structures, enables dynamism by incorporating a series of design choices, including adaptive key space partitioning, dynamic model retraining, and sophisticated engineering and policies that prioritize read/write performance. While these design choices offer improved average-case performance, the emphasis on flexibility and performance increases the attack surface by allowing adversarial behaviors that maximize ALEX's memory space and time complexity in worst-case scenarios. In this work, we present the first systematic investigation of algorithmic complexity attacks (ACAs) targeting the worst-case scenarios of ALEX. We introduce new ACAs that fall into two categories, space ACAs and time ACAs, which target the memory space and time complexity, respectively. First, our space ACA on data nodes exploits ALEX's gapped array layout and uses Multiple-Choice Knapsack (MCK) to generate an optimal adversarial insertion plan for maximizing the memory consumption at the data node level. Second, our space ACA on internal nodes exploits ALEX's catastrophic cost mitigation mechanism, causing an out-of-memory (OOM) error with only a few hundred adversarial insertions. Third, our time ACA generates pathological insertions to increase the disparity between the actual key distribution and the linear models of data nodes, deteriorating the runtime performance by up to 1, 641× compared to ALEX operating under legitimate workloads.
more » « less
Full Text Available
λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions

https://doi.org/10.1145/3623278.3624765

Carver, Benjamin; Han, Runzhou; Zhang, Jingyuan; Zheng, Mai; Cheng, Yue (March 2023, ACM)

Full Text Available
InfiniStore: Elastic Serverless Cloud Storage

https://doi.org/10.14778/3587136.3587139

Zhang, Jingyuan; Wang, Ao; Ma, Xiaolong; Carver, Benjamin; Newman, Nicholas John; Anwar, Ali; Rupprecht, Lukas; Tarasov, Vasily; Skourtis, Dimitrios; Yan, Feng; et al (March 2023, Proceedings of the VLDB Endowment)

Cloud object storage such as AWS S3 is cost-effective and highly elastic but relatively slow, while high-performance cloud storage such as AWS ElastiCache is expensive and provides limited elasticity. We present a new cloud storage service called ServerlessMemory, which stores data using the memory of serverless functions. ServerlessMemory employs a sliding-window-based memory management strategy inspired by the garbage collection mechanisms used in the programming language to effectively segregate hot/cold data and provides fine-grained elasticity, good performance, and a pay-per-access cost model with extremely low cost. We then design and implement InfiniStore, a persistent and elastic cloud storage system, which seamlessly couples the function-based ServerlessMemory layer with a persistent, inexpensive cloud object store layer. InfiniStore enables durability despite function failures using a fast parallel recovery scheme built on the auto-scaling functionality of a FaaS (Function-as-a-Service) platform. We evaluate InfiniStore extensively using both microbenchmarking and two real-world applications. Results show that InfiniStore has more performance benefits for objects larger than 10 MB compared to AWS ElastiCache and Anna, and InfiniStore achieves 26.25% and 97.24% tenant-side cost reduction compared to InfiniCache and ElastiCache, respectively.
more » « less
Full Text Available
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

Khan, R.; Yazdani, A.; Fu, Y.; Paul, A.; Ji, B.; Jian, X.; Cheng, Y.; and Butt, A. (February 2023, 21th USENIX Conference on File and Storage Technologies)

Full Text Available

Search for: All records