Search for: All records

Creators/Authors contains: "Cheng, Yue."


Total Resources: 29

Resource Type
Conference Paper: 23
Conference Proceeding: 1
Dataset: 0
Journal Article: 5
Workshop Report: 0

Availability
Full Text / Resource Available: 25
Citation Only: 4

𝜆FS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions

Carver, Benjamin ; Han, Runzhou ; Zhang, Jingyuan ; Zheng, Mai ; Cheng, Yue ( April 2024 , ACM ASPLOS 2023)

The metadata service (MDS) sits on the critical path for distributed file system (DFS) operations, and therefore it is key to the overall performance of a large-scale DFS. Common “serverful” MDS architectures, such as a single server or cluster of servers, have a significant shortcoming: either they are not scalable, or they make it difficult to achieve an optimal balance of performance, resource utilization, and cost. A modern MDS requires a novel architecture that addresses this shortcoming. To this end, we design and implement 𝜆FS, an elastic, high-performance metadata service for large-scale DFSes. 𝜆FS scales a DFS metadata cache elastically on a FaaS (Function-as-a-Service) platform and synthesizes a series of techniques to overcome the obstacles that are encountered when building large, stateful, and performance-sensitive applications on FaaS platforms. 𝜆FS takes full advantage of the unique benefits offered by FaaS (elastic scaling and massive parallelism) to realize a highly optimized metadata service capable of sustaining up to 4.13× higher throughput, 90.40% lower latency, 85.99% lower cost, 3.33× better performance-per-cost, and better resource utilization and efficiency than a state-of-the-art DFS for an industrial workload.
Free, publicly-accessible full text available April 27, 2025
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

Khan, Redwan Ibne ; Yazdani, Ahmad H. ; Fu, Yuqi ; Paul, Arnab K. ; Ji, Bo ; Jian, Xun ; Cheng, Yue ; Butt, Ali R. ( February 2024 , In Proceedings of the 21st USENIX Conference on File and Storage Technologies (FAST))

Free, publicly-accessible full text available February 28, 2025
Towards cost-effective and resource-aware aggregation at Edge for Federated Learning

https://doi.org/10.1109/BigData59044.2023.10386691

Khan, Ahmad Faraz ; Li, Yuze ; Wang, Xinran ; Haroon, Sabaat ; Ali, Haider ; Cheng, Yue ; Butt, Ali R. ; Anwar, Ali ( December 2023 , Proceedings of IEEE International Conference on Big Data (BigData))
SION: Elastic Serverless Cloud Storage

Zhang, Jingyuan ; Wang, Ao ; Ma, Xiaolong ; Carver, Benjamin ; Newman, Nicholas ; Anwar, Ali ; Rupprecht, Lukas ; Skourtis, Dimitrios ; Tarasov, Vasily ; Yan, Feng ; et al ( August 2023 , International Conference on Very Large Data Bases (VLDB 2023))

Free, publicly-accessible full text available August 1, 2024
Toward Quantized Model Parallelism for Graph-Augmented MLPs Based on Gradient-Free ADMM Framework

https://doi.org/10.1109/TNNLS.2022.3223879

Wang, Junxiang ; Li, Hongyi ; Chai, Zheng ; Wang, Yongchao ; Cheng, Yue ; Zhao, Liang ( May 2023 , IEEE Transactions on Neural Networks and Learning Systems)

Free, publicly-accessible full text available May 16, 2024
SHADE: Enable Fundamental Cacheability for Distributed Deep Learning Training

Khan, Redwan Ibne Seraj ; Yazdani, Ahmad H. ; Fu, Yuqi ; Paul, Arnab K. ; Ji, Bo ; Jian, Xun ; Cheng, Yue ; Butt, Ali R. ( February 2023 , 21st USENIX Conference on File and Storage Technologies (FAST))

Full Text Available
SFS: Smart OS Scheduling for Serverless Functions

Fu, Yuqi ; Liu, Li ; Wang, Haoliang ; Cheng, Yue ; Chen, Songqing ( November 2022 , International Conference for High Performance Computing Networking Storage and Analysis)

Serverless computing enables a new way of building and scaling cloud applications by allowing developers to write fine-grained serverless or cloud functions. The execution duration of a cloud function is typically short, ranging from a few milliseconds to hundreds of seconds. However, due to resource contention caused by public clouds' deep consolidation, the function execution duration may get significantly prolonged and fail to accurately account for the function's true resource usage. We observe that the function duration can be highly unpredictable, with huge amplification of more than 50× for an open-source FaaS platform (OpenLambda). Our experiments show that the OS scheduling policy of cloud functions' host server can have a crucial impact on performance. The default Linux scheduler, CFS (Completely Fair Scheduler), being oblivious to workloads, frequently context-switches short functions, causing a turnaround time that is much longer than their service time. We propose SFS (Smart Function Scheduler), which works entirely in the user space and carefully orchestrates existing Linux FIFO and CFS schedulers to approximate Shortest Remaining Time First (SRTF). SFS uses two-level scheduling that seamlessly combines a new FILTER policy with Linux CFS, to trade off increased duration of long functions for significant performance improvement for short functions. We implement SFS in the Linux user space and port it to OpenLambda. Evaluation results show that SFS significantly improves short functions' duration with a small impact on relatively longer functions, compared to CFS.
Full Text Available
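
The gap between turnaround time and service time described in the SFS abstract can be illustrated with a toy simulation. This is a minimal sketch, not the SFS implementation: it assumes all functions arrive at time 0, models fair time slicing as plain round robin with a fixed quantum (a simplification of CFS), and ignores context-switch overhead. The function names and the workload values are hypothetical.

```python
from collections import deque


def round_robin_turnaround(service_times, quantum):
    """Finish time of each job under round-robin time slicing.

    Assumes every job arrives at t=0 (a simplification for illustration).
    """
    remaining = list(service_times)
    finish = [0] * len(service_times)
    queue = deque(range(len(service_times)))
    clock = 0
    while queue:
        job = queue.popleft()
        run = min(quantum, remaining[job])
        clock += run
        remaining[job] -= run
        if remaining[job] > 0:
            queue.append(job)  # not done yet: go to the back of the queue
        else:
            finish[job] = clock
    return finish


def srtf_turnaround(service_times):
    """Finish time of each job under Shortest Remaining Time First.

    With simultaneous arrival, SRTF reduces to running jobs in order of
    increasing service time, without preemption.
    """
    finish = [0] * len(service_times)
    clock = 0
    order = sorted(range(len(service_times)), key=lambda j: service_times[j])
    for job in order:
        clock += service_times[job]
        finish[job] = clock
    return finish


# Hypothetical workload (arbitrary time units): four long functions, one short.
jobs = [30, 30, 30, 30, 3]
rr = round_robin_turnaround(jobs, quantum=1)
srtf = srtf_turnaround(jobs)
print("short job, round robin:", rr[4], "vs SRTF:", srtf[4])
```

In this toy workload the short function's turnaround under time slicing is several times its service time, while the last long function finishes at the same moment under both policies; real schedulers and arrival patterns will behave differently.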
InfiniStore: Elastic Serverless Cloud Storage

https://doi.org/10.14778/3587136.3587139

Zhang, Jingyuan ; Wang, Ao ; Ma, Xiaolong ; Carver, Benjamin ; Newman, Nicholas John ; Anwar, Ali ; Rupprecht, Lukas ; Tarasov, Vasily ; Skourtis, Dimitrios ; Yan, Feng ; et al ( March 2023 , Proceedings of the VLDB Endowment)

Cloud object storage such as AWS S3 is cost-effective and highly elastic but relatively slow, while high-performance cloud storage such as AWS ElastiCache is expensive and provides limited elasticity. We present a new cloud storage service called ServerlessMemory, which stores data using the memory of serverless functions. ServerlessMemory employs a sliding-window-based memory management strategy inspired by the garbage collection mechanisms used in the programming language to effectively segregate hot/cold data and provides fine-grained elasticity, good performance, and a pay-per-access cost model with extremely low cost. We then design and implement InfiniStore, a persistent and elastic cloud storage system, which seamlessly couples the function-based ServerlessMemory layer with a persistent, inexpensive cloud object store layer. InfiniStore enables durability despite function failures using a fast parallel recovery scheme built on the auto-scaling functionality of a FaaS (Function-as-a-Service) platform. We evaluate InfiniStore extensively using both microbenchmarking and two real-world applications. Results show that InfiniStore has more performance benefits for objects larger than 10 MB compared to AWS ElastiCache and Anna, and InfiniStore achieves 26.25% and 97.24% tenant-side cost reduction compared to InfiniCache and ElastiCache, respectively.
Full Text Available
Understanding Impact of Lossy Compression on Derivative-related Metrics in Scientific Datasets

https://doi.org/10.1109/DRBSD56682.2022.00011

Su, Zhaoyuan ; Di, Sheng ; Gok, Ali Murat ; Cheng, Yue ; Cappello, Franck ( November 2022 , IEEE/ACM 8th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD))

Full Text Available
FedAT: A High-Performance and Communication-Efficient Federated Learning System with Asynchronous Tiers

https://doi.org/10.1145/3458817.3476211

Chai, Zheng ; Chen, Yujing ; Anwar, Ali ; Zhao, Liang ; Cheng, Yue ; Rangwala, Huzefa ( November 2021 , The International Conference for High Performance Computing, Networking, Storage, and Analysis)

Federated learning (FL) involves training a model over massive distributed devices, while keeping the training data localized and private. This form of collaborative learning exposes new tradeoffs among model convergence speed, model accuracy, balance across clients, and communication cost, with new challenges including: (1) straggler problem—where clients lag due to data or (computing and network) resource heterogeneity, and (2) communication bottleneck—where a large number of clients communicate their local updates to a central server and bottleneck the server. Many existing FL methods focus on optimizing along only one single dimension of the tradeoff space. Existing solutions use asynchronous model updating or tiering-based, synchronous mechanisms to tackle the straggler problem. However, asynchronous methods can easily create a communication bottleneck, while tiering may introduce biases that favor faster tiers with shorter response latencies. To address these issues, we present FedAT, a novel Federated learning system with Asynchronous Tiers under Non-i.i.d. training data. FedAT synergistically combines synchronous, intra-tier training and asynchronous, cross-tier training. By bridging the synchronous and asynchronous training through tiering, FedAT minimizes the straggler effect with improved convergence speed and test accuracy. FedAT uses a straggler-aware, weighted aggregation heuristic to steer and balance the training across clients for further accuracy improvement. FedAT compresses uplink and downlink communications using an efficient, polyline-encoding-based compression algorithm, which minimizes the communication cost. Results show that FedAT improves the prediction performance by up to 21.09% and reduces the communication cost by up to 8.5×, compared to state-of-the-art FL methods.
Full Text Available